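LLaVA is a multimodal large model that can turn visual features extracted from its input into text. Jetson AI Lab provides a tutorial for it, and this post puts it into practice.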

Jetson-Orin-Nano-Super(10) Live LLaVA in Practice

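The previous few posts got several popular models running on the Orin: Qwen3, Stable Diffusion, and YOLO, all common real-world uses in embedded projects. This post follows the Jetson AI Lab tutorial to try RAG hands-on. For RAG I use Jetson's open-source RAG application (Jetson Copilot) directly rather than the general-purpose LlamaIndex, although the Jetson AI Lab tutorials also include a LlamaIndex example.
# Installation
Jetson Copilot requires Chromium and Docker to be installed. Both were set up in earlier posts, so the launch script can be run directly: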
```
# ./launch_jetson_copilot.sh
```
The startup log is as follows:
```
Status: Downloaded newer image for dustynv/jetson-copilot:r36.4.0

Starting ollama server

Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGy9+RxSNWAVS6zF32ZnHVF/LSr0I+pCnwmhtbZx6a2y

2025/09/11 09:09:42 routes.go:1158: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/data/models/ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-09-11T09:09:42.176Z level=INFO source=images.go:754 msg="total blobs: 0"
time=2025-09-11T09:09:42.176Z level=INFO source=images.go:761 msg="total unused blobs removed: 0"
time=2025-09-11T09:09:42.176Z level=INFO source=routes.go:1205 msg="Listening on [::]:11434 (version 0.0.0)"
time=2025-09-11T09:09:42.177Z level=INFO source=common.go:135 msg="extracting embedded files" dir=/tmp/ollama1274236136/runners
time=2025-09-11T09:09:43.569Z level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu cuda_v12]"
time=2025-09-11T09:09:43.570Z level=INFO source=gpu.go:199 msg="looking for compatible GPUs"
time=2025-09-11T09:09:43.579Z level=WARN source=gpu.go:669 msg="unable to locate gpu dependency libraries"
time=2025-09-11T09:09:43.579Z level=WARN source=gpu.go:669 msg="unable to locate gpu dependency libraries"
time=2025-09-11T09:09:43.579Z level=WARN source=gpu.go:669 msg="unable to locate gpu dependency libraries"
time=2025-09-11T09:09:43.699Z level=INFO source=types.go:107 msg="inference compute" id=GPU-d03db7ac-e66d-56fe-ae22-93568383bf53 library=cuda variant=jetpack6 compute=8.7 driver=12.6 name=Orin total="7.4 GiB" available="5.6 GiB"

OLLAMA_MODELS /data/models/ollama/models
OLLAMA_LOGS   /data/logs/ollama.log

ollama server is now started, and you can run commands here like 'ollama run llama3'


Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.


  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://192.168.31.55:8501
  External URL: http://hide:8501
```
Opening this URL shows the following:

![image.png](/static/img/ded64aba1a3f5ed582b2b292798b054a.image.webp)

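It is downloading the llama3 model. An earlier post already installed Ollama locally and pulled qwen3, but because this runs in a container, all the LLM-related setup is done again from scratch.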

Since I am following the tutorial, I try not to improvise. Just wait a while; once it finishes, the page looks like this:

![image.png](/static/img/e50cf42ce9b1ba049d4f08d81ad6117d.image.webp)

After it finishes, go back to the home page and start asking questions:

![image.png](/static/img/81824004b90acc38a31a6cb77c31c827.image.webp)

# Demo
To exercise RAG, I imported the *Linux Memory Manager* PDF I have been reading as the RAG document library, then compared answers without and with RAG:

Click `Build a new index` and name it LMM.

Then enter the Docker container and prepare to upload the document:
```
# docker exec -it kind_blackburn /bin/bash
# mkdir /opt/jetson_copilot/Documents/LMM
# scp root@kylin:~/_EA_Linux_Memory_Manager_21025.pdf /opt/jetson_copilot/Documents/LMM/
```
Then select the LMM directory under `Local documents` on the web page. It shows one new document to add; click `Build index`.

![image.png](/static/img/8830072f7a8f38481a406fb95e5873a8.image.webp)

Wait for the index to finish building, then go back to the home page to use it. Here I ask about Linux kernel memory compaction to test what RAG adds.

Below is the Q&A without RAG:

![image.png](/static/img/0d5a60a7bea60bb2531a8a16d48773ce.image.webp)

This answer has no technical detail. Next, the RAG Q&A with the *Linux Memory Manager* book attached:

![image.png](/static/img/eb9f1b4d7d6be51d973bb503665203eb.image.webp)

This time the answer cites content from the LMM book and includes some code details.
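The difference between the two answers comes from the retrieve-then-generate flow that a RAG index implements: split the documents into chunks, rank the chunks against the question, and paste the best ones into the prompt. Below is a minimal, self-contained sketch of that flow. It is not Jetson Copilot's code. Real RAG stacks typically use a neural embedding model for the ranking step, while this sketch swaps in bag-of-words cosine similarity so it runs without any model. All function names (`chunk`, `build_index`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased alphanumeric runs; a crude stand-in for a real tokenizer.
    return re.findall(r"[a-z0-9_]+", text.lower())


def chunk(text, size=40, overlap=10):
    # Overlapping word windows, similar in spirit to how RAG frameworks split
    # a PDF into nodes before indexing. Assumes size > overlap.
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counter returns 0 for missing terms).
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    # "Build index": chunk every document and precompute each chunk's term vector.
    return [(piece, Counter(tokenize(piece))) for doc in documents for piece in chunk(doc)]


def retrieve(index, question, k=2):
    # Rank chunks by similarity to the question and keep the top k.
    q = Counter(tokenize(question))
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [piece for piece, _ in ranked[:k]]


def build_prompt(question, contexts):
    # Augment the question with the retrieved text before it goes to the LLM.
    context = "\n---\n".join(contexts)
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
```

The string returned by `build_prompt` is what would be sent to the LLM (llama3 in this setup). Without RAG the model only sees the bare question, which is why its answer stays generic.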

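# Summary
This post tried Jetson Copilot hands-on: it ran the llama3 LLM in a container, loaded the LMM book as RAG material, built an index, and compared answers with and without RAG. RAG works quite well as an internal knowledge base for a team or an individual.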

Jetson-Orin-Nano-Super(9) RAG with Jetson Copilot in Practice

Continuing with Jetson AI Lab, this post tries deploying YOLO. Jetson AI Lab documents the specific method; this post is mainly about putting it into practice.
# ultralytics on jetson
Following the official guide, installation is again container-based and very simple:
```
t=ultralytics/ultralytics:latest-jetson-jetpack6
sudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t
```
When it finishes, you are already in the container's bash shell and can test right away. Here I use the CLI as an example.  
Export a TensorRT model:
```
# yolo export model=yolov8n.pt format=engine
```
The run log is as follows:
```

Creating new Ultralytics Settings v0.0.6 file ✅
View Ultralytics Settings with 'yolo settings' or at '/root/.config/Ultralytics/settings.json'
Update Settings with 'yolo settings key=value', i.e. 'yolo settings runs_dir=path/to/dir'. For help see https://docs.ultralytics.com/quickstart/#ultralytics-settings.
Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolov8n.pt to 'yolov8n.pt': 100% ━━━━━━━━━━━━ 6.2MB 15.4MB/s 0.4s
WARNING ⚠️ TensorRT requires GPU export, automatically assigning device=0
Ultralytics 8.3.198 🚀 Python-3.10.12 torch-2.5.0a0+872d972e41.nv24.08 CUDA:0 (Orin, 7620MiB)
YOLOv8n summary (fused): 72 layers, 3,151,904 parameters, 0 gradients, 8.7 GFLOPs

PyTorch: starting from 'yolov8n.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 84, 8400) (6.2 MB)
requirements: Ultralytics requirement ['onnxslim>=0.1.67'] not found, attempting AutoUpdate...
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torch 2.5.0a0+872d972e41.nv24.8 requires sympy==1.13.1; python_version >= "3.9", but you have sympy 1.14.0 which is incompatible.
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
Collecting onnxslim>=0.1.67
  Downloading onnxslim-0.1.68-py3-none-any.whl.metadata (7.6 kB)
Requirement already satisfied: onnx in /usr/local/lib/python3.10/dist-packages (from onnxslim>=0.1.67) (1.19.0)
Collecting sympy>=1.13.3 (from onnxslim>=0.1.67)
  Downloading sympy-1.14.0-py3-none-any.whl.metadata (12 kB)
Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from onnxslim>=0.1.67) (20.9)
Collecting colorama (from onnxslim>=0.1.67)
  Downloading colorama-0.4.6-py2.py3-none-any.whl.metadata (17 kB)
Requirement already satisfied: ml_dtypes in /usr/local/lib/python3.10/dist-packages (from onnxslim>=0.1.67) (0.5.3)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from sympy>=1.13.3->onnxslim>=0.1.67) (1.3.0)
Requirement already satisfied: numpy>=1.21 in /usr/local/lib/python3.10/dist-packages (from ml_dtypes->onnxslim>=0.1.67) (1.26.4)
Requirement already satisfied: protobuf>=4.25.1 in /usr/local/lib/python3.10/dist-packages (from onnx->onnxslim>=0.1.67) (5.29.5)
Requirement already satisfied: typing_extensions>=4.7.1 in /usr/local/lib/python3.10/dist-packages (from onnx->onnxslim>=0.1.67) (4.15.0)
Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.10/dist-packages (from packaging->onnxslim>=0.1.67) (3.2.3)
Downloading onnxslim-0.1.68-py3-none-any.whl (164 kB)
Downloading sympy-1.14.0-py3-none-any.whl (6.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.3/6.3 MB 1.9 MB/s  0:00:03
Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Installing collected packages: sympy, colorama, onnxslim
  Attempting uninstall: sympy
    Found existing installation: sympy 1.13.1
    Uninstalling sympy-1.13.1:
      Successfully uninstalled sympy-1.13.1

Successfully installed colorama-0.4.6 onnxslim-0.1.68 sympy-1.14.0

requirements: AutoUpdate success ✅ 16.1s
WARNING ⚠️ requirements: Restart runtime or rerun command for updates to take effect


ONNX: starting export with onnx 1.19.0 opset 19...
ONNX: slimming with onnxslim 0.1.68...
ONNX: export success ✅ 18.9s, saved as 'yolov8n.onnx' (12.2 MB)

TensorRT: starting export with TensorRT 10.3.0...
[09/11/2025-02:15:50] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 627, GPU 2981 (MiB)
[09/11/2025-02:15:52] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +927, GPU +1196, now: CPU 1597, GPU 4209 (MiB)
[09/11/2025-02:15:52] [TRT] [I] ----------------------------------------------------------------
[09/11/2025-02:15:52] [TRT] [I] Input filename:   yolov8n.onnx
[09/11/2025-02:15:52] [TRT] [I] ONNX IR version:  0.0.9
[09/11/2025-02:15:52] [TRT] [I] Opset version:    19
[09/11/2025-02:15:52] [TRT] [I] Producer name:    pytorch
[09/11/2025-02:15:52] [TRT] [I] Producer version: 2.5.0
[09/11/2025-02:15:52] [TRT] [I] Domain:
[09/11/2025-02:15:52] [TRT] [I] Model version:    0
[09/11/2025-02:15:52] [TRT] [I] Doc string:
[09/11/2025-02:15:52] [TRT] [I] ----------------------------------------------------------------
TensorRT: input "images" with shape(1, 3, 640, 640) DataType.FLOAT
TensorRT: output "output0" with shape(1, 84, 8400) DataType.FLOAT
TensorRT: building FP32 engine as yolov8n.engine
[09/11/2025-02:15:52] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.

[09/11/2025-02:18:36] [TRT] [I] Detected 1 inputs and 3 output network tensors.
[09/11/2025-02:18:37] [TRT] [I] Total Host Persistent Memory: 389248
[09/11/2025-02:18:37] [TRT] [I] Total Device Persistent Memory: 0
[09/11/2025-02:18:37] [TRT] [I] Total Scratch Memory: 0
[09/11/2025-02:18:37] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 164 steps to complete.
[09/11/2025-02:18:37] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 12.5611ms to assign 7 blocks to 164 nodes requiring 18841600 bytes.
[09/11/2025-02:18:37] [TRT] [I] Total Activation Memory: 18841600
[09/11/2025-02:18:37] [TRT] [I] Total Weights Memory: 12727300
[09/11/2025-02:18:37] [TRT] [I] Engine generation completed in 164.86 seconds.
[09/11/2025-02:18:37] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 1 MiB, GPU 260 MiB
[09/11/2025-02:18:37] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 2400 MiB
TensorRT: export success ✅ 187.0s, saved as 'yolov8n.engine' (13.8 MB)

Export complete (188.4s)
Results saved to /ultralytics
Predict:         yolo predict task=detect model=yolov8n.engine imgsz=640
Validate:        yolo val task=detect model=yolov8n.engine imgsz=640 data=coco.yaml
Visualize:       https://netron.app
💡 Learn more at https://docs.ultralytics.com/modes/export
```
Before predicting, here is the image to run prediction on:

![image.png](/static/img/a3ddd09bfb8a927061762fe8872aedcb.image.webp)

Next, run prediction with this TensorRT model:
```
# yolo predict model=yolov8n.engine source='https://ultralytics.com/images/bus.jpg'
```
The run log is as follows:
```
WARNING ⚠️ Unable to automatically guess model task, assuming 'task=detect'. Explicitly define task for your model, i.e. 'task=detect', 'segment', 'classify','pose' or 'obb'.
Ultralytics 8.3.198 🚀 Python-3.10.12 torch-2.5.0a0+872d972e41.nv24.08 CUDA:0 (Orin, 7620MiB)
Loading yolov8n.engine for TensorRT inference...
[09/11/2025-02:21:01] [TRT] [I] Loaded engine size: 13 MiB
[09/11/2025-02:21:01] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +18, now: CPU 0, GPU 30 (MiB)

Downloading https://ultralytics.com/images/bus.jpg to 'bus.jpg': 100% ━━━━━━━━━━━━ 134.2KB 1.2MB/s 0.1s
image 1/1 /ultralytics/bus.jpg: 640x640 4 persons, 1 bus, 7.7ms
Speed: 26.9ms preprocess, 7.7ms inference, 130.4ms postprocess per image at shape (1, 3, 640, 640)
Results saved to /ultralytics/runs/detect/predict
💡 Learn more at https://docs.ultralytics.com/modes/predict

```

The detections (4 persons and 1 bus, according to the log) are quite accurate:

![bus.jpg](/static/img/37291e18e65142606ef18dce08e9d78c.bus.webp)

# ultralytics-yolo webtest
Jetson AI Lab also offers a web test, and the commands are simple. First install the examples package:
```
# venv
# pip install jetson-examples
```
Then just run it:
```
# reComputer run ultralytics-yolo
INFO: machine[nvidia jetson orin nano engineering reference developer kit super] confirmed...
run example：ultralytics-yolo
----example init----
CONFIG_FILE_PATH=/root/miniconda3/envs/virtual/lib/python3.10/site-packages/reComputer/scripts/ultralytics-yolo/config.yaml
yq is already installed.
jq is already installed.
jq-1.6
32.6.1 35.3.1 35.4.1 35.5.0 36.3.0 36.4.0
L4T VERSION 36.4.4 is not in the allowed versions list.
The JetPack versions currently supported by this container are: 32.6.1 35.3.1 35.4.1 35.5.0 36.3.0 36.4.0.
For more information : https://github.com/Seeed-Projects/jetson-examples
An error occurred. Exiting...
```
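The examples repository has not been updated in time, so it does not support L4T 36.4.4. I need to compare the differences between the two versions.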

The r36.4.0 release is described here:
> https://developer.nvidia.cn/embedded/jetson-linux-r3640

Its release notes are:
> https://docs.nvidia.com/jetson/archives/r36.4/ReleaseNotes/Jetson_Linux_Release_Notes_r36.4.pdf

The r36.4.4 release is described here:
> https://developer.nvidia.cn/embedded/jetson-linux-r3644

Its release notes are:
> https://docs.nvidia.com/jetson/archives/r36.4.4/ReleaseNotes/Jetson_Linux_Release_Notes_r36.4.4.pdf

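After checking, the TensorRT- and YOLO-related changes between the two releases are small, so the existing images should be reusable. Next, I try patching the installed jetson-examples scripts.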

Go to the Python package install path and locate the check:
```
/root/miniconda3/envs/virtual/lib/python3.10/site-packages/reComputer/scripts/ultralytics-yolo/run.sh
```
The check turns out to be driven by config.yaml, which contains:
```
# The tested JetPack versions.
ALLOWED_L4T_VERSIONS:
  - 32.6.1
  - 35.3.1
  - 35.4.1
  - 35.5.0
  - 36.3.0
  - 36.4.0
```
Add one entry:
```
# The tested JetPack versions.
ALLOWED_L4T_VERSIONS:
  - 32.6.1
  - 35.3.1
  - 35.4.1
  - 35.5.0
  - 36.3.0
  - 36.4.0
  - 36.4.4
```
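run.sh refuses any version that is not listed under `ALLOWED_L4T_VERSIONS`. Judging by the `yq is already installed` line in the log, the real script probably reads this list with yq. The sketch below instead does a deliberately simple line-based parse so that the check is explicit. It has no PyYAML dependency and only handles the flat list format shown above. The function names are hypothetical.

```python
def parse_allowed_versions(yaml_text, key="ALLOWED_L4T_VERSIONS"):
    """Collect the '- x.y.z' items listed under `key` in a flat YAML file."""
    versions = []
    in_list = False
    for raw in yaml_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == f"{key}:":
            in_list = True
            continue
        if in_list:
            if line.startswith("- "):
                versions.append(line[2:].strip().strip("'\""))
            else:
                break  # the next top-level key ends the list
    return versions


def is_allowed(version, yaml_text):
    # Same membership test the example's version gate performs.
    return version in parse_allowed_versions(yaml_text)
```

Before the edit, `is_allowed("36.4.4", ...)` is false for the original config.yaml. After appending `- 36.4.4` it becomes true, which is exactly what the added line achieves.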
run.sh also performs a second check when choosing the image:
```
# Determine the Docker image based on L4T version
if [[ "$L4T_VERSION" == "32.6.1" ]]; then
    IMAGE_NAME="yaohui1998/ultralytics-jetpack4:1.0"
elif [[ "$L4T_VERSION" == "35.3.1" || "$L4T_VERSION" == "35.4.1" || "$L4T_VERSION" == "35.5.0" ]]; then
    IMAGE_NAME="yaohui1998/ultralytics-jetpack5:1.0"
elif [[ "$L4T_VERSION" == "36.3.0" ]]; then
    IMAGE_NAME="yaohui1998/ultralytics-jetpack6:1.0"
elif [[ "$L4T_VERSION" == "36.4.0" ]]; then
    IMAGE_NAME="yaohui1998/ultralytics-jetpack61:v1.0"
else
    echo "Error: L4T version $L4T_VERSION is not supported."
    exit 1
fi
```
Just add a 36.4.4 branch:
```
elif [[ "$L4T_VERSION" == "36.4.4" ]]; then
    IMAGE_NAME="yaohui1998/ultralytics-jetpack61:v1.0"
else
```
This effectively tries to run YOLO with the JetPack 6.1 image (built for r36.4.0) on my current hardware and kernel.
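The image selection in run.sh boils down to "read the L4T version, look it up in a table". Here is a Python sketch of the same logic. It assumes the first line of `/etc/nv_tegra_release` has the usual `# R36 (release), REVISION: 4.4, ...` form; the real run.sh may obtain the version differently. The names `IMAGE_BY_L4T`, `parse_l4t_version` and `select_image` are hypothetical. The image tags are copied from run.sh, with 36.4.4 mapped to the JetPack 6.1 image as in the patch above.

```python
import re

# Mirrors the if/elif chain in run.sh, including the added 36.4.4 branch.
IMAGE_BY_L4T = {
    "32.6.1": "yaohui1998/ultralytics-jetpack4:1.0",
    "35.3.1": "yaohui1998/ultralytics-jetpack5:1.0",
    "35.4.1": "yaohui1998/ultralytics-jetpack5:1.0",
    "35.5.0": "yaohui1998/ultralytics-jetpack5:1.0",
    "36.3.0": "yaohui1998/ultralytics-jetpack6:1.0",
    "36.4.0": "yaohui1998/ultralytics-jetpack61:v1.0",
    "36.4.4": "yaohui1998/ultralytics-jetpack61:v1.0",  # reuses the JetPack 6.1 image
}


def parse_l4t_version(release_line):
    """Turn '# R36 (release), REVISION: 4.4, ...' into '36.4.4'."""
    m = re.search(r"R(\d+)\s+\(release\),\s+REVISION:\s+(\d+)\.(\d+)", release_line)
    if m is None:
        raise ValueError("unrecognized nv_tegra_release header")
    return ".".join(m.groups())


def select_image(l4t_version):
    """Return the container image for a version, or exit like run.sh does."""
    try:
        return IMAGE_BY_L4T[l4t_version]
    except KeyError:
        raise SystemExit(f"Error: L4T version {l4t_version} is not supported.") from None
```

On the board, `select_image(parse_l4t_version(open("/etc/nv_tegra_release").readline()))` would return the JetPack 6.1 tag for 36.4.4. A table keeps the version-to-image mapping in one place, whereas run.sh and config.yaml each hold their own copy that must be patched separately.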

![image.png](/static/img/ff688a08dddc0423fc986ef75744e05d.image.webp)

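Since YOLO mostly converts off-the-shelf models, a small TensorRT version difference should not matter much. Whether it actually runs is still an open question, so let's keep going.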

The docker command in run.sh shows how the container is started:
```
if [ $(docker ps -a -q -f name=^/${CONTAINER_NAME}$) ]; then
    echo "Container $CONTAINER_NAME already exists. Starting and attaching..."
    echo "Please open http://127.0.0.1:5000 to access the WebUI."
    docker start $CONTAINER_NAME
    docker exec -it $CONTAINER_NAME /bin/bash
else
    echo "Container $CONTAINER_NAME does not exist. Creating and starting..."
    docker run -it \
        --name $CONTAINER_NAME \
        --privileged \
        --network host \
        -v ~/yolo_models/:/usr/src/ultralytics/models/ \
        -v /tmp/.X11-unix:/tmp/.X11-unix \
        -v /dev/*:/dev/* \
        -v /etc/localtime:/etc/localtime:ro \
        --runtime nvidia \
        $IMAGE_NAME
fi
```
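On the first run it creates a container named ultralytics-yolo, and every later run restarts that same container. The web UI listens on port 5000 by default. To open a shell in the container: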
```
docker exec -it ultralytics-yolo /bin/bash
```
Then open port 5000 in a browser:

![image.png](/static/img/27e7d456eb8802b50752d4895e681c7a.image.webp)

# yolo on webcam
An experiment with a webcam to compare YOLOv8 and YOLOv10:

First plug in the stereo camera and restart the container so that it picks up the new /dev entries:
```
# docker restart ultralytics-yolo
# docker exec -it ultralytics-yolo /bin/bash
# ls /dev/video*
/dev/video0  /dev/video1  /dev/video2  /dev/video3
```

Then detect a keyboard:

![image.png](/static/img/1461c0bcb39b84c39ce3fc29bf672dab.image.webp)

The inference log:
```
# docker logs ultralytics-yolo -f
0: 640x640 (no detections), 7.1ms
Speed: 1.9ms preprocess, 7.1ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 (no detections), 7.1ms
Speed: 1.9ms preprocess, 7.1ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 (no detections), 7.1ms
Speed: 1.9ms preprocess, 7.1ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 (no detections), 7.2ms
Speed: 1.9ms preprocess, 7.2ms inference, 1.5ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 (no detections), 7.3ms
Speed: 2.1ms preprocess, 7.3ms inference, 1.7ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 (no detections), 7.2ms
Speed: 2.0ms preprocess, 7.2ms inference, 1.5ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 (no detections), 7.2ms
Speed: 1.9ms preprocess, 7.2ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 640)

0: 640x640 (no detections), 7.2ms
Speed: 2.0ms preprocess, 7.2ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 640)
```

The Orin runs the YOLO n and s models smoothly without much strain; the log shows roughly 7 ms of inference per 640x640 frame.
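To put a number on "smooth", here is a small sketch that parses the `Speed:` lines Ultralytics prints (format copied from the logs above) and converts them into frames per second. The function names are hypothetical. The figures cover only preprocess, inference and postprocess. Camera capture, encoding and the web UI are not included, so real end-to-end FPS will be lower.

```python
import re

# Matches e.g. "Speed: 1.9ms preprocess, 7.1ms inference, 1.4ms postprocess per image ..."
SPEED_RE = re.compile(
    r"Speed: ([\d.]+)ms preprocess, ([\d.]+)ms inference, ([\d.]+)ms postprocess"
)


def parse_speed(log_text):
    """Return (preprocess, inference, postprocess) in ms for every 'Speed:' line."""
    return [tuple(map(float, m.groups())) for m in SPEED_RE.finditer(log_text)]


def summarize(samples):
    """Average per-frame latency and the FPS upper bounds it implies."""
    if not samples:
        raise ValueError("no Speed: lines found")
    mean_total = sum(sum(s) for s in samples) / len(samples)
    mean_infer = sum(s[1] for s in samples) / len(samples)
    return {
        "mean_total_ms": mean_total,
        "pipeline_fps": 1000.0 / mean_total,
        "inference_only_fps": 1000.0 / mean_infer,
    }
```

For the webcam log above, each frame totals 10.3 to 11.1 ms, an upper bound of roughly 90 to 97 FPS. The single bus.jpg run earlier (26.9 + 7.7 + 130.4 = 165.0 ms) likely includes one-time warm-up costs, so it is not representative of steady-state speed.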

![image.png](/static/img/ca744a8936f26b5370b7d3a2ee6d981a.image.webp)

# Summary
This post did a hands-on run of the AI Lab containers, running several YOLOv8 and YOLO11 models on the Orin.

Jetson-Orin-Nano-Super(8) YOLO Ultralytics Deployment

The previous post followed an online article to deploy the Qwen3 8B LLM. This post again follows online resources to get Stable Diffusion running on the Orin.
# jetson ai lab
NVIDIA runs Jetson AI Lab, which has a lot of AI content to play with:
> https://www.jetson-ai-lab.com/

This post focuses on deploying stable-diffusion, so the reference is:
> https://www.jetson-ai-lab.com/tutorial_stable-diffusion.html

The tutorial uses the jetson-containers git repository, so let's browse it first:

> https://github.com/dusty-nv/jetson-containers.git

The repository has many AI examples to try:

![image.png](/static/img/fd6e7625c634af9b812ece9593d268a0.image.webp)

The stable diffusion example actually pulls a container image from Docker Hub:
> https://hub.docker.com/r/dustynv/stable-diffusion-webui

The latest version at the time of writing is r36.4.0.

Now let's follow the document's steps to run stable-diffusion-webui.
# Hands-on
First, Docker in mainland China needs a proxy:
```
# mkdir -p /etc/systemd/system/docker.service.d
# cat /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://127.0.0.1:9981"
Environment="HTTPS_PROXY=http://127.0.0.1:9981"
# systemctl daemon-reload
# systemctl restart docker
```
Replace the port with whatever port your proxy uses.

Following the document, I first tried the latest version, r36.4.0:
```
# jetson-containers run dustynv/stable-diffusion-webui:r36.4.0
```
However, the numpy version this image's Python requires could not be satisfied in the venv, so I downgraded to r36.2.0:
```
# jetson-containers run dustynv/stable-diffusion-webui:r36.2.0
```
Because I am in mainland China, the default downloads fail; everything here was fetched after configuring the proxy as above.  

Once the download finished and the container ran, its docker run arguments turned out to be:
```
docker run --runtime nvidia --env NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics \
  -it --rm --network host --shm-size=8g \
  --volume /tmp/argus_socket:/tmp/argus_socket \
  --volume /etc/enctune.conf:/etc/enctune.conf \
  --volume /etc/nv_tegra_release:/etc/nv_tegra_release \
  --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model \
  --volume /var/run/dbus:/var/run/dbus \
  --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket \
  --volume /var/run/docker.sock:/var/run/docker.sock \
  --volume /root/jetson-containers/jetson-containers/data:/data \
  -v /etc/localtime:/etc/localtime:ro -v /etc/timezone:/etc/timezone:ro \
  --device /dev/snd --device /dev/bus/usb \
  --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-4 \
  --device /dev/i2c-5 --device /dev/i2c-7 --device /dev/i2c-9 \
  -v /run/jtop.sock:/run/jtop.sock \
  --name jetson_container_20250910_222642 \
  dustynv/stable-diffusion-webui:r36.2.0
```
The first problem: the model cannot be downloaded inside the container. Note that the container's /data is mounted from /root/jetson-containers/jetson-containers/data on the host:
```
--volume /root/jetson-containers/jetson-containers/data:/data 
```
So go into the matching host directory and download the model there manually:
```
# cd /root/jetson-containers/jetson-containers/data/models/stable-diffusion/models/Stable-diffusion
# wget https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors
```
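The wget URL above follows Hugging Face's direct-download pattern `https://huggingface.co/<repo_id>/resolve/<revision>/<filename>`. Below is a small helper that builds such URLs, so that other single files can be fetched the same way through the proxy. `hf_resolve_url` is a hypothetical name, and the repository still has to exist on the Hub under that path.

```python
from urllib.parse import quote


def hf_resolve_url(repo_id, filename, revision="main"):
    """Direct-download URL for a single file in a Hugging Face repository."""
    # quote() keeps '/' in the filename so files in subfolders still work;
    # the revision (branch, tag or commit) is escaped completely.
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )
```

For example, `hf_resolve_url("openai/clip-vit-large-patch14", "config.json")` gives the URL of that repository's config file, which could be fetched on its own instead of cloning the whole repository. Which files the web UI actually needs was not verified here.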
After the download, running it again fails because CLIPTokenizer cannot be loaded. The log:
```
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]
Version: v1.7.0
Commit hash: cf2772fab0af5573da775e7437e6acdca424f26e
Launching Web UI with arguments: --data=/data/models/stable-diffusion --enable-insecure-extension-access --xformers --listen --port=7860
Style database not found: /data/models/stable-diffusion/styles.csv
Loading weights [6ce0161689] from /data/models/stable-diffusion/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors
/opt/stable-diffusion-webui/extensions-builtin/stable-diffusion-webui-tensorrt/ui_trt.py:64: GradioDeprecationWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
  with gr.Row().style(equal_height=False):
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 15.5s (prepare environment: 3.1s, import torch: 4.8s, import gradio: 1.9s, setup paths: 1.5s, initialize shared: 0.2s, other imports: 1.3s, setup codeformer: 0.1s, load scripts: 0.9s, create ui: 0.9s, gradio launch: 0.8s).
Creating model from config: /opt/stable-diffusion-webui/configs/v1-inference.yaml
/opt/stable-diffusion-webui/openai/clip-vit-large-patch14
creating model quickly: OSError
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/stable-diffusion-webui/modules/initialize.py", line 147, in load_model
    shared.sd_model  # noqa: B018
  File "/opt/stable-diffusion-webui/modules/shared_items.py", line 128, in sd_model
    return modules.sd_models.model_data.get_sd_model()
  File "/opt/stable-diffusion-webui/modules/sd_models.py", line 531, in get_sd_model
    load_model()
  File "/opt/stable-diffusion-webui/modules/sd_models.py", line 634, in load_model
    sd_model = instantiate_from_config(sd_config.model)
  File "/opt/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/util.py", line 89, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "/opt/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 563, in __init__
    self.instantiate_cond_stage(cond_stage_config)
  File "/opt/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 630, in instantiate_cond_stage
    model = instantiate_from_config(config)
  File "/opt/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/util.py", line 89, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "/opt/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/encoders/modules.py", line 103, in __init__
    self.tokenizer = CLIPTokenizer.from_pretrained(version)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 1809, in from_pretrained
    raise EnvironmentError(
OSError: Can't load tokenizer for 'openai/clip-vit-large-patch14'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'openai/clip-vit-large-patch14' is the correct path to a directory containing all relevant files for a CLIPTokenizer tokenizer.
```
Judging from the log, my first suspicion was that the container also could not fetch openai/clip-vit-large-patch14, so I downloaded it manually as well:
```
# cd /root/jetson-containers/jetson-containers/packages/diffusion/stable-diffusion-webui
# mkdir openai && cd openai
# apt install git-lfs
# git lfs install
# git clone https://huggingface.co/openai/clip-vit-large-patch14
```
After a long wait, the roughly 13 GB of model files were finally pulled into:
```
# cd /root/jetson-containers/jetson-containers/packages/diffusion/stable-diffusion-webui/openai/clip-vit-large-patch14
```
Next, let's work out where the error in the log comes from.

According to the error log, launch.py was started with these arguments:
```
Launching Web UI with arguments: --data=/data/models/stable-diffusion --enable-insecure-extension-access --xformers --listen --port=7860
```
So the command actually run is probably:
```
# python3 launch.py --data=/data/models/stable-diffusion --enable-insecure-extension-access --xformers --listen --port=7860
```
The Python traceback shows the error is raised at line 1809 of `/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py`. Opening that file, the declaration of from_pretrained is:
```
def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os.PathLike], *init_inputs, **kwargs):
```
So the path being looked up comes from `pretrained_model_name_or_path: Union[str, os.PathLike]`, whose value is passed in by this frame of the traceback:
```
  File "/opt/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/modules/encoders/modules.py", line 103, in __init__
    self.tokenizer = CLIPTokenizer.from_pretrained(version)
```
modules.py shows the default value of version:
```
def __init__(self, version="openai/clip-vit-large-patch14", device="cuda", max_length=77,
                 freeze=True, layer="last", layer_idx=None):  # clip-vit-base-patch32
```
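`version` defaults to the relative name `openai/clip-vit-large-patch14`. When that name is an existing local directory, from_pretrained loads from it; otherwise it is treated as a Hub repository ID and downloaded, which fails here and produces the OSError above. A relative directory is resolved against the working directory, and the web UI runs from /opt/stable-diffusion-webui, so the path it needs is most likely this one inside the container: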
```
/opt/stable-diffusion-webui/openai/clip-vit-large-patch14
```
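That reasoning can be checked in isolation. A relative name only becomes a local path once it is joined with the process working directory, and the jetson-ai-lab docs start the web UI with `cd /opt/stable-diffusion-webui && python3 launch.py ...`. This sketch mimics the resolution with `posixpath`. It is not transformers' code, and `resolve_like_from_pretrained` is a hypothetical name.

```python
import posixpath


def resolve_like_from_pretrained(name_or_path, cwd):
    """Where a model name lands on disk when it is treated as a local directory.

    An absolute path is returned unchanged; a relative one is joined with the
    working directory, which is what os.path.abspath does for the running process.
    """
    return posixpath.normpath(posixpath.join(cwd, name_or_path))
```

Started from any other directory, the same name would resolve somewhere else. This is why the downloaded files have to appear under /opt/stable-diffusion-webui specifically.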
To be sure, I added a print in from_pretrained to show the value of pretrained_model_name_or_path:
```
print(os.path.abspath(pretrained_model_name_or_path))
```
Running it by hand prints:
```
# python3 launch.py --data=/data/models/stable-diffusion --enable-insecure-extension-access --xformers --listen --port=7860
/opt/stable-diffusion-webui/openai/clip-vit-large-patch14
```
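The guess is confirmed: the directory must exist under /opt/stable-diffusion-webui. Looking back at the docker run arguments, /opt/stable-diffusion-webui is not mounted from the host, so I add a volume argument that mounts the previously downloaded model directory at this location in the container: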
```
--volume /root/jetson-containers/jetson-containers/packages/diffusion/stable-diffusion-webui/openai/:/opt/stable-diffusion-webui/openai/
```
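Why this extra `--volume` is needed can be expressed as a lookup: a container path is backed by the host only if some mount target covers it. The sketch below parses `-v host:container[:options]` specs and returns the host path, or `None` when the path exists only inside the container's own filesystem. With `--rm`, anything written there is discarded when the container exits. This is a simplification: it ignores named volumes and the `--mount` syntax, and `host_path_for` is a hypothetical helper.

```python
import posixpath
from typing import List, Optional


def host_path_for(container_path: str, volumes: List[str]) -> Optional[str]:
    """Map a container path to its host path using '-v host:container[:opts]' specs."""
    path = posixpath.normpath(container_path)
    best = None  # (host, target) of the most specific matching mount
    for spec in volumes:
        host, target = spec.split(":")[:2]
        target = posixpath.normpath(target)
        if path == target or path.startswith(target + "/"):
            if best is None or len(target) > len(best[1]):
                best = (host, target)
    if best is None:
        return None  # not bind-mounted: lives only inside the container
    host, target = best
    return posixpath.normpath(posixpath.join(host, posixpath.relpath(path, target)))
```

With only the original mounts, the CLIP directory maps to `None`. After adding the openai mount, it maps to the clone under `packages/diffusion/stable-diffusion-webui/openai/`.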
The complete command is therefore:
```
docker run --runtime nvidia --env NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics \
  -it --rm --network host --shm-size=8g \
  --volume /root/jetson-containers/jetson-containers/packages/diffusion/stable-diffusion-webui/openai/:/opt/stable-diffusion-webui/openai/ \
  --volume /tmp/argus_socket:/tmp/argus_socket \
  --volume /etc/enctune.conf:/etc/enctune.conf \
  --volume /etc/nv_tegra_release:/etc/nv_tegra_release \
  --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model \
  --volume /var/run/dbus:/var/run/dbus \
  --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket \
  --volume /var/run/docker.sock:/var/run/docker.sock \
  --volume /root/jetson-containers/jetson-containers/data:/data \
  -v /etc/localtime:/etc/localtime:ro -v /etc/timezone:/etc/timezone:ro \
  --device /dev/snd --device /dev/bus/usb \
  --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-4 \
  --device /dev/i2c-5 --device /dev/i2c-7 --device /dev/i2c-9 \
  -v /run/jtop.sock:/run/jtop.sock \
  --name jetson_container_20250910_222642 \
  dustynv/stable-diffusion-webui:r36.2.0
```
The run log now shows:
```
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]
Version: v1.7.0
Commit hash: cf2772fab0af5573da775e7437e6acdca424f26e
Launching Web UI with arguments: --data=/data/models/stable-diffusion --enable-insecure-extension-access --xformers --listen --port=7860
Style database not found: /data/models/stable-diffusion/styles.csv
Loading weights [6ce0161689] from /data/models/stable-diffusion/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors
/opt/stable-diffusion-webui/extensions-builtin/stable-diffusion-webui-tensorrt/ui_trt.py:64: GradioDeprecationWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
  with gr.Row().style(equal_height=False):
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 18.2s (prepare environment: 3.4s, import torch: 5.6s, import gradio: 2.2s, setup paths: 1.7s, initialize shared: 0.3s, other imports: 1.8s, setup codeformer: 0.2s, load scripts: 1.1s, create ui: 0.9s, gradio launch: 0.8s).
Creating model from config: /opt/stable-diffusion-webui/configs/v1-inference.yaml
Applying attention optimization: xformers... done.
Model loaded in 10.8s (load weights from disk: 2.2s, create model: 0.8s, apply weights to model: 6.5s, apply half(): 0.2s, load textual inversion embeddings: 0.6s, calculate empty prompt: 0.3s).
```
There are no more errors. The server listens on 0.0.0.0:7860 by default, so the page can be opened from another machine on the same network to try it out.

A quick test generating an image of a Kylin embedded system; the result is fairly on point:

![image.png](/static/img/6e63df01f2e6c2c87dafe6f255bab089.image.webp)

Generating another image while watching GPU usage:

![image.png](/static/img/d99afa1051d6c5964640a1ec90fbdbcb.image.webp)

![image.png](/static/img/b2259fd1390dda7cd8e199262828355b.image.webp)

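Memory usage is a little over 5 GB and the GPU is fully loaded. Overall it runs much more smoothly than Qwen3 8B; the Orin Nano handles Stable Diffusion without much strain.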

Finally, the jetson-ai-lab documentation does state how the container runs the web UI:
```
cd /opt/stable-diffusion-webui && python3 launch.py \
  --data=/data/models/stable-diffusion \
  --enable-insecure-extension-access \
  --xformers \
  --listen \
  --port=7860
```

# References
> https://www.jetson-ai-lab.com/tutorial_stable-diffusion.html  
> https://zhuanlan.zhihu.com/p/1889443308834624504  

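# Summary
This time I followed online resources to deploy Stable Diffusion. Compared with Ollama, running in a container makes the proxy issues harder to deal with, but they can be worked around, and overall it is easy to get running. After two hours of use the Orin did not get very hot, so it is fine for long-running use.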

Jetson-Orin-Nano-Super(7) stable-diffusion-webui Deployment

Qwen has just released its Max model, the most capable model on the Qwen platform, and I gave it a quick try.

Since my Jetson can also run local models, I ran Qwen3 8B on the Jetson Orin Nano. This post describes the steps to run Qwen3 8B.
# qwen3 max
The web version of Qwen3 Max is:
> https://chat.qwen.ai/

I asked it a simple question:
![image.png](/static/img/7c38b0b7639d640bed30bc3954ec24d9.image.webp)

The page appears to be built with open-webui.
# Installing Qwen3 8B
Inspired by this, I planned to serve Qwen3 8B with Ollama and put open-webui in front of it, without knowing whether the board could handle it.

## Download Ollama
Following the official instructions at `https://ollama.com/download/linux`, just run the install script:

![image.png](/static/img/a663320eb8bbfabc46c5ea2982df2165.image.webp)
```
curl -fsSL https://ollama.com/install.sh | sh
```
## Download the model
Then choose a Qwen3 model size. I picked 8b:
```
ollama run qwen3:8b
```
These two steps take a long time, about an hour in total. Once done, use ollama to list the models on the system:
```
root@kylin:~# ollama list
NAME        ID              SIZE      MODIFIED
qwen3:8b    500a1f067a9f    5.2 GB    2 hours ago
```
Ollama listens on port 11434 by default:
```
tcp6       0      0 localhost:11434              [::]:*                  LISTEN      12054/ollama
```
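Note that it is bound to localhost, so other devices cannot reach it. Open it up as follows (`jetson_clocks` first locks the clocks at their maximum for performance):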
```
# jetson_clocks
# OLLAMA_HOST=0.0.0.0 OLLAMA_MODELS=/usr/share/ollama/.ollama/models /usr/local/bin/ollama serve
```
After the change, the port shows:
```
tcp6       0      0 [::]:11434              [::]:*                  LISTEN      306996/ollama
```
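The only difference between the two netstat lines is the bind address. Below is a small sketch that classifies a netstat `Local Address` value; `bind_scope` is a hypothetical helper. On Linux, a socket bound to `[::]` normally also accepts IPv4 clients (dual-stack, when `net.ipv6.bindv6only` is 0, the usual default). This is why the single `tcp6` line is enough for other machines on the LAN.

```python
import ipaddress


def bind_scope(local_address):
    """Classify a netstat 'Local Address' such as 'localhost:11434' or '[::]:11434'."""
    host, _, _port = local_address.rpartition(":")
    host = host.strip("[]")  # IPv6 addresses are printed in brackets
    if host == "localhost":
        return "loopback"
    addr = ipaddress.ip_address(host)
    if addr.is_unspecified:
        return "all-interfaces"  # 0.0.0.0 or ::, reachable from other hosts
    if addr.is_loopback:
        return "loopback"  # 127.0.0.1 or ::1, local clients only
    return "specific-interface"
```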
Verify with curl:
```
# curl 0.0.0.0:11434
Ollama is running
```
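Besides answering `curl` on the root path, the same port serves Ollama's REST API. Below is a minimal standard-library client for the `/api/generate` endpoint with streaming disabled, so that the reply is a single JSON object whose `response` field holds the text. The helper names and the default host are assumptions; from another machine, pass the Jetson's LAN address instead.

```python
import json
import urllib.request


def build_generate_request(prompt, model="qwen3:8b", host="http://127.0.0.1:11434"):
    """Build a POST to Ollama's /api/generate with streaming turned off."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        host.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, timeout=600, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

For example, `print(generate("Briefly explain memory compaction", host="http://<jetson-ip>:11434"))`. The long timeout is deliberate, because an 8B model on the Orin can take a while to answer.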
Qwen3 is running normally. Here is a quick demo:

![image.png](/static/img/157d34563328f15d3433f58b4b7385b6.image.webp)

## Installing open-webui
To use the Qwen3 model from a web page, I use open-webui.

There is a small pitfall before installing it, though.

The Jetson's default python3 is 3.10, while open-webui requires 3.11, so a plain venv does not meet the requirement. Create a Python 3.11 environment with conda instead:
```
# conda create -n openwebui python=3.11
```
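The pitfall above can be caught before `pip` runs. Here is a tiny guard to run inside the activated environment. The 3.11 minimum comes from the text above, and `require_python` is a hypothetical helper. Newer open-webui releases may also cap the maximum supported version, so check the package's own requirements as well.

```python
import sys


def require_python(minimum=(3, 11)):
    """Exit with a readable message if the running interpreter is too old."""
    if sys.version_info[:2] < tuple(minimum):
        have = f"{sys.version_info.major}.{sys.version_info.minor}"
        need = ".".join(str(part) for part in minimum)
        raise SystemExit(f"open-webui needs Python >= {need}, but this is Python {have}")
    return True
```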
Once it is created, activate the environment, install open-webui, and run it:
```
# conda activate openwebui
# pip install open-webui
# open-webui serve
```
When it is up, open:
```
http://ip:8080/
```
Register an admin account (any will do) and ask something:


![image.png](/static/img/844d8bb74c163b3c0567a67c1a18865c.image.webp)

# jtop info
The Nano's GPU is fully loaded; 8B is quite heavy for the Orin.
![image.png](/static/img/c864a29e60553dd4d9dce7934ec3373e.image.webp)

# References
> https://zhuanlan.zhihu.com/p/24364748492

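# Summary
Today's news of Qwen3 Max prompted me to try Ollama on the Orin. With the method above, other Ollama models can run as well. The Orin board is mainly meant for embedded evaluation, so it cannot run very large models; 8B is already a strain. Models smaller than 8B should run on the Orin without problems.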