ROCm vs Vulkan Runtime Benchmark
Last updated: 2026-05-31
Executive Decision
Keep the live Fedora local AI stack on the current Vulkan/RADV llama.cpp runtime.
The AMD ROCm prebuilt runtime for Strix/Strix Halo detects the Radeon 8060S and reports the
96 GiB GPU memory allocation, but it segfaults on Fedora when loading ROCm-compatible Qwen
models. It also segfaults with -ngl 0, so the failure is not limited to GPU layer offload. Treat
the AMD prebuilt as an Ubuntu-targeted package, not as a reliable Fedora runtime.
Do not switch production LiteLLM aliases to AMD ROCm from this test. The next ROCm path is either:
- build the current llama.cpp revision with HIP/ROCm on Fedora, or
- test the AMD package on a vendor-aligned Ubuntu 24.04 boot/install profile.
Why This Benchmark Was Needed
The live production models are newer than the AMD ROCm prebuilt runtime:
| Production lane | Current model | GGUF architecture | AMD prebuilt support |
|---|---|---|---|
| Coder | Qwen3-Coder-Next-Q4_K_M | qwen3next | not supported by AMD b7146 |
| Vision | Qwen3.5-9B-Q4_K_M | qwen35 | not supported by AMD b7146 |
| Embeddings | Qwen3-Embedding-0.6B-Q8_0 | qwen3 | architecture recognized |
Because of that, this benchmark used the closest models that the AMD b7146 bundle should support:
| Lane | Compatible benchmark model | Architecture |
|---|---|---|
| Coder | lucataco/Qwen3-Coder-30B-A3B-Instruct-Q4_K_M-GGUF | qwen3moe |
| Vision | Qwen/Qwen3-VL-8B-Instruct-GGUF, Q4_K_M + mmproj-F16 | qwen3vl |
| Embeddings | existing Qwen3-Embedding-0.6B-Q8_0 | qwen3 |
This means the benchmark answers a narrow question:
Can the AMD ROCm prebuilt package run compatible Qwen-family models on this Fedora Strix Halo server better than the current Vulkan runtime?
It does not judge ROCm as a technology in general, and it does not judge a current-source HIP build.
Test Workflow
flowchart TD
A["Download ROCm-compatible benchmark models"] --> B["Build matching Vulkan benchmark helpers"]
B --> C["Stop resident model services only"]
C --> D["Run Vulkan/RADV llama.cpp benchmarks"]
D --> E["Run AMD ROCm/HIP prebuilt benchmarks"]
E --> F["Restart original model services"]
F --> G["Verify health checks"]
G --> H["Write JSON, Markdown summary, and wiki findings"]
Operational details:
- Host: local-server-adeel
- OS: Fedora Linux 44 Workstation
- GPU: AMD Radeon 8060S Graphics, Strix Halo / gfx1151
- BIOS GPU memory allocation observed by ROCm: 98304 MiB
- Live Vulkan build: /srv/localai/llama.cpp/build/bin, llama.cpp b9371
- AMD ROCm test bundle: /srv/localai/rocm-test/llama-b7146-rocm/..., llama.cpp b7146
- Benchmark model root: /srv/localai/rocm-test/models
- Result directory: /srv/localai/rocm-test/benchmarks/20260531T172731Z
- Benchmark harness: /srv/localai/rocm-test/benchmark_rocm_vulkan_compatible.py
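The 98304 MiB reported by ROCm and the 96 GiB figure quoted elsewhere on this page describe the same allocation. A minimal sketch of the unit conversion (the function name is illustrative only):

```python
MIB_PER_GIB = 1024  # binary units: 1 GiB = 1024 MiB


def mib_to_gib(mib: int) -> float:
    """Convert mebibytes to gibibytes."""
    return mib / MIB_PER_GIB


print(mib_to_gib(98304))  # 96.0
```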
The benchmark stopped only:
- localai-qwen-coder
- localai-qwen-vision
- localai-embed
Then it restarted them and confirmed all model and RAG services were healthy.
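The stop, benchmark, restart sequence can be sketched as follows. This is not the harness's actual code, which is not reproduced on this page; the function names and the injectable `ctl` parameter are assumptions made for illustration. The point of the sketch is the `try`/`finally`, which restarts the services even when a benchmark run raises.

```python
import subprocess
from typing import Callable, Sequence

# Services named on this page; everything else is left running.
RESIDENT_SERVICES = ("localai-qwen-coder", "localai-qwen-vision", "localai-embed")


def systemctl(action: str, services: Sequence[str]) -> None:
    """Run `systemctl <action> <services...>` and raise if it fails."""
    subprocess.run(["systemctl", action, *services], check=True)


def run_with_services_stopped(
    benchmark: Callable[[], None],
    services: Sequence[str] = RESIDENT_SERVICES,
    ctl: Callable[[str, Sequence[str]], None] = systemctl,
) -> None:
    """Stop the services, run the benchmark, and always start them again."""
    ctl("stop", services)
    try:
        benchmark()
    finally:
        # Executes even if the benchmark raises, so the live stack is not left down.
        ctl("start", services)
```

Passing a fake `ctl` makes the ordering testable without touching systemd.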
Result Table
| Lane | Vulkan/RADV result | AMD ROCm/HIP prebuilt result | Interpretation |
|---|---|---|---|
| Coder text, qwen3moe 30B A3B Q4 | prompt eval 1299.44 t/s; generation 95.84 t/s | failed, SIGSEGV / rc -11 | Vulkan is usable; AMD prebuilt is not stable on Fedora |
| Vision text-only, qwen3vl 8B Q4 | prompt eval 1315.63 t/s; generation 42.76 t/s | failed, SIGSEGV / rc -11 | Vulkan is usable; AMD prebuilt crashes |
| Embedding bench, qwen3 0.6B Q8 | prompt eval 8965.55 t/s | failed, SIGSEGV / rc -11 | Even a supported small Qwen3 model crashes |
| Full vision image, qwen3vl 8B Q4 + mmproj | completed in 4.845 s | failed, SIGSEGV / rc -11 | Full multimodal ROCm path is unusable via this prebuilt |
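The "rc -11" entries read naturally if the harness records return codes the way Python's `subprocess` module does, which is an assumption here: on POSIX, a negative return code `-N` means the child was terminated by signal `N`, and signal 11 on Linux is SIGSEGV. A small helper (the name is hypothetical) that decodes such codes:

```python
import signal


def describe_returncode(rc: int) -> str:
    """Turn a subprocess-style return code into a readable description."""
    if rc == 0:
        return "ok"
    if rc < 0:
        # Negative values mean the process was terminated by signal -rc.
        try:
            return f"killed by {signal.Signals(-rc).name}"
        except ValueError:
            return f"killed by signal {-rc}"
    return f"exited with status {rc}"


print(describe_returncode(-11))
```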
Additional CPU-only probe:
- AMD ROCm bundle with qwen3vl, -ngl 0: recognized general.architecture = qwen3vl, then segfaulted.
- AMD ROCm bundle with qwen3 embedding, -ngl 0: recognized general.architecture = qwen3, then segfaulted.
That matters because -ngl 0 offloads zero model layers. The package still initializes the ROCm
runtime and then crashes. This points to the AMD prebuilt runtime/platform combination, not simply
to model architecture support or GPU memory size.
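A probe of this kind could be assembled as in the sketch below. The exact command line used for the recorded probe is not shown on this page, so the prompt, token count, and helper names are placeholders; only the `-ngl 0` setting and the library search path (taken from the device-visibility command on this page) come from the source. `-m`, `-ngl`, `-p`, and `-n` are standard llama-cli options.

```python
import os
from pathlib import Path
from typing import Dict, List


def bundle_env(bundle_dir: str) -> Dict[str, str]:
    """Copy the current environment and point LD_LIBRARY_PATH at the bundle's libraries."""
    root = Path(bundle_dir)
    lib_dirs = [
        root,
        root / "lib",
        root / "rocblas" / "lib",
        root / "hip" / "lib",
        root / "rocm" / "lib",
    ]
    env = dict(os.environ)
    env["LD_LIBRARY_PATH"] = ":".join(str(p) for p in lib_dirs)
    return env


def probe_command(bundle_dir: str, model_path: str) -> List[str]:
    """Build a llama-cli invocation that offloads zero layers to the GPU."""
    return [
        str(Path(bundle_dir) / "llama-cli"),
        "-m", model_path,
        "-ngl", "0",      # zero offloaded layers
        "-p", "hello",    # placeholder prompt
        "-n", "8",        # placeholder token count
    ]
```

Running it would then be `subprocess.run(probe_command(bundle, model), env=bundle_env(bundle))`, with the return code inspected afterwards.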
What This Means
The current Vulkan/RADV runtime is the only proven path on Fedora right now.
The AMD package is not useless as evidence: it proves ROCm can enumerate the Strix Halo GPU and see the 96 GiB allocation. But it is not a deployable runtime on this Fedora install because it crashes on models it should understand.
For RapidDraft Agent and BOM/vision work:
- keep local/qwen-coder, local/qwen-vision-fast, and local/embed-engineering on the current Vulkan services
- keep vision advisory and artifact-based, not a source of BOM truth
- do not downgrade production models just to fit AMD b7146
- only revisit ROCm after a current llama.cpp HIP build or Ubuntu 24.04 ROCm test passes the same benchmark set
Repeatable Commands
Check AMD bundle device visibility:
ROCM=/srv/localai/rocm-test/llama-b7146-rocm/llama-b7146-ubuntu-24.04-rocm-7.1.1-gfx1150-gfx1151-x64
LD_LIBRARY_PATH="$ROCM:$ROCM/lib:$ROCM/rocblas/lib:$ROCM/hip/lib:$ROCM/rocm/lib" \
"$ROCM/llama-cli" --list-devices
Run the comparison harness:
python3 /srv/localai/rocm-test/benchmark_rocm_vulkan_compatible.py \
--manage-services \
--timeout 2400
Verify the live stack after benchmarking:
systemctl is-active localai-qwen-coder localai-qwen-vision localai-embed localai-litellm localai-rag-api postgresql smb
curl -fsS http://127.0.0.1:8010/health
curl -fsS http://127.0.0.1:8011/health
curl -fsS http://127.0.0.1:8012/health
curl -fsS http://127.0.0.1:4100/health
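For scripted verification, the same health endpoints can be polled from Python. This is a sketch that roughly mirrors the curl calls above, not part of the existing harness; the function names are hypothetical and the ports are the ones listed on this page.

```python
import urllib.request
from typing import Callable, Dict, Iterable

HEALTH_PORTS = (8010, 8011, 8012, 4100)


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError are OSError subclasses).
        return False


def check_stack(
    ports: Iterable[int] = HEALTH_PORTS,
    probe: Callable[[str], bool] = http_ok,
) -> Dict[int, bool]:
    """Map each port to whether its /health endpoint responded successfully."""
    return {port: probe(f"http://127.0.0.1:{port}/health") for port in ports}
```

A caller can treat the stack as healthy only when `all(check_stack().values())` is true.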
Raw Artifacts
- Summary: /srv/localai/rocm-test/benchmarks/20260531T172731Z/summary.md
- JSON: /srv/localai/rocm-test/benchmarks/20260531T172731Z/benchmark.json
- Benchmark image: /srv/localai/rocm-test/benchmarks/20260531T172731Z/cad_bracket_benchmark.png
- CPU-only ROCm probe log: /srv/localai/rocm-test/logs/rocm-cpu-only-probe-2026-05-31T19-29-20+02-00.log