# ROCm vs Vulkan Runtime Benchmark

Last updated: 2026-05-31

## Executive Decision

Keep the live Fedora local AI stack on the current Vulkan/RADV llama.cpp runtime.

The AMD ROCm prebuilt runtime for Strix/Strix Halo detects the Radeon 8060S and reports the 96 GiB GPU memory allocation, but it segfaults on Fedora when loading ROCm-compatible Qwen models. It also segfaults with `-ngl 0` (no layers offloaded to the GPU), so the failure is not limited to GPU layer offload. Treat the AMD prebuilt as an Ubuntu-targeted package, not a reliable Fedora runtime.

Do not switch production LiteLLM aliases to AMD ROCm based on this test. The next ROCm path is one of:

- build the current llama.cpp revision with HIP/ROCm on Fedora, or
- test the AMD package on a vendor-aligned Ubuntu 24.04 boot/install profile.

## Why This Benchmark Was Needed

Two of the three live production models use GGUF architectures that are newer than the llama.cpp revision (`b7146`) in the AMD ROCm prebuilt runtime:

| Production lane | Current model | GGUF architecture | AMD prebuilt support |
| --- | --- | --- | --- |
| Coder | `Qwen3-Coder-Next-Q4_K_M` | `qwen3next` | not supported by AMD `b7146` |
| Vision | `Qwen3.5-9B-Q4_K_M` | `qwen35` | not supported by AMD `b7146` |
| Embeddings | `Qwen3-Embedding-0.6B-Q8_0` | `qwen3` | architecture recognized |
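The GGUF architecture column is the `general.architecture` metadata key stored inside each model file. A minimal sketch for checking it before handing a model to a runtime, using only the Python standard library and assuming the GGUF v2/v3 header layout from the GGUF specification in the ggml repository (little-endian, 64-bit counts and string lengths). The function names and the `B7146_TESTED_ARCHS` allow-list are hypothetical; the allow-list only mirrors the architectures this page relied on and is not an authoritative `b7146` support list.

```python
import struct
from typing import BinaryIO

# GGUF metadata value type IDs (GGUF spec): fixed-size scalars by struct format.
_SCALAR_FORMATS = {
    0: "<B", 1: "<b",    # UINT8, INT8
    2: "<H", 3: "<h",    # UINT16, INT16
    4: "<I", 5: "<i",    # UINT32, INT32
    6: "<f", 7: "<?",    # FLOAT32, BOOL
    10: "<Q", 11: "<q",  # UINT64, INT64
    12: "<d",            # FLOAT64
}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str):
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    length = _read(f, "<Q")  # v2+ strings: uint64 length, then UTF-8 bytes
    data = f.read(length)
    if len(data) != length:
        raise ValueError("truncated GGUF string")
    return data.decode("utf-8")


def _read_value(f: BinaryIO, vtype: int):
    if vtype in _SCALAR_FORMATS:
        return _read(f, _SCALAR_FORMATS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_architecture(f: BinaryIO) -> str:
    """Return general.architecture from an open GGUF v2/v3 file."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError(f"GGUF v{version} is not handled by this sketch")
    _read(f, "<Q")  # tensor count, not needed here
    kv_count = _read(f, "<Q")
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _read(f, "<I"))
        if key == "general.architecture":
            return value  # usually the first key, so we stop early
    raise KeyError("general.architecture not found")


# Hypothetical allow-list: architectures this page treated as b7146-compatible.
B7146_TESTED_ARCHS = {"qwen3", "qwen3moe", "qwen3vl"}


def supported_by_b7146(path: str) -> bool:
    with open(path, "rb") as f:
        return read_gguf_architecture(f) in B7146_TESTED_ARCHS
```

Run against the production coder GGUF, `supported_by_b7146` should return `False`, since that file's architecture is `qwen3next`.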

Because of that, this benchmark used the closest models that the AMD b7146 bundle should support:

| Lane | Compatible benchmark model | Architecture |
| --- | --- | --- |
| Coder | `lucataco/Qwen3-Coder-30B-A3B-Instruct-Q4_K_M-GGUF` | `qwen3moe` |
| Vision | `Qwen/Qwen3-VL-8B-Instruct-GGUF`, `Q4_K_M` + `mmproj-F16` | `qwen3vl` |
| Embeddings | existing `Qwen3-Embedding-0.6B-Q8_0` | `qwen3` |

This means the benchmark answers a narrow question:

Can the AMD ROCm prebuilt package run compatible Qwen-family models on this Fedora Strix Halo server better than the current Vulkan runtime?

It does not evaluate ROCm as a technology in general, nor a HIP build of current llama.cpp source.

## Test Workflow

```mermaid
flowchart TD
    A["Download ROCm-compatible benchmark models"] --> B["Build matching Vulkan benchmark helpers"]
    B --> C["Stop resident model services only"]
    C --> D["Run Vulkan/RADV llama.cpp benchmarks"]
    D --> E["Run AMD ROCm/HIP prebuilt benchmarks"]
    E --> F["Restart original model services"]
    F --> G["Verify health checks"]
    G --> H["Write JSON, Markdown summary, and wiki findings"]
```

Operational details:

- Host: `local-server-adeel`
- OS: Fedora Linux 44 Workstation
- GPU: AMD Radeon 8060S Graphics, Strix Halo / `gfx1151`
- BIOS GPU memory allocation observed by ROCm: 98304 MiB (96 GiB)
- Live Vulkan build: `/srv/localai/llama.cpp/build/bin`, llama.cpp `b9371`
- AMD ROCm test bundle: `/srv/localai/rocm-test/llama-b7146-rocm/...`, llama.cpp `b7146`
- Benchmark model root: `/srv/localai/rocm-test/models`
- Result directory: `/srv/localai/rocm-test/benchmarks/20260531T172731Z`
- Benchmark harness: `/srv/localai/rocm-test/benchmark_rocm_vulkan_compatible.py`

The benchmark stopped only:

- `localai-qwen-coder`
- `localai-qwen-vision`
- `localai-embed`

Then it restarted them and confirmed all model and RAG services were healthy.
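This page does not reproduce the harness internals, so the following is only a minimal sketch of the stop/benchmark/restart step, assuming `--manage-services` drives systemd through `systemctl`. It shows the property that matters operationally: the restart sits in a `finally` clause, so a crashing ROCm run cannot leave the production lanes stopped. `with_model_services_stopped` and the injectable `runner` are hypothetical names; the runner is injectable so the sequence can be checked without root.

```python
import subprocess
from typing import Callable, Sequence

# Only the model units named on this page; LiteLLM, the RAG API,
# PostgreSQL and SMB are deliberately left running.
MODEL_UNITS = ("localai-qwen-coder", "localai-qwen-vision", "localai-embed")

Runner = Callable[[Sequence[str]], None]


def run(cmd: Sequence[str]) -> None:
    """Default runner: execute a command and raise on a non-zero exit."""
    subprocess.run(list(cmd), check=True)


def with_model_services_stopped(benchmark: Callable[[], dict],
                                runner: Runner = run) -> dict:
    """Stop the model units, run the benchmark, and always restart them."""
    runner(["systemctl", "stop", *MODEL_UNITS])
    try:
        return benchmark()
    finally:
        # Runs even if the benchmark raises, e.g. after a backend segfault.
        runner(["systemctl", "start", *MODEL_UNITS])
```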

## Result Table

| Lane | Vulkan/RADV result | AMD ROCm/HIP prebuilt result | Interpretation |
| --- | --- | --- | --- |
| Coder text, `qwen3moe` 30B A3B Q4 | prompt eval `1299.44 t/s`; generation `95.84 t/s` | failed, `SIGSEGV` / rc `-11` | Vulkan is usable; AMD prebuilt is not stable on Fedora |
| Vision text-only, `qwen3vl` 8B Q4 | prompt eval `1315.63 t/s`; generation `42.76 t/s` | failed, `SIGSEGV` / rc `-11` | Vulkan is usable; AMD prebuilt crashes |
| Embedding bench, `qwen3` 0.6B Q8 | prompt eval `8965.55 t/s` | failed, `SIGSEGV` / rc `-11` | Even a supported small Qwen3 model crashes |
| Full vision image, `qwen3vl` 8B Q4 + mmproj | completed in `4.845 s` | failed, `SIGSEGV` / rc `-11` | Full multimodal ROCm path is unusable via this prebuilt |
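The `rc -11` entries follow Python's `subprocess` convention: a child killed by signal N reports a return code of -N, and signal 11 is `SIGSEGV` on Linux. A small helper, with a hypothetical name, that renders a return code the way the table does:

```python
import signal


def describe_returncode(rc: int) -> str:
    """Render a subprocess return code in the result table's wording."""
    if rc == 0:
        return "ok"
    if rc < 0:
        # Negative return code: the child was terminated by signal -rc.
        try:
            name = signal.Signals(-rc).name
        except ValueError:
            name = f"signal {-rc}"
        return f"failed, {name} / rc {rc}"
    return f"failed, exit status {rc}"
```

If a backend is launched through a shell instead, the shell typically reports a segfault as exit status 139 (128 + 11), which this helper would show as an ordinary non-zero exit.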

Additional CPU-only probe:

- AMD ROCm bundle with `qwen3vl`, `-ngl 0`: recognized `general.architecture = qwen3vl`, then segfaulted.
- AMD ROCm bundle with `qwen3` embedding, `-ngl 0`: recognized `general.architecture = qwen3`, then segfaulted.

That matters because `-ngl 0` offloads zero model layers to the GPU, yet the package still initializes the ROCm runtime and then crashes. This points to the AMD prebuilt runtime/platform combination rather than to model architecture support or GPU memory size.
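`-ngl` is llama.cpp's short form of `--n-gpu-layers`. A CPU-only probe is most informative when its log shows that model metadata was parsed before the crash. A minimal sketch of that check, assuming the loader logs metadata lines containing `general.architecture ... = <arch>`; the exact log layout varies between llama.cpp revisions, so the regular expression is deliberately loose. `probe_verdict` is a hypothetical name, and the log text would come from the CPU-only probe log listed under Raw Artifacts.

```python
import re
from typing import Optional

# Assumed log shape, e.g. "... general.architecture str = qwen3vl";
# the optional "str" token also accepts "general.architecture = qwen3vl".
_ARCH_RE = re.compile(r"general\.architecture\s+(?:str\s+)?=\s*(\S+)")


def probe_verdict(log_text: str, returncode: int) -> str:
    """Summarize whether a probe crashed before or after model recognition."""
    match = _ARCH_RE.search(log_text)
    arch: Optional[str] = match.group(1) if match else None
    if returncode < 0 and arch:
        # Metadata was parsed, so the crash came after architecture detection.
        return f"recognized {arch}, then crashed (signal {-returncode})"
    if returncode < 0:
        return f"crashed before architecture was logged (signal {-returncode})"
    return f"exited {returncode}, architecture={arch}"
```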

## What This Means

The current Vulkan/RADV runtime is the only proven path on Fedora right now.

The AMD package is not useless as evidence: it proves ROCm can enumerate the Strix Halo GPU and see the 96 GiB allocation. But it is not a deployable runtime on this Fedora install because it crashes on models it should understand.

For RapidDraft Agent and BOM/vision work:

- keep `local/qwen-coder`, `local/qwen-vision-fast`, and `local/embed-engineering` on the current Vulkan services
- keep vision advisory and artifact-based, not a source of BOM truth
- do not downgrade production models just to fit AMD `b7146`
- only revisit ROCm after a current llama.cpp HIP build or an Ubuntu 24.04 ROCm test passes the same benchmark set
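The revisit criterion can be made mechanical: a candidate runtime, whether a Fedora HIP build or the Ubuntu 24.04 package, must complete every lane of the same benchmark set. A minimal sketch, assuming per-lane return codes are available from the benchmark JSON; the lane keys are hypothetical labels for the four rows of the result table.

```python
from typing import List, Mapping

# Hypothetical keys for the four lanes in the result table.
BENCHMARK_LANES = ("coder_text", "vision_text", "embedding_bench", "vision_image")


def failing_lanes(lane_rcs: Mapping[str, int]) -> List[str]:
    """Lanes that did not finish with return code 0; missing lanes count as failed."""
    return [lane for lane in BENCHMARK_LANES if lane_rcs.get(lane) != 0]


def ready_to_revisit(lane_rcs: Mapping[str, int]) -> bool:
    """True only when the candidate passed every lane."""
    return not failing_lanes(lane_rcs)
```

Passing this gate is necessary but not sufficient: the candidate's prompt-eval and generation rates would still need to be compared against the Vulkan numbers in the result table before any LiteLLM alias moves.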

## Repeatable Commands

Check AMD bundle device visibility:

```bash
ROCM=/srv/localai/rocm-test/llama-b7146-rocm/llama-b7146-ubuntu-24.04-rocm-7.1.1-gfx1150-gfx1151-x64
LD_LIBRARY_PATH="$ROCM:$ROCM/lib:$ROCM/rocblas/lib:$ROCM/hip/lib:$ROCM/rocm/lib" \
  "$ROCM/llama-cli" --list-devices
```
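The `LD_LIBRARY_PATH` prefix points the loader at the libraries shipped inside the prebuilt bundle, presumably because it does not rely on a system ROCm install. The shell form replaces any existing `LD_LIBRARY_PATH`. A minimal Python sketch of the same environment for use from a harness, which keeps the bundle directories first and appends the caller's existing value; `rocm_bundle_env` is a hypothetical name.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

# Library directories from the shell command, relative to the bundle root ("" = root).
_LIB_SUBDIRS = ("", "lib", "rocblas/lib", "hip/lib", "rocm/lib")


def rocm_bundle_env(bundle_root: str,
                    base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of `base` (default: os.environ) with the bundle's libraries first."""
    env = dict(os.environ if base is None else base)
    root = Path(bundle_root)
    paths = [str(root / sub) if sub else str(root) for sub in _LIB_SUBDIRS]
    existing = env.get("LD_LIBRARY_PATH")
    if existing:
        paths.append(existing)  # keep, but after the bundle directories
    env["LD_LIBRARY_PATH"] = os.pathsep.join(paths)
    return env
```

For example, pass `env=rocm_bundle_env(rocm_root)` to `subprocess.run` when invoking the bundle's `llama-cli --list-devices`.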

Run the comparison harness:

```bash
python3 /srv/localai/rocm-test/benchmark_rocm_vulkan_compatible.py \
  --manage-services \
  --timeout 2400
```

Verify the live stack after benchmarking:

```bash
# Prints one state per unit; read every line, not only the exit status.
systemctl is-active localai-qwen-coder localai-qwen-vision localai-embed localai-litellm localai-rag-api postgresql smb
curl -fsS http://127.0.0.1:8010/health
curl -fsS http://127.0.0.1:8011/health
curl -fsS http://127.0.0.1:8012/health
curl -fsS http://127.0.0.1:4100/health
```
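Per the systemctl(1) man page, `systemctl is-active` with several units exits 0 if at least one of them is active, so the exit code alone does not prove all seven units are up. A minimal sketch of a stricter check, assuming the harness captures the command's stdout; `parse_is_active` and `health_ok` are hypothetical names, and `health_ok` mirrors `curl -fsS` by treating only HTTP 2xx as healthy.

```python
import http.client
import urllib.request
from typing import Dict, Sequence

UNITS = ("localai-qwen-coder", "localai-qwen-vision", "localai-embed",
         "localai-litellm", "localai-rag-api", "postgresql", "smb")
HEALTH_URLS = tuple(f"http://127.0.0.1:{port}/health"
                    for port in (8010, 8011, 8012, 4100))


def parse_is_active(units: Sequence[str], stdout: str) -> Dict[str, str]:
    """Map each unit to the state systemctl printed for it, one line per unit."""
    states = [line.strip() for line in stdout.splitlines() if line.strip()]
    if len(states) != len(units):
        raise ValueError(f"expected {len(units)} states, got {len(states)}")
    return dict(zip(units, states))


def health_ok(url: str, timeout: float = 5.0) -> bool:
    """True only for an HTTP 2xx response, like `curl -f`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (4xx/5xx), refused connections and timeouts
        # are all OSError subclasses.
        return False
```

A full check would require every value from `parse_is_active(UNITS, stdout)` to equal `"active"` and `health_ok` to pass for every entry in `HEALTH_URLS`.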

## Raw Artifacts

- Summary: `/srv/localai/rocm-test/benchmarks/20260531T172731Z/summary.md`
- JSON: `/srv/localai/rocm-test/benchmarks/20260531T172731Z/benchmark.json`
- Benchmark image: `/srv/localai/rocm-test/benchmarks/20260531T172731Z/cad_bracket_benchmark.png`
- CPU-only ROCm probe log: `/srv/localai/rocm-test/logs/rocm-cpu-only-probe-2026-05-31T19-29-20+02-00.log`