
ROCm vs Vulkan Runtime Benchmark

Last updated: 2026-05-31

Executive Decision

Keep the live Fedora local AI stack on the current Vulkan/RADV llama.cpp runtime.

The AMD ROCm prebuilt runtime for Strix/Strix Halo detects the Radeon 8060S and reports the 96 GiB GPU memory allocation. However, it segfaults on Fedora when loading ROCm-compatible Qwen models. It also segfaults with -ngl 0 (no layers offloaded to the GPU), so GPU layer offload alone cannot explain the failure. Treat the AMD prebuilt as an Ubuntu-targeted package (the bundle is built for Ubuntu 24.04), not as a reliable Fedora runtime.

Do not switch production LiteLLM aliases to AMD ROCm based on this test. The next ROCm path is one of:

  1. build the current llama.cpp revision with HIP/ROCm on Fedora, or
  2. test the AMD package on a vendor-aligned Ubuntu 24.04 boot/install profile.

Why This Benchmark Was Needed

Two of the three live production models use GGUF architectures that are newer than the AMD ROCm prebuilt runtime (llama.cpp b7146) supports:

| Production lane | Current model | GGUF architecture | AMD prebuilt support |
|---|---|---|---|
| Coder | Qwen3-Coder-Next-Q4_K_M | qwen3next | not supported by AMD b7146 |
| Vision | Qwen3.5-9B-Q4_K_M | qwen35 | not supported by AMD b7146 |
| Embeddings | Qwen3-Embedding-0.6B-Q8_0 | qwen3 | architecture recognized |

Because of that, this benchmark used the closest models that the AMD b7146 bundle should support:

| Lane | Compatible benchmark model | Architecture |
|---|---|---|
| Coder | lucataco/Qwen3-Coder-30B-A3B-Instruct-Q4_K_M-GGUF | qwen3moe |
| Vision | Qwen/Qwen3-VL-8B-Instruct-GGUF, Q4_K_M + mmproj-F16 | qwen3vl |
| Embeddings | existing Qwen3-Embedding-0.6B-Q8_0 | qwen3 |
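You can check a model's declared architecture before choosing a runtime by reading `general.architecture` straight from the GGUF header. The following is a minimal sketch that assumes the GGUF v2/v3 layout from the GGUF specification. `read_architecture`, `read_architecture_bytes` and `B7146_EXPECTED` are hypothetical names. The expected set only restates the architectures the tables above treat as loadable by b7146; it is not the bundle's full support list.

```python
import io
import struct
from typing import BinaryIO

# GGUF metadata value type IDs (per the GGUF spec) for fixed-size scalars.
_SCALAR_FORMATS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9

# Hypothetical: architectures the tables on this page expect b7146 to load.
B7146_EXPECTED = {"qwen3", "qwen3moe", "qwen3vl"}


def _read(f: BinaryIO, fmt: str):
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    length = _read(f, "<Q")  # v2+ strings: uint64 length, then UTF-8 bytes
    return f.read(length).decode("utf-8")


def _skip_value(f: BinaryIO, value_type: int) -> None:
    if value_type in _SCALAR_FORMATS:
        f.read(struct.calcsize(_SCALAR_FORMATS[value_type]))
    elif value_type == _STRING:
        _read_string(f)
    elif value_type == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        for _ in range(count):
            _skip_value(f, elem_type)
    else:
        raise ValueError(f"unknown GGUF value type {value_type}")


def read_architecture(f: BinaryIO) -> str:
    """Return general.architecture from a GGUF v2/v3 stream."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError(f"GGUF v{version} is not handled by this sketch")
    _read(f, "<Q")  # tensor count, unused here
    kv_count = _read(f, "<Q")
    for _ in range(kv_count):
        key = _read_string(f)
        value_type = _read(f, "<I")
        if key == "general.architecture" and value_type == _STRING:
            return _read_string(f)
        _skip_value(f, value_type)  # skip every other metadata value
    raise ValueError("general.architecture not found")


def read_architecture_bytes(data: bytes) -> str:
    return read_architecture(io.BytesIO(data))
```

For a file on disk, open it with `open(path, "rb")` and pass the handle to `read_architecture`. The function only reads the metadata block, so it stays fast even on multi-GB models.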

This means the benchmark answers a narrow question:

Can the AMD ROCm prebuilt package run compatible Qwen-family models on this Fedora Strix Halo server better than the current Vulkan runtime?

It does not judge ROCm as a technology in general, and it says nothing about a HIP build from current llama.cpp source.

Test Workflow

flowchart TD
    A["Download ROCm-compatible benchmark models"] --> B["Build matching Vulkan benchmark helpers"]
    B --> C["Stop resident model services only"]
    C --> D["Run Vulkan/RADV llama.cpp benchmarks"]
    D --> E["Run AMD ROCm/HIP prebuilt benchmarks"]
    E --> F["Restart original model services"]
    F --> G["Verify health checks"]
    G --> H["Write JSON, Markdown summary, and wiki findings"]
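This page does not show how the harness implements the stop/benchmark/restart steps in the flowchart. The following is a hypothetical sketch of that ordering. `with_services_stopped` and `MODEL_SERVICES` are illustrative names. The key design choice is `try`/`finally`: the model services are restarted even when a benchmark segfaults or raises. Health verification stays a separate step afterwards.

```python
import subprocess
from typing import Callable, Sequence

# Only these three services are stopped; everything else stays up.
MODEL_SERVICES = ("localai-qwen-coder", "localai-qwen-vision", "localai-embed")


def systemctl(args: Sequence[str]) -> None:
    # check=True raises CalledProcessError if systemctl itself fails.
    subprocess.run(["systemctl", *args], check=True)


def with_services_stopped(run_benchmarks: Callable[[], dict],
                          services: Sequence[str] = MODEL_SERVICES,
                          runner: Callable[[Sequence[str]], None] = systemctl) -> dict:
    """Stop the resident model services, run the benchmarks, always restart."""
    runner(["stop", *services])
    try:
        return run_benchmarks()
    finally:
        # Runs on success, on exceptions, and on crashed benchmark children alike.
        runner(["start", *services])
```

The `runner` parameter lets the sequencing be exercised without touching systemd.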

Operational details:

  • Host: local-server-adeel
  • OS: Fedora Linux 44 Workstation
  • GPU: AMD Radeon 8060S Graphics, Strix Halo / gfx1151
  • BIOS GPU memory allocation observed by ROCm: 98304 MiB (96 GiB)
  • Live Vulkan build: /srv/localai/llama.cpp/build/bin, llama.cpp b9371
  • AMD ROCm test bundle: /srv/localai/rocm-test/llama-b7146-rocm/..., llama.cpp b7146
  • Benchmark model root: /srv/localai/rocm-test/models
  • Result directory: /srv/localai/rocm-test/benchmarks/20260531T172731Z
  • Benchmark harness: /srv/localai/rocm-test/benchmark_rocm_vulkan_compatible.py

The benchmark stopped only:

localai-qwen-coder
localai-qwen-vision
localai-embed

Then it restarted them and confirmed all model and RAG services were healthy.

Result Table

| Lane | Vulkan/RADV result | AMD ROCm/HIP prebuilt result | Interpretation |
|---|---|---|---|
| Coder text, qwen3moe 30B A3B Q4 | prompt eval 1299.44 t/s; generation 95.84 t/s | failed, SIGSEGV / rc -11 | Vulkan is usable; AMD prebuilt is not stable on Fedora |
| Vision text-only, qwen3vl 8B Q4 | prompt eval 1315.63 t/s; generation 42.76 t/s | failed, SIGSEGV / rc -11 | Vulkan is usable; AMD prebuilt crashes |
| Embedding bench, qwen3 0.6B Q8 | prompt eval 8965.55 t/s | failed, SIGSEGV / rc -11 | Even a supported small Qwen3 model crashes |
| Full vision image, qwen3vl 8B Q4 + mmproj | completed in 4.845 s | failed, SIGSEGV / rc -11 | Full multimodal ROCm path is unusable via this prebuilt |
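The "rc -11" entries match Python's `subprocess` convention: a child killed by signal N reports `returncode == -N`, and signal 11 is SIGSEGV on Linux. The following is a hypothetical helper, not the harness's actual code, that turns return codes into the labels used in the table. A timeout is reported as `None`.

```python
import signal
import subprocess
from typing import Optional, Sequence


def describe_returncode(rc: Optional[int]) -> str:
    """Label a subprocess return code the way the result table does."""
    if rc is None:
        return "timeout"
    if rc == 0:
        return "ok"
    if rc < 0:
        # Negative return codes mean "terminated by signal -rc" on POSIX.
        try:
            name = signal.Signals(-rc).name
        except ValueError:
            name = f"signal {-rc}"
        return f"failed, {name} / rc {rc}"
    return f"failed, exit status {rc}"


def run_and_describe(cmd: Sequence[str], timeout: float) -> str:
    try:
        proc = subprocess.run(list(cmd), capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return describe_returncode(None)
    return describe_returncode(proc.returncode)
```

A shell reports the same crash differently: `$?` is 139 (128 + 11), not -11.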

Additional CPU-only probe:

  • AMD ROCm bundle with qwen3vl, -ngl 0: recognized general.architecture = qwen3vl, then segfaulted.
  • AMD ROCm bundle with qwen3 embedding, -ngl 0: recognized general.architecture = qwen3, then segfaulted.

That matters because -ngl 0 offloads no model layers to the GPU, yet the package still initializes the ROCm runtime and then crashes after recognizing the model architecture. This points to the AMD prebuilt runtime/platform combination, not simply to model architecture support or GPU memory size.
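The triage above combines three observations: the normal run, the -ngl 0 run, and whether the loader recognized the architecture. The following hypothetical sketch makes that logic explicit. The regex assumes llama.cpp's model loader prints metadata lines of the form `general.architecture str = qwen3vl`; that format may differ between llama.cpp versions. The result is a heuristic label, not proof of root cause.

```python
import re
from typing import Optional

# Assumed loader log format, e.g.
# "llama_model_loader: - kv   0:   general.architecture str   = qwen3vl"
_ARCH_RE = re.compile(r"general\.architecture\s+str\s*=\s*(\S+)")


def recognized_architecture(log_text: str) -> Optional[str]:
    match = _ARCH_RE.search(log_text)
    return match.group(1) if match else None


def attribute_failure(gpu_rc: int, cpu_only_rc: int, arch: Optional[str]) -> str:
    """Rough triage from a normal run, an -ngl 0 run, and the loader log."""
    if gpu_rc == 0:
        return "no failure"
    if arch is None:
        return "model load / architecture support"  # died before metadata was read
    if cpu_only_rc == 0:
        return "GPU layer offload"  # only the offloaded run fails
    return "runtime/platform"  # fails even with zero offloaded layers
```

Applied to the probes on this page (arch recognized, both runs at rc -11), the function returns "runtime/platform".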

What This Means

The current Vulkan/RADV runtime is the only proven path on Fedora right now.

The AMD package is not useless as evidence: it proves ROCm can enumerate the Strix Halo GPU and see the 96 GiB allocation. But it is not a deployable runtime on this Fedora install because it crashes on models it should understand.

For RapidDraft Agent and BOM/vision work:

  • keep local/qwen-coder, local/qwen-vision-fast, and local/embed-engineering on the current Vulkan services
  • keep vision advisory and artifact-based, not a source of BOM truth
  • do not downgrade production models just to fit AMD b7146
  • only revisit ROCm after a current llama.cpp HIP build or Ubuntu 24.04 ROCm test passes the same benchmark set
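The last rule can be written as a gate. The following is a minimal sketch with hypothetical case names, one per row of the result table. Here, "pass" means only that every case ran and exited with rc 0. This page sets no throughput bar, so any performance threshold against the Vulkan numbers would be an added policy.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical identifiers for the four benchmark cases in the result table.
REQUIRED_CASES = ("coder_text", "vision_text", "embedding", "vision_image")


@dataclass
class CaseResult:
    case: str
    returncode: int


def rocm_candidate_passes(results: Iterable[CaseResult]) -> bool:
    """True only if every required case is present and exited cleanly."""
    by_case = {r.case: r.returncode for r in results}
    # A missing case yields None, which is not 0, so it fails the gate.
    return all(by_case.get(case) == 0 for case in REQUIRED_CASES)
```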

Repeatable Commands

Check AMD bundle device visibility:

ROCM=/srv/localai/rocm-test/llama-b7146-rocm/llama-b7146-ubuntu-24.04-rocm-7.1.1-gfx1150-gfx1151-x64
LD_LIBRARY_PATH="$ROCM:$ROCM/lib:$ROCM/rocblas/lib:$ROCM/hip/lib:$ROCM/rocm/lib" \
  "$ROCM/llama-cli" --list-devices

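To run the same device check from a Python script, mirror the shell command: use the same library directories and override LD_LIBRARY_PATH instead of appending to it. `bundle_env` and `list_devices` are hypothetical helpers. Output from both streams is kept because it is not certain which stream llama-cli uses for the device list.

```python
import os
import subprocess
from pathlib import Path
from typing import Mapping, Optional

# Library subdirectories the shell command puts on LD_LIBRARY_PATH.
_LIB_SUBDIRS = ("lib", "rocblas/lib", "hip/lib", "rocm/lib")


def bundle_env(bundle: Path, base: Optional[Mapping[str, str]] = None) -> dict:
    """Environment copy whose LD_LIBRARY_PATH points only at the bundle."""
    env = dict(os.environ if base is None else base)
    paths = [str(bundle)] + [str(bundle / sub) for sub in _LIB_SUBDIRS]
    env["LD_LIBRARY_PATH"] = ":".join(paths)  # override, as in the shell version
    return env


def list_devices(bundle: Path) -> str:
    proc = subprocess.run([str(bundle / "llama-cli"), "--list-devices"],
                          env=bundle_env(bundle), capture_output=True, text=True)
    return proc.stdout + proc.stderr
```

Usage: `print(list_devices(Path("/srv/localai/rocm-test/llama-b7146-rocm/llama-b7146-ubuntu-24.04-rocm-7.1.1-gfx1150-gfx1151-x64")))`.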
Run the comparison harness:

python3 /srv/localai/rocm-test/benchmark_rocm_vulkan_compatible.py \
  --manage-services \
  --timeout 2400

Verify the live stack after benchmarking:

systemctl is-active localai-qwen-coder localai-qwen-vision localai-embed localai-litellm localai-rag-api postgresql smb
curl -fsS http://127.0.0.1:8010/health
curl -fsS http://127.0.0.1:8011/health
curl -fsS http://127.0.0.1:8012/health
curl -fsS http://127.0.0.1:4100/health
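The same HTTP checks can run from Python using only the standard library. The following is a minimal sketch; `check_health`, `fetch_status` and `HEALTH_URLS` are hypothetical names, and the ports are the ones used by the curl commands. A 2xx response counts as healthy. That is close to, though slightly stricter than, `curl -f`, which fails only on status 400 and above.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable

HEALTH_URLS = tuple(f"http://127.0.0.1:{port}/health" for port in (8010, 8011, 8012, 4100))


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """HTTP status for url, or 0 if no response arrived at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # must precede URLError (its parent)
        return err.code
    except (urllib.error.URLError, OSError):
        return 0  # refused, timed out, unreachable


def check_health(urls: Iterable[str] = HEALTH_URLS,
                 fetch: Callable[[str], int] = fetch_status) -> Dict[str, bool]:
    return {url: 200 <= fetch(url) < 300 for url in urls}
```

The `fetch` parameter allows the check to run against stubbed statuses when no live server is available.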

Raw Artifacts

  • Summary: /srv/localai/rocm-test/benchmarks/20260531T172731Z/summary.md
  • JSON: /srv/localai/rocm-test/benchmarks/20260531T172731Z/benchmark.json
  • Benchmark image: /srv/localai/rocm-test/benchmarks/20260531T172731Z/cad_bracket_benchmark.png
  • CPU-only ROCm probe log: /srv/localai/rocm-test/logs/rocm-cpu-only-probe-2026-05-31T19-29-20+02-00.log