Skip to content

Local AI Server

Status: Active — internal platform Last updated: 2026-06-02 Host: local-server-adeel (Fedora)

Private inference and RAG on the Fedora server: LiteLLM (OpenAI-compatible API over Tailscale and backend-only Cloudflare Tunnel), loopback-only llama.cpp backends, and local PostgreSQL + pgvector for document search.

Do not paste secrets into the wiki. Runtime secrets live in /etc/localai/localai.env; this wiki documents paths, variable names, and non-secret configuration shape only.


Open first

Step Page Why
1 Overview and Architecture How the stack fits together
2 Service Inventory and Port Map Ports, services, health checks
3 API Usage Guide Call LiteLLM from clients
4 Operations Runbook Day-2 ops and troubleshooting

Current server summary

Item Value
Host local-server-adeel
OS Fedora Linux 44 Workstation
Hardware AMD Ryzen AI Max+ 395 with Radeon 8060S
Local AI root /srv/localai
Runtime config /etc/localai/localai.env, /etc/localai/litellm.yaml
llama.cpp build /srv/localai/llama.cpp (Vulkan)
ROCm benchmark status AMD prebuilt detects GPU but segfaults on Fedora; keep Vulkan
Active vision model Qwen3-VL-8B-Instruct-Q4_K_M behind local/qwen-vision-fast
RAG deployment /srv/localai/rag
Client-facing API LiteLLM on 0.0.0.0:4000
Local RAG API / chat UI 127.0.0.1:4100 with bearer auth on protected routes
Cloudflare Knowledge endpoint https://knowledge.rapiddraft.ai
Cloudflare model endpoint https://localai.rapiddraft.ai/v1
Database PostgreSQL 18 + pgvector (localai_rag)

Verified state (2026-05-28)

Enabled services:

localai-qwen-coder.service
localai-qwen-vision.service
localai-embed.service
localai-litellm.service
localai-rag-api.service
localai-rag-ingest.timer
postgresql.service

Health checks confirmed:

  • llama.cpp on 80108012
  • authenticated LiteLLM /health
  • RAG /health returning {"ok": true, "auth_enabled": true}
  • protected RAG routes returning 401 without a token, 403 for a wrong token, and 200 with the configured bearer key
  • localai_rag schema with vector / pg_trgm extensions
  • post-reboot service recovery without requiring a desktop login
  • recursive nested-folder ingestion with relative archive paths preserved

First real corpus validation completed on 2026-05-28:

  • 9 TextCAD standards PDFs indexed successfully under textcad/01_Standards/...
  • 21 CVAT job 75 images indexed successfully under cvat/job_75/...
  • 1 standards PDF failed ingestion: textcad/01_Standards/DS ISO 1101.pdf

Cloudflare/Railway prep completed on 2026-06-02:

  • knowledge.rapiddraft.ai routes to the Fedora RAG/Knowledge API on 127.0.0.1:4100.
  • localai.rapiddraft.ai routes to Fedora LiteLLM on 127.0.0.1:4000.
  • Protected routes remain bearer-authenticated.
  • External smoke checks pass through /srv/localai/bin/validate-cloudflare-localai.sh.

Current vs reference


Section map

Runtime

Models and API

RAG

Operations

Frontend and Adoption



Sources

  • _sources/2026-05-28_local_ai_server/ — ingested markdown from local server documentation
  • /Users/adeelyj/code/local ai server setup/local-ai-stack-repo/HANDOFF.md
  • /Users/adeelyj/code/local ai server setup/local-ai-stack-repo/LOCALAI_SERVER_PLAN.md