Local AI Server¶
Status: Active — internal platform Last updated: 2026-06-02 Host:
local-server-adeel(Fedora)
Private inference and RAG on the Fedora server: LiteLLM (OpenAI-compatible API over Tailscale and backend-only Cloudflare Tunnel), loopback-only llama.cpp backends, and local PostgreSQL + pgvector for document search.
Do not paste secrets into the wiki. Runtime secrets live in /etc/localai/localai.env; this wiki documents paths, variable names, and non-secret configuration shape only.
Open first¶
| Step | Page | Why |
|---|---|---|
| 1 | Overview and Architecture | How the stack fits together |
| 2 | Service Inventory and Port Map | Ports, services, health checks |
| 3 | API Usage Guide | Call LiteLLM from clients |
| 4 | Operations Runbook | Day-2 ops and troubleshooting |
Current server summary¶
| Item | Value |
|---|---|
| Host | local-server-adeel |
| OS | Fedora Linux 44 Workstation |
| Hardware | AMD Ryzen AI Max+ 395 with Radeon 8060S |
| Local AI root | /srv/localai |
| Runtime config | /etc/localai/localai.env, /etc/localai/litellm.yaml |
| llama.cpp build | /srv/localai/llama.cpp (Vulkan) |
| ROCm benchmark status | AMD prebuilt detects GPU but segfaults on Fedora; keep Vulkan |
| Active vision model | Qwen3-VL-8B-Instruct-Q4_K_M behind local/qwen-vision-fast |
| RAG deployment | /srv/localai/rag |
| Client-facing API | LiteLLM on 0.0.0.0:4000 |
| Local RAG API / chat UI | 127.0.0.1:4100 with bearer auth on protected routes |
| Cloudflare Knowledge endpoint | https://knowledge.rapiddraft.ai |
| Cloudflare model endpoint | https://localai.rapiddraft.ai/v1 |
| Database | PostgreSQL 18 + pgvector (localai_rag) |
Verified state (2026-05-28)¶
Enabled services:
localai-qwen-coder.service
localai-qwen-vision.service
localai-embed.service
localai-litellm.service
localai-rag-api.service
localai-rag-ingest.timer
postgresql.service
Health checks confirmed:
- llama.cpp on
8010–8012 - authenticated LiteLLM
/health - RAG
/healthreturning{"ok": true, "auth_enabled": true} - protected RAG routes returning
401without a token,403for a wrong token, and200with the configured bearer key localai_ragschema withvector/pg_trgmextensions- post-reboot service recovery without requiring a desktop login
- recursive nested-folder ingestion with relative archive paths preserved
First real corpus validation completed on 2026-05-28:
9TextCAD standards PDFs indexed successfully undertextcad/01_Standards/...21CVAT job 75 images indexed successfully undercvat/job_75/...1standards PDF failed ingestion:textcad/01_Standards/DS ISO 1101.pdf
Cloudflare/Railway prep completed on 2026-06-02:
knowledge.rapiddraft.airoutes to the Fedora RAG/Knowledge API on127.0.0.1:4100.localai.rapiddraft.airoutes to Fedora LiteLLM on127.0.0.1:4000.- Protected routes remain bearer-authenticated.
- External smoke checks pass through
/srv/localai/bin/validate-cloudflare-localai.sh.
Current vs reference¶
- Current: runtime layout, ports, model catalog, RAG ingestion/search, API usage, runbook
- Reference: Local AI Model Selection, RAG Deployment Plan (Railway / future split)
Section map¶
Runtime¶
Models and API¶
- Model Catalog and LiteLLM Aliases
- Local AI Model Selection
- ROCm vs Vulkan Runtime Benchmark
- Vision Model Quality Evaluation
- API Usage Guide
RAG¶
Operations¶
Frontend and Adoption¶
Related wikis¶
- Product Wiki — RapidDraft product and infrastructure research
- Forward Engineering Wiki — meeting briefings and technical notes
Sources¶
_sources/2026-05-28_local_ai_server/— ingested markdown from local server documentation/Users/adeelyj/code/local ai server setup/local-ai-stack-repo/HANDOFF.md/Users/adeelyj/code/local ai server setup/local-ai-stack-repo/LOCALAI_SERVER_PLAN.md