Local AI Server¶

Status: Active — internal platform Last updated: 2026-06-02 Host: local-server-adeel (Fedora)

Private inference and RAG on the Fedora server: LiteLLM (OpenAI-compatible API over Tailscale and backend-only Cloudflare Tunnel), loopback-only llama.cpp backends, and local PostgreSQL + pgvector for document search.

Do not paste secrets into the wiki. Runtime secrets live in /etc/localai/localai.env; this wiki documents paths, variable names, and non-secret configuration shape only.

Open first¶

Step	Page	Why
1	Overview and Architecture	How the stack fits together
2	Service Inventory and Port Map	Ports, services, health checks
3	API Usage Guide	Call LiteLLM from clients
4	Operations Runbook	Day-2 ops and troubleshooting

Current server summary¶

Item	Value
Host	`local-server-adeel`
OS	Fedora Linux 44 Workstation
Hardware	AMD Ryzen AI Max+ 395 with Radeon 8060S
Local AI root	`/srv/localai`
Runtime config	`/etc/localai/localai.env`, `/etc/localai/litellm.yaml`
llama.cpp build	`/srv/localai/llama.cpp` (Vulkan)
ROCm benchmark status	AMD prebuilt detects GPU but segfaults on Fedora; keep Vulkan
Active vision model	`Qwen3-VL-8B-Instruct-Q4_K_M` behind `local/qwen-vision-fast`
RAG deployment	`/srv/localai/rag`
Client-facing API	LiteLLM on `0.0.0.0:4000`
Local RAG API / chat UI	`127.0.0.1:4100` with bearer auth on protected routes
Cloudflare Knowledge endpoint	`https://knowledge.rapiddraft.ai`
Cloudflare model endpoint	`https://localai.rapiddraft.ai/v1`
Database	PostgreSQL 18 + pgvector (`localai_rag`)

Verified state (2026-05-28)¶

Enabled services:

localai-qwen-coder.service
localai-qwen-vision.service
localai-embed.service
localai-litellm.service
localai-rag-api.service
localai-rag-ingest.timer
postgresql.service

Health checks confirmed:

llama.cpp on 8010–8012
authenticated LiteLLM /health
RAG /health returning {"ok": true, "auth_enabled": true}
protected RAG routes returning 401 without a token, 403 for a wrong token, and 200 with the configured bearer key
localai_rag schema with vector / pg_trgm extensions
post-reboot service recovery without requiring a desktop login
recursive nested-folder ingestion with relative archive paths preserved

First real corpus validation completed on 2026-05-28:

9 TextCAD standards PDFs indexed successfully under textcad/01_Standards/...
21 CVAT job 75 images indexed successfully under cvat/job_75/...
1 standards PDF failed ingestion: textcad/01_Standards/DS ISO 1101.pdf

Cloudflare/Railway prep completed on 2026-06-02:

knowledge.rapiddraft.ai routes to the Fedora RAG/Knowledge API on 127.0.0.1:4100.
localai.rapiddraft.ai routes to Fedora LiteLLM on 127.0.0.1:4000.
Protected routes remain bearer-authenticated.
External smoke checks pass through /srv/localai/bin/validate-cloudflare-localai.sh.

Current vs reference¶

Current: runtime layout, ports, model catalog, RAG ingestion/search, API usage, runbook
Reference: Local AI Model Selection, RAG Deployment Plan (Railway / future split)

Section map¶

Runtime¶

Models and API¶

RAG¶

Operations¶

Frontend and Adoption¶

RapidDraft Agent and Local AI Server Architecture

Product Wiki — RapidDraft product and infrastructure research
Forward Engineering Wiki — meeting briefings and technical notes

Sources¶

_sources/2026-05-28_local_ai_server/ — ingested markdown from local server documentation
/Users/adeelyj/code/local ai server setup/local-ai-stack-repo/HANDOFF.md
/Users/adeelyj/code/local ai server setup/local-ai-stack-repo/LOCALAI_SERVER_PLAN.md