Operations Runbook¶
Last updated: 2026-06-02
Quick Status¶
Check all key services:
systemctl is-active \
localai-qwen-coder.service \
localai-qwen-vision.service \
localai-embed.service \
localai-litellm.service \
localai-rag-api.service \
localai-rag-ingest.timer \
postgresql.service \
cloudflared-rapiddraft-localai.service
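When the list of units grows, a single combined `is-active` call makes it hard to see which unit is unhealthy. The following is a minimal sketch, not part of the deployed tooling: `report_inactive` is a hypothetical helper that reads "unit state" pairs and prints only the units that are not active.

```shell
# Print every "unit state" pair whose state is not "active".
# Returns non-zero if at least one such pair was seen.
# Input on stdin: one "unit state" pair per line.
report_inactive() {
  awk '$2 != "active" { print; bad = 1 } END { exit bad }'
}

# Usage on the host (unit names copied from this runbook):
#   for u in postgresql.service localai-litellm.service localai-rag-api.service; do
#     printf '%s %s\n' "$u" "$(systemctl is-active "$u")"
#   done | report_inactive
```

Because the helper only filters text, it can be tried against canned input before it is pointed at the live host.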
Check enablement:
systemctl is-enabled \
localai-qwen-coder.service \
localai-qwen-vision.service \
localai-embed.service \
localai-litellm.service \
localai-rag-api.service \
localai-rag-ingest.timer \
postgresql.service \
cloudflared-rapiddraft-localai.service
Check ports:
ss -ltnp 'sport = :8010 or sport = :8011 or sport = :8012 or sport = :4000 or sport = :4100 or sport = :5432'
Expected bind addresses:
0.0.0.0:4000 LiteLLM
127.0.0.1:4100 RAG API/chat UI
127.0.0.1:8010 coder backend
127.0.0.1:8011 vision backend
127.0.0.1:8012 embedding backend
127.0.0.1:5432 PostgreSQL
[::1]:5432 PostgreSQL
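The comparison against the expected bind addresses can be automated. This is a sketch under assumptions: `missing_binds` is a hypothetical helper, it expects header-less `ss -ltn` output on stdin (local address in the fourth column), and it takes the expected `address:port` values as arguments. Note that `ss` prints IPv6 addresses in brackets, and some systems show a dual-stack wildcard bind as `*:4000` rather than `0.0.0.0:4000`, so adjust the expected values to what the host actually prints.

```shell
# Report every expected listener that is absent from `ss -ltnH` output.
# stdin: output of `ss -ltnH`; args: expected "address:port" values.
# Returns non-zero if any expected listener is missing.
missing_binds() {
  listening=$(awk '{ print $4 }')   # column 4 is "Local Address:Port"
  rc=0
  for want in "$@"; do
    if ! printf '%s\n' "$listening" | grep -qxF -- "$want"; then
      echo "missing: $want"
      rc=1
    fi
  done
  return "$rc"
}

# Usage on the host:
#   ss -ltnH | missing_binds 0.0.0.0:4000 127.0.0.1:4100 127.0.0.1:8010 \
#     127.0.0.1:8011 127.0.0.1:8012 127.0.0.1:5432 '[::1]:5432'
```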
Start, Stop, Restart¶
Start the full stack:
sudo systemctl start postgresql.service
sudo systemctl start localai-qwen-coder.service localai-qwen-vision.service localai-embed.service
sudo systemctl start localai-litellm.service
sudo systemctl start localai-rag-api.service localai-rag-ingest.timer
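The start order above (database, model backends, LiteLLM, then RAG) can be wrapped so that a failed tier stops the sequence instead of letting later tiers start against a missing dependency. A minimal sketch follows; `start_stack` and the `SYSTEMCTL` override are hypothetical names introduced here for illustration and dry runs, not part of the installed scripts.

```shell
# Start the stack tier by tier and stop at the first tier that fails to start.
# SYSTEMCTL is a hypothetical override for dry runs; it defaults to "sudo systemctl".
start_stack() {
  ctl=${SYSTEMCTL:-sudo systemctl}
  # $ctl is left unquoted on purpose so "sudo systemctl" splits into two words.
  $ctl start postgresql.service || return 1
  $ctl start localai-qwen-coder.service localai-qwen-vision.service localai-embed.service || return 1
  $ctl start localai-litellm.service || return 1
  $ctl start localai-rag-api.service localai-rag-ingest.timer || return 1
}

# Dry run that only prints the commands:  SYSTEMCTL=echo start_stack
```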
Stop the AI services but leave PostgreSQL running:
sudo systemctl stop localai-rag-ingest.timer localai-rag-api.service
sudo systemctl stop localai-litellm.service
sudo systemctl stop localai-qwen-coder.service localai-qwen-vision.service localai-embed.service
Restart after config or model-service changes:
sudo systemctl daemon-reload
sudo systemctl restart localai-qwen-coder.service localai-qwen-vision.service localai-embed.service
sudo systemctl restart localai-litellm.service
sudo systemctl restart localai-rag-api.service
Restart only LiteLLM after /etc/localai/litellm.yaml changes:
sudo systemctl restart localai-litellm.service
Restart only the RAG API after /srv/localai/rag changes:
sudo systemctl restart localai-rag-api.service
Run ingestion once:
sudo systemctl start localai-rag-ingest.service
Restart the Cloudflare Tunnel:
sudo systemctl restart cloudflared-rapiddraft-localai.service
Logs¶
Follow LiteLLM logs:
journalctl -u localai-litellm.service -f
Follow backend logs:
journalctl -u localai-qwen-coder.service -f
journalctl -u localai-qwen-vision.service -f
journalctl -u localai-embed.service -f
Follow RAG logs:
journalctl -u localai-rag-api.service -f
journalctl -u localai-rag-ingest.service -f
Follow Cloudflare Tunnel logs:
journalctl -u cloudflared-rapiddraft-localai.service -f
Show recent failures:
journalctl -u localai-qwen-coder.service -u localai-qwen-vision.service -u localai-embed.service -u localai-litellm.service -u localai-rag-api.service --since "30 minutes ago" --no-pager
Health Checks¶
Raw backend checks:
curl -fsS http://127.0.0.1:8010/health
curl -fsS http://127.0.0.1:8011/health
curl -fsS http://127.0.0.1:8012/health
Expected response for each:
{"status": "ok"}
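`curl -fsS` already fails on HTTP error codes, but it does not look at the response body. The sketch below adds a body check that ignores whitespace differences. `backend_healthy` is a hypothetical helper and a plain text match, not a JSON parser, so treat it as a quick smoke check only.

```shell
# Succeed only if the response body on stdin contains "status":"ok",
# ignoring any whitespace inside the JSON.
backend_healthy() {
  tr -d ' \t\r\n' | grep -q '"status":"ok"'
}

# Usage on the host, one backend per port:
#   for port in 8010 8011 8012; do
#     curl -fsS "http://127.0.0.1:${port}/health" | backend_healthy \
#       || echo "backend on port ${port} is not healthy"
#   done
```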
RAG health:
curl -fsS http://127.0.0.1:4100/health
Expected response:
{"ok": true, "auth_enabled": true}
Protected-route auth check:
sudo bash -c '
set -a
source /etc/localai/localai.env
set +a
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:4100/documents
curl -s -o /dev/null -w "%{http_code}\n" \
-H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}" \
http://127.0.0.1:4100/documents
'
Expected status codes:
401
200
Authenticated LiteLLM health:
sudo bash -c '
set -a
source /etc/localai/localai.env
set +a
TOKEN="${LITELLM_API_KEY:-${LITELLM_MASTER_KEY:-}}"
curl -fsS -H "Authorization: Bearer ${TOKEN}" http://127.0.0.1:4000/health
'
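The `TOKEN` line in the command above falls back from `LITELLM_API_KEY` to `LITELLM_MASTER_KEY`, and silently yields an empty token if neither is set, which then shows up as a confusing 401. The sketch below makes that case explicit. `resolve_token` is a hypothetical helper; it sets `TOKEN` without printing it, so the secret does not end up in a shell transcript.

```shell
# Set TOKEN from LITELLM_API_KEY, falling back to LITELLM_MASTER_KEY.
# Returns non-zero (and prints a hint on stderr) when both are empty or unset.
# The token itself is never printed.
resolve_token() {
  TOKEN=${LITELLM_API_KEY:-${LITELLM_MASTER_KEY:-}}
  if [ -z "$TOKEN" ]; then
    echo "no LiteLLM key found in environment" >&2
    return 1
  fi
}

# Usage inside the sudo bash -c block, after sourcing /etc/localai/localai.env:
#   resolve_token && curl -fsS -H "Authorization: Bearer ${TOKEN}" http://127.0.0.1:4000/health
```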
Cloudflare endpoint smoke test:
sudo /srv/localai/bin/validate-cloudflare-localai.sh
Expected result:
knowledge health public -> 200
knowledge inventory no auth -> 401
knowledge inventory auth -> 200
localai health no auth -> 401
localai health auth -> 200
localai models auth -> 200
Cloudflare local-AI endpoint smoke passed.
Expected summary:
healthy_count: 3
unhealthy_count: 0
Database Checks¶
Check PostgreSQL:
systemctl is-active postgresql.service
Check database extensions:
sudo -u postgres psql -d localai_rag -At -c "SELECT extname FROM pg_extension WHERE extname IN ('vector','pg_trgm') ORDER BY extname;"
Expected output:
pg_trgm
vector
Check RAG tables:
sudo -u postgres psql -d localai_rag -At -c "SELECT to_regclass('public.rag_documents'), to_regclass('public.rag_chunks');"
Expected output:
rag_documents|rag_chunks
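`to_regclass` returns NULL for a table that does not exist, and `psql -At` prints NULL as an empty field, so a missing table shows up as an empty slot such as `|rag_chunks`. The sketch below checks for that. `tables_present` is a hypothetical helper that takes the single output line as its argument.

```shell
# Succeed only if every "|"-separated field in the psql -At output line is non-empty.
# $1: one line of output, for example "rag_documents|rag_chunks".
tables_present() {
  case "|$1|" in
    *"||"*) return 1 ;;   # an empty field means to_regclass returned NULL
    *)      return 0 ;;
  esac
}

# Usage on the host:
#   line=$(sudo -u postgres psql -d localai_rag -At -c \
#     "SELECT to_regclass('public.rag_documents'), to_regclass('public.rag_chunks');")
#   tables_present "$line" || echo "one or more RAG tables are missing"
```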
Count documents:
sudo -u postgres psql -d localai_rag -At -c "SELECT count(*) FROM rag_documents;"
Ingestion Operations¶
Check timer:
systemctl list-timers localai-rag-ingest.timer --no-pager
Trigger API ingestion:
sudo bash -c '
set -a
source /etc/localai/localai.env
set +a
curl -fsS -X POST http://127.0.0.1:4100/documents/ingest \
-H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}"
'
Check ingestion status:
sudo bash -c '
set -a
source /etc/localai/localai.env
set +a
curl -fsS http://127.0.0.1:4100/ingestion/status \
-H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}"
'
List documents:
sudo bash -c '
set -a
source /etc/localai/localai.env
set +a
curl -fsS http://127.0.0.1:4100/documents \
-H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}"
'
If a file fails ingestion:
- Check /srv/localai/documents/failed.
- Check last_error from /documents or the database.
- Check journalctl -u localai-rag-ingest.service --since "30 minutes ago".
- Fix the source file or parser dependency.
- Move the corrected file back to /srv/localai/documents/inbox.
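The final step, moving corrected files back to the inbox, can be scripted once several files have been fixed. This is a sketch under assumptions: `requeue_failed` is a hypothetical helper, it assumes failed files sit directly inside the failed directory, and it refuses to overwrite a file that is already waiting in the inbox. Run it only after the underlying problem has been fixed, otherwise the same files will fail again.

```shell
# Move regular files from the failed directory back to the inbox.
# $1: failed directory, $2: inbox directory.
# Files whose name already exists in the inbox are left where they are.
requeue_failed() {
  failed_dir=$1
  inbox_dir=$2
  moved=0
  for f in "$failed_dir"/*; do
    if [ ! -f "$f" ]; then
      continue                      # skips subdirectories and an unmatched glob
    fi
    if [ -e "$inbox_dir/${f##*/}" ]; then
      continue                      # never overwrite a file already queued
    fi
    if mv -- "$f" "$inbox_dir/"; then
      moved=$((moved + 1))
    fi
  done
  echo "requeued: $moved"
}

# Usage on the host (may need sudo -u localai, depending on directory ownership):
#   requeue_failed /srv/localai/documents/failed /srv/localai/documents/inbox
```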
Reboot Verification¶
After a reboot, verify the stack from a remote shell before anyone logs in to the desktop, to confirm that the services do not depend on a desktop session:
hostnamectl
tailscale status
tailscale ip -4
systemctl is-active postgresql.service
systemctl is-active localai-qwen-coder.service localai-qwen-vision.service localai-embed.service
systemctl is-active localai-litellm.service localai-rag-api.service localai-rag-ingest.timer
ss -ltnp 'sport = :8010 or sport = :8011 or sport = :8012 or sport = :4000 or sport = :4100 or sport = :5432'
curl -fsS http://127.0.0.1:8010/health
curl -fsS http://127.0.0.1:8011/health
curl -fsS http://127.0.0.1:8012/health
curl -fsS http://127.0.0.1:4100/health
Then run the authenticated LiteLLM health check:
sudo bash -c '
set -a
source /etc/localai/localai.env
set +a
TOKEN="${LITELLM_API_KEY:-${LITELLM_MASTER_KEY:-}}"
curl -fsS -H "Authorization: Bearer ${TOKEN}" http://127.0.0.1:4000/health
'
Finally, verify LiteLLM from another Tailscale-connected device using that device's configured secret value.
The live reboot validation already passed on 2026-05-28: services recovered after reboot without requiring a desktop login, LiteLLM remained reachable over Tailscale, and the protected RAG flow still returned 401 without a key and 200 with the configured key.
Common Failure Modes¶
LiteLLM Returns 401¶
Cause: request is missing bearer authentication.
Fix: provide Authorization: Bearer ... from the configured secret. Do not paste the secret into docs or shell transcripts.
RAG API Is Up but Chat Fails¶
Likely causes:
- LiteLLM is down or unhealthy.
- LITELLM_BASE_URL is wrong.
- The configured LiteLLM key is invalid.
- The selected model service is down.
- Too many retrieved source chunks were pulled into a single prompt.
Checks:
systemctl is-active localai-litellm.service
curl -fsS http://127.0.0.1:4100/health
Then run the authenticated LiteLLM health command from this page.
If the UI shows a context-related failure for a broad summarization prompt, narrow the question to a specific standard, file, or topic and retry.
RAG API Returns 401 or 403¶
Cause:
- 401: missing bearer token
- 403: wrong bearer token
Fix:
- load LOCALAI_RAG_API_KEY from /etc/localai/localai.env
- retry the request with Authorization: Bearer ...
- if browser testing on Fedora, paste the key into the RAG API key field and click Use Key
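The status-code meanings in this section can be encoded in a small lookup so that scripted checks print a readable reason instead of a bare number. A minimal sketch; `explain_rag_status` is a hypothetical helper, and the mapping covers only the codes this runbook describes.

```shell
# Translate a RAG API HTTP status code into the cause described in this runbook.
explain_rag_status() {
  case $1 in
    200) echo "ok" ;;
    401) echo "missing bearer token" ;;
    403) echo "wrong bearer token" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# Usage on the host:
#   code=$(curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:4100/documents)
#   explain_rag_status "$code"
```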
Ingestion Does Not Pick Up Files¶
Likely causes:
- files are not in /srv/localai/documents/inbox
- ownership or permissions prevent the localai user from reading files
- the timer is inactive
- LiteLLM or the embedding backend is down
Checks:
ls -la /srv/localai/documents/inbox
systemctl is-active localai-rag-ingest.timer
systemctl list-timers localai-rag-ingest.timer --no-pager
journalctl -u localai-rag-ingest.service --since "30 minutes ago" --no-pager
Fix file ownership:
sudo chown -R localai:localai /srv/localai/documents/inbox
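Before running a recursive `chown`, it can help to see which entries actually have the wrong owner. The sketch below lists them with `find`. `not_owned_by` is a hypothetical helper; it takes a directory and a user name or numeric UID.

```shell
# List every entry under a directory that is not owned by the given user.
# $1: directory to scan, $2: user name or numeric UID.
# Prints nothing when ownership is already correct.
not_owned_by() {
  find "$1" ! -user "$2" -print
}

# Usage on the host:
#   sudo sh -c 'find /srv/localai/documents/inbox ! -user localai -print'
#   or, with the helper loaded in a root shell:
#   not_owned_by /srv/localai/documents/inbox localai
```

Ownership is only one of the listed causes. A file owned by `localai` can still be unreadable because of its mode bits or its SELinux context.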
Service Fails Only Under systemd¶
Check environment-file permissions and SELinux:
sudo ls -la /etc/localai
sudo restorecon -Rv /srv/localai /etc/localai
sudo ausearch -m avc -ts recent
Backend Starts Slowly¶
Large GGUF models can take time to load. The unit files have extended TimeoutStartSec values. Use journal logs to distinguish slow startup from failure.
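One way to tell slow startup from failure is to poll the health endpoint for a bounded time before concluding anything. The sketch below is a generic retry loop. `wait_until` is a hypothetical helper, and the attempt count and delay in the usage line are illustrative values, not tuned to the actual model load times.

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
# $1: maximum attempts, $2: seconds to sleep between attempts, rest: command to run.
# Returns 0 on the first success, 1 if every attempt failed.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage on the host: allow up to 60 attempts, 5 seconds apart.
#   wait_until 60 5 curl -fsS http://127.0.0.1:8010/health \
#     || journalctl -u localai-qwen-coder.service --since "10 minutes ago" --no-pager
```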
Change Management¶
After editing service units:
sudo systemctl daemon-reload
sudo systemctl restart <unit>
After editing /etc/localai/localai.env:
sudo systemctl restart localai-litellm.service localai-rag-api.service localai-rag-ingest.service
Restart model services too if the changed environment affects llama.cpp units.
After editing /etc/localai/litellm.yaml:
sudo systemctl restart localai-litellm.service
After editing RAG code under /srv/localai/rag:
sudo systemctl restart localai-rag-api.service
sudo systemctl start localai-rag-ingest.service
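The rules in this section amount to a mapping from the edited path to the units that need a restart. The sketch below encodes that mapping. `units_for_change` is a hypothetical helper; it does not cover unit-file edits (which also need `daemon-reload`), the model services that may need a restart after an environment change, or the follow-up ingestion run after RAG code changes.

```shell
# Print the units to restart after a change to the given path.
# Returns non-zero for paths this runbook gives no rule for.
units_for_change() {
  case $1 in
    /etc/localai/localai.env)
      echo "localai-litellm.service localai-rag-api.service localai-rag-ingest.service" ;;
    /etc/localai/litellm.yaml)
      echo "localai-litellm.service" ;;
    /srv/localai/rag/*)
      echo "localai-rag-api.service" ;;
    *)
      return 1 ;;
  esac
}

# Usage on the host (word splitting of the unit list is intended):
#   units=$(units_for_change /etc/localai/litellm.yaml) && sudo systemctl restart $units
```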
Exposure Policy¶
Current exposure:
- LiteLLM is bound to all interfaces for private Tailscale use.
- Raw llama.cpp backends are localhost-only.
- RAG API/chat UI is localhost-only and bearer-protected on protected routes.
- PostgreSQL is localhost-only.
Before exposing the RAG API to any non-local client, add:
- HTTPS proxying through Caddy, Cloudflare Tunnel, or another approved path
- request size limits for uploads
- logging and failure monitoring
- an explicit decision about whether the Railway app can call RAG directly
Sources¶
- /Users/adeelyj/code/local ai server setup/local-ai-stack-repo/HANDOFF.md
- /Users/adeelyj/code/local ai server setup/local-ai-stack-repo/rag/app/app.py
- /Users/adeelyj/code/local ai server setup/local-ai-stack-repo/rag/app/ingest.py