Operations Runbook¶

Last updated: 2026-06-02

Quick Status¶

Check all key services:

systemctl is-active \
  localai-qwen-coder.service \
  localai-qwen-vision.service \
  localai-embed.service \
  localai-litellm.service \
  localai-rag-api.service \
  localai-rag-ingest.timer \
  postgresql.service \
  cloudflared-rapiddraft-localai.service

Check enablement:

systemctl is-enabled \
  localai-qwen-coder.service \
  localai-qwen-vision.service \
  localai-embed.service \
  localai-litellm.service \
  localai-rag-api.service \
  localai-rag-ingest.timer \
  postgresql.service \
  cloudflared-rapiddraft-localai.service
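
The active and enabled checks above can be folded into one loop that prints a single line per unit. This is a minimal sketch and not part of the deployed stack: the function name report_units is hypothetical, and it relies only on the state that systemctl is-active and systemctl is-enabled print to standard output.

```shell
# Hypothetical helper: print "<unit> <active-state> <enabled-state>" per unit.
report_units() {
  for unit in "$@"; do
    # is-active and is-enabled exit non-zero for inactive or disabled units,
    # so keep the printed state and ignore the exit status.
    active=$(systemctl is-active "$unit" 2>/dev/null || true)
    enabled=$(systemctl is-enabled "$unit" 2>/dev/null || true)
    printf '%s %s %s\n' "$unit" "${active:-unknown}" "${enabled:-unknown}"
  done
}
```

Call it with the same unit list, for example report_units postgresql.service localai-litellm.service localai-rag-ingest.timer.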

Check ports:

ss -ltnp 'sport = :8010 or sport = :8011 or sport = :8012 or sport = :4000 or sport = :4100 or sport = :5432'

Expected bind addresses:

0.0.0.0:4000      LiteLLM
127.0.0.1:4100    RAG API/chat UI
127.0.0.1:8010    coder backend
127.0.0.1:8011    vision backend
127.0.0.1:8012    embedding backend
127.0.0.1:5432    PostgreSQL
::1:5432          PostgreSQL
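
The comparison against the expected bind addresses can be automated. The sketch below assumes that the fourth column of ss -ltn output is the local address and port; the helper name missing_binds is hypothetical. IPv6 listeners may be printed with brackets depending on the ss version, so pass the address exactly as ss prints it on the host.

```shell
# Hypothetical helper: read `ss -ltn` output on stdin and report every
# expected local address (for example 127.0.0.1:8010) that is not listening.
missing_binds() {
  # Column 4 is "Local Address:Port"; the header row yields "Local" and is harmless.
  listening=$(awk '{ print $4 }')
  status=0
  for want in "$@"; do
    # -x matches the whole line, -F treats the address as a fixed string.
    if ! printf '%s\n' "$listening" | grep -qxF "$want"; then
      printf 'missing: %s\n' "$want"
      status=1
    fi
  done
  return "$status"
}
```

Example: ss -ltn | missing_binds 0.0.0.0:4000 127.0.0.1:4100 127.0.0.1:8010 127.0.0.1:8011 127.0.0.1:8012 127.0.0.1:5432. The function prints nothing and returns 0 when every address is present.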

Start, Stop, Restart¶

Start the full stack:

sudo systemctl start postgresql.service
sudo systemctl start localai-qwen-coder.service localai-qwen-vision.service localai-embed.service
sudo systemctl start localai-litellm.service
sudo systemctl start localai-rag-api.service localai-rag-ingest.timer

Stop the AI services but leave PostgreSQL running:

sudo systemctl stop localai-rag-ingest.timer localai-rag-api.service
sudo systemctl stop localai-litellm.service
sudo systemctl stop localai-qwen-coder.service localai-qwen-vision.service localai-embed.service

Restart after config or model-service changes:

sudo systemctl daemon-reload
sudo systemctl restart localai-qwen-coder.service localai-qwen-vision.service localai-embed.service
sudo systemctl restart localai-litellm.service
sudo systemctl restart localai-rag-api.service

Restart only LiteLLM after /etc/localai/litellm.yaml changes:

sudo systemctl restart localai-litellm.service

Restart only the RAG API after /srv/localai/rag changes:

sudo systemctl restart localai-rag-api.service

Run ingestion once:

sudo systemctl start localai-rag-ingest.service

Restart the Cloudflare Tunnel:

sudo systemctl restart cloudflared-rapiddraft-localai.service

Logs¶

Follow LiteLLM logs:

journalctl -u localai-litellm.service -f

Follow backend logs:

journalctl -u localai-qwen-coder.service -f
journalctl -u localai-qwen-vision.service -f
journalctl -u localai-embed.service -f

Follow RAG logs:

journalctl -u localai-rag-api.service -f
journalctl -u localai-rag-ingest.service -f

Follow Cloudflare Tunnel logs:

journalctl -u cloudflared-rapiddraft-localai.service -f

Show the last 30 minutes of logs across the core services (add -p err to limit the output to error-level entries):

journalctl -u localai-qwen-coder.service -u localai-qwen-vision.service -u localai-embed.service -u localai-litellm.service -u localai-rag-api.service --since "30 minutes ago" --no-pager

Health Checks¶

Raw backend checks:

curl -fsS http://127.0.0.1:8010/health
curl -fsS http://127.0.0.1:8011/health
curl -fsS http://127.0.0.1:8012/health

Expected response for each:

{"status": "ok"}

RAG health:

curl -fsS http://127.0.0.1:4100/health

Expected response:

{"ok": true, "auth_enabled": true}

Protected-route auth check:

sudo bash -c '
  set -a
  source /etc/localai/localai.env
  set +a
  curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:4100/documents
  curl -s -o /dev/null -w "%{http_code}\n" \
    -H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}" \
    http://127.0.0.1:4100/documents
'

Expected status codes:

401
200
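
To turn the status-code comparison into a pass/fail check, a small wrapper can compare each response with the expected code. This is a sketch under assumptions: expect_status is a hypothetical name, and it must run in a shell where /etc/localai/localai.env has already been sourced, as in the command above.

```shell
# Hypothetical helper: fetch a URL and compare the HTTP status with the expected one.
# Usage: expect_status <code> <url> [extra curl arguments...]
expect_status() {
  want=$1
  url=$2
  shift 2
  # curl prints 000 and exits non-zero when the connection fails; keep the code.
  got=$(curl -s -o /dev/null -w '%{http_code}' "$@" "$url" || true)
  if [ "$got" = "$want" ]; then
    printf 'ok   %s -> %s\n' "$url" "$got"
  else
    printf 'FAIL %s -> %s (expected %s)\n' "$url" "$got" "$want"
    return 1
  fi
}
```

For the protected route, call expect_status 401 http://127.0.0.1:4100/documents and then expect_status 200 http://127.0.0.1:4100/documents -H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}".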

Authenticated LiteLLM health:

sudo bash -c '
  set -a
  source /etc/localai/localai.env
  set +a
  TOKEN="${LITELLM_API_KEY:-${LITELLM_MASTER_KEY:-}}"
  curl -fsS -H "Authorization: Bearer ${TOKEN}" http://127.0.0.1:4000/health
'

Cloudflare endpoint smoke:

sudo /srv/localai/bin/validate-cloudflare-localai.sh

Expected result:

knowledge health public -> 200
knowledge inventory no auth -> 401
knowledge inventory auth -> 200
localai health no auth -> 401
localai health auth -> 200
localai models auth -> 200
Cloudflare local-AI endpoint smoke passed.

Expected summary fields in the authenticated LiteLLM health response:

healthy_count: 3
unhealthy_count: 0
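
The unhealthy_count field can be checked mechanically. The sketch below assumes the LiteLLM /health response is JSON that carries unhealthy_count as a plain integer; the helper name litellm_all_healthy is hypothetical, and grep and sed are used so that jq does not have to be installed.

```shell
# Hypothetical helper: read LiteLLM /health JSON on stdin and succeed only
# when unhealthy_count is present and equal to 0.
litellm_all_healthy() {
  unhealthy=$(grep -o '"unhealthy_count"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
    | sed 's/.*:[[:space:]]*//')
  # A missing field counts as a failure rather than as healthy.
  [ "${unhealthy:-missing}" = "0" ]
}
```

Pipe the output of the authenticated LiteLLM health command into litellm_all_healthy and test its exit status.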

Database Checks¶

Check PostgreSQL:

systemctl is-active postgresql.service

Check database extensions:

sudo -u postgres psql -d localai_rag -At -c "SELECT extname FROM pg_extension WHERE extname IN ('vector','pg_trgm') ORDER BY extname;"

Expected output:

pg_trgm
vector

Check RAG tables:

sudo -u postgres psql -d localai_rag -At -c "SELECT to_regclass('public.rag_documents'), to_regclass('public.rag_chunks');"

Expected output:

rag_documents|rag_chunks

Count documents:

sudo -u postgres psql -d localai_rag -At -c "SELECT count(*) FROM rag_documents;"

Ingestion Operations¶

Check timer:

systemctl list-timers localai-rag-ingest.timer --no-pager

Trigger API ingestion:

sudo bash -c '
  set -a
  source /etc/localai/localai.env
  set +a
  curl -fsS -X POST http://127.0.0.1:4100/documents/ingest \
    -H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}"
'

Check ingestion status:

sudo bash -c '
  set -a
  source /etc/localai/localai.env
  set +a
  curl -fsS http://127.0.0.1:4100/ingestion/status \
    -H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}"
'

List documents:

sudo bash -c '
  set -a
  source /etc/localai/localai.env
  set +a
  curl -fsS http://127.0.0.1:4100/documents \
    -H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}"
'

If a file fails ingestion:

Check /srv/localai/documents/failed.
Check last_error from /documents or the database.
Check journalctl -u localai-rag-ingest.service --since "30 minutes ago".
Fix the source file or parser dependency.
Move the corrected file back to /srv/localai/documents/inbox.
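
The last step can be scripted once the file has been fixed. This is only a sketch: requeue_failed and DOCS_ROOT are hypothetical names, and it assumes that failed files sit directly under the failed directory and keep their original file names.

```shell
# Hypothetical helper: move a corrected file from failed/ back to inbox/.
# DOCS_ROOT defaults to the documents path used on this page.
requeue_failed() {
  root=${DOCS_ROOT:-/srv/localai/documents}
  name=$1
  if [ ! -f "$root/failed/$name" ]; then
    printf 'not found: %s\n' "$root/failed/$name" >&2
    return 1
  fi
  mv -- "$root/failed/$name" "$root/inbox/$name"
  printf 'requeued: %s\n' "$name"
}
```

Run it as a user that can write to both directories, then confirm that the localai user can still read the moved file.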

Reboot Verification¶

After a reboot, verify the stack before logging in to the desktop, so the checks confirm that services start without a desktop session:

hostnamectl
tailscale status
tailscale ip -4
systemctl is-active postgresql.service
systemctl is-active localai-qwen-coder.service localai-qwen-vision.service localai-embed.service
systemctl is-active localai-litellm.service localai-rag-api.service localai-rag-ingest.timer
ss -ltnp 'sport = :8010 or sport = :8011 or sport = :8012 or sport = :4000 or sport = :4100 or sport = :5432'
curl -fsS http://127.0.0.1:8010/health
curl -fsS http://127.0.0.1:8011/health
curl -fsS http://127.0.0.1:8012/health
curl -fsS http://127.0.0.1:4100/health

Then run authenticated LiteLLM health:

sudo bash -c '
  set -a
  source /etc/localai/localai.env
  set +a
  TOKEN="${LITELLM_API_KEY:-${LITELLM_MASTER_KEY:-}}"
  curl -fsS -H "Authorization: Bearer ${TOKEN}" http://127.0.0.1:4000/health
'

Finally, verify LiteLLM from another Tailscale-connected device using that device's configured secret value.

The live reboot validation already passed on 2026-05-28: services recovered after reboot without requiring a desktop login, LiteLLM remained reachable over Tailscale, and the protected RAG flow still returned 401 without a key and 200 with the configured key.

Common Failure Modes¶

LiteLLM Returns 401¶

Cause: request is missing bearer authentication.

Fix: provide Authorization: Bearer ... from the configured secret. Do not paste the secret into docs or shell transcripts.

RAG API Is Up but Chat Fails¶

Likely causes:

LiteLLM is down or unhealthy.
LITELLM_BASE_URL is wrong.
The configured LiteLLM key is invalid.
The selected model service is down.
Too many retrieved source chunks were pulled into a single prompt.

Checks:

systemctl is-active localai-litellm.service
curl -fsS http://127.0.0.1:4100/health

Then run the authenticated LiteLLM health command from this page.

If the UI shows a context-related failure for a broad summarization prompt, narrow the question to a specific standard, file, or topic and retry.

RAG API Returns 401 or 403¶

Cause:

401: missing bearer token
403: wrong bearer token

Fix:

load LOCALAI_RAG_API_KEY from /etc/localai/localai.env
retry the request with Authorization: Bearer ...
if testing in a browser on Fedora, paste the key into the RAG API key field and click Use Key

Ingestion Does Not Pick Up Files¶

Likely causes:

files are not in /srv/localai/documents/inbox
ownership or permissions prevent the localai user from reading files
the timer is inactive
LiteLLM or the embedding backend is down

Checks:

ls -la /srv/localai/documents/inbox
systemctl is-active localai-rag-ingest.timer
systemctl list-timers localai-rag-ingest.timer --no-pager
journalctl -u localai-rag-ingest.service --since "30 minutes ago" --no-pager

Fix file ownership:

sudo chown -R localai:localai /srv/localai/documents/inbox

Service Fails Only Under systemd¶

Check environment-file permissions and SELinux:

sudo ls -la /etc/localai
sudo restorecon -Rv /srv/localai /etc/localai
sudo ausearch -m avc -ts recent

Backend Starts Slowly¶

Large GGUF models can take time to load. The unit files have extended TimeoutStartSec values. Use journal logs to distinguish slow startup from failure.
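
One way to tell a slow start from a failure is to poll the backend health endpoint while watching the journal. The sketch below is an assumption-laden helper, not part of the unit files: wait_for_health is a hypothetical name, and the default of 30 attempts with a 10-second delay is arbitrary and should be tuned to the model size.

```shell
# Hypothetical helper: poll a health URL until it answers or attempts run out.
# Usage: wait_for_health <url> [attempts] [delay-seconds]
wait_for_health() {
  url=$1
  attempts=${2:-30}
  delay=${3:-10}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors, so only a successful response counts.
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      printf 'healthy after %s attempt(s): %s\n' "$i" "$url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  printf 'still unhealthy after %s attempt(s): %s\n' "$attempts" "$url" >&2
  return 1
}
```

For example, wait_for_health http://127.0.0.1:8010/health polls the coder backend. If it returns 1, check whether systemd has already marked the unit as failed before assuming the model is still loading.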

Change Management¶

After editing service units:

sudo systemctl daemon-reload
sudo systemctl restart <unit>

After editing /etc/localai/localai.env:

sudo systemctl restart localai-litellm.service localai-rag-api.service localai-rag-ingest.service

Restart model services too if the changed environment affects llama.cpp units.

After editing /etc/localai/litellm.yaml:

sudo systemctl restart localai-litellm.service

After editing RAG code under /srv/localai/rag:

sudo systemctl restart localai-rag-api.service
sudo systemctl start localai-rag-ingest.service

Exposure Policy¶

Current exposure:

LiteLLM is bound to all interfaces for private Tailscale use.
Raw llama.cpp backends are localhost-only.
RAG API/chat UI is localhost-only and bearer-protected on protected routes.
PostgreSQL is localhost-only.

Before exposing the RAG API to any non-local client, add:

HTTPS proxying through Caddy, Cloudflare Tunnel, or another approved path
request size limits for uploads
logging and failure monitoring
an explicit decision about whether the Railway app can call RAG directly
