Service Inventory and Port Map
Last updated: 2026-05-31
Systemd Units
The local AI stack is managed by system-level systemd units.
| Unit | Purpose | Enabled state verified | Active state verified |
|---|---|---|---|
| localai-qwen-coder.service | llama.cpp server for local/qwen-coder | yes | yes |
| localai-qwen-vision.service | llama.cpp server for local/qwen-vision-fast | yes | yes |
| localai-embed.service | llama.cpp embeddings server for local/embed-engineering | yes | yes |
| localai-litellm.service | LiteLLM OpenAI-compatible proxy | yes | yes |
| localai-rag-api.service | FastAPI RAG API and chat UI | yes | yes |
| localai-rag-ingest.service | one-shot RAG ingestion worker | timer-triggered | on demand |
| localai-rag-ingest.timer | runs ingestion every two minutes | yes | yes |
| postgresql.service | local PostgreSQL + pgvector database | yes | yes |
Port Map
| Component | Bind address | Port | External access | Notes |
|---|---|---|---|---|
| Qwen coder llama.cpp backend | 127.0.0.1 | 8010 | no | OpenAI-compatible backend for LiteLLM |
| Qwen vision llama.cpp backend | 127.0.0.1 | 8011 | no | Uses model GGUF plus mmproj-F16.gguf |
| Embedding llama.cpp backend | 127.0.0.1 | 8012 | no | Embeddings only |
| LiteLLM proxy | 0.0.0.0 | 4000 | private Tailscale and Cloudflare backend endpoint | Main client API |
| RAG API/chat UI | 127.0.0.1 | 4100 | Cloudflare backend endpoint only | Local API/UI and Knowledge API |
| PostgreSQL | 127.0.0.1, ::1 | 5432 | no | Local database only |
| Cloudflare Tunnel | outbound | n/a | knowledge.rapiddraft.ai, localai.rapiddraft.ai | Routes only to RAG API and LiteLLM |
Verified listener summary:
0.0.0.0:4000 LiteLLM
127.0.0.1:4100 RAG API/chat UI
127.0.0.1:8010 local/qwen-coder backend
127.0.0.1:8011 local/qwen-vision-fast backend
127.0.0.1:8012 local/embed-engineering backend
127.0.0.1:5432 PostgreSQL
::1:5432 PostgreSQL
Cloudflare Tunnel is managed by cloudflared-rapiddraft-localai.service and does not require opening inbound firewall ports.
Service Dependencies
localai-qwen-coder.service, localai-qwen-vision.service, and localai-embed.service start after network-online.target.
localai-litellm.service starts after:
- network-online.target
- localai-qwen-coder.service
- localai-qwen-vision.service
- localai-embed.service

localai-rag-api.service starts after:
- network-online.target
- postgresql.service
- localai-litellm.service

localai-rag-ingest.service starts after:
- postgresql.service
- localai-litellm.service
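
The ordering above can be checked against the live units. The following is a minimal sketch, assuming `systemctl show -p After --value UNIT` is used to read a unit's effective After= list. The helper name `missing_after` is hypothetical and not part of the stack. systemd adds implicit ordering entries of its own, so the helper only reports required units that are absent instead of comparing the lists exactly.

```sh
# Print each required unit that is absent from a space-separated After= list.
# $1: After= value, e.g. from `systemctl show -p After --value UNIT`
# remaining args: units that the inventory says must be ordered first
missing_after() {
  local after_list=" $1 "
  local unit
  shift
  for unit in "$@"; do
    case "$after_list" in
      *" $unit "*) ;;                 # present, nothing to report
      *) printf '%s\n' "$unit" ;;     # required but not listed
    esac
  done
}
```

For example, `missing_after "$(systemctl show -p After --value localai-litellm.service)" network-online.target localai-qwen-coder.service localai-qwen-vision.service localai-embed.service` prints nothing when all four orderings are in place.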
Backend Process Settings
| Unit | Alias | Context | Parallel | GPU | Metrics | UI |
|---|---|---|---|---|---|---|
| localai-qwen-coder.service | local/qwen-coder | 32768 | 1 | --gpu-layers auto | enabled | disabled |
| localai-qwen-vision.service | local/qwen-vision-fast | 8192 | 1 | --gpu-layers auto | enabled | disabled |
| localai-embed.service | local/embed-engineering | 8192 | 2 | --gpu-layers auto | enabled | disabled |
The coder and vision services use --flash-attn auto and --jinja. The embedding service uses --embedding.
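
To make the table easier to read against a command line, here is a rough sketch of how one row might translate into llama-server arguments. It is not the actual ExecStart of any unit. The flags --gpu-layers auto, --flash-attn auto, --jinja, and --embedding come from this page; mapping Context to --ctx-size, Parallel to --parallel, Metrics to --metrics, UI disabled to --no-webui, and the alias and bind address to --alias, --host, and --port are assumptions. Model and mmproj paths are deliberately left out, and the helper name `backend_args` is hypothetical.

```sh
# Print the llama-server arguments implied by one row of the settings table,
# one argument per line. Model paths are intentionally omitted.
# Usage: backend_args ALIAS PORT CONTEXT PARALLEL KIND   (KIND: chat | embed)
backend_args() {
  local model_alias="$1" port="$2" ctx="$3" parallel="$4" kind="$5"
  # Settings shared by all three backends (flag mapping is an assumption).
  printf '%s\n' --alias "$model_alias" --host 127.0.0.1 --port "$port" \
    --ctx-size "$ctx" --parallel "$parallel" --gpu-layers auto \
    --metrics --no-webui
  if [ "$kind" = embed ]; then
    printf '%s\n' --embedding               # embedding service only
  else
    printf '%s\n' --flash-attn auto --jinja # coder and vision services
  fi
}
```

Calling `backend_args local/embed-engineering 8012 8192 2 embed` lists the embedding backend's arguments, while passing `chat` as the last argument adds the flash-attention and Jinja flags instead of --embedding.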
Timer Schedule
localai-rag-ingest.timer runs:
OnBootSec=2min
OnUnitActiveSec=2min
New files dropped into /srv/localai/documents/inbox are therefore picked up within about two minutes, or sooner if ingestion is triggered manually.
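
A manual trigger amounts to starting the one-shot service directly instead of waiting for the timer. The sketch below assumes that `systemctl start` on localai-rag-ingest.service is the intended way to do this. The wrapper name `ingest_now` and the `SYSTEMCTL` override are hypothetical and exist only so the command can be dry-run.

```sh
# Start the ingestion unit now instead of waiting for the next timer tick.
# SYSTEMCTL can be overridden (for example with `echo`) for a dry run.
ingest_now() {
  "${SYSTEMCTL:-systemctl}" start localai-rag-ingest.service
}
```

If the unit is declared as Type=oneshot, `systemctl start` returns only after the ingestion run has finished, so the exit status reflects whether the run succeeded.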
Inventory Commands
Check enabled state:
systemctl is-enabled \
localai-qwen-coder.service \
localai-qwen-vision.service \
localai-embed.service \
localai-litellm.service \
localai-rag-api.service \
localai-rag-ingest.timer \
postgresql.service
Check active state:
systemctl is-active \
localai-qwen-coder.service \
localai-qwen-vision.service \
localai-embed.service \
localai-litellm.service \
localai-rag-api.service \
localai-rag-ingest.timer \
postgresql.service
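
When several units are passed at once, `systemctl is-active` prints one state per line in argument order, which makes it hard to see which unit a non-active state belongs to. The following sketch pairs the two up again. The helper name `report_inactive` is hypothetical.

```sh
# Pair unit names with the states printed by `systemctl is-active UNIT...`
# and list every unit whose state is not "active".
# $1: newline-separated states; remaining args: unit names in the same order
report_inactive() {
  local states="$1"
  local unit state
  shift
  for unit in "$@"; do
    state=$(printf '%s\n' "$states" | sed -n '1p')   # state for this unit
    states=$(printf '%s\n' "$states" | sed '1d')     # drop it from the queue
    [ "$state" = active ] || printf '%s %s\n' "$unit" "$state"
  done
}
```

With the unit list stored in a variable, for example `units="localai-litellm.service localai-rag-api.service"`, the call `report_inactive "$(systemctl is-active $units)" $units` prints nothing when everything is active.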
Check listeners:
ss -ltnp 'sport = :8010 or sport = :8011 or sport = :8012 or sport = :4000 or sport = :4100 or sport = :5432'
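
The port map says that only LiteLLM on port 4000 binds to all interfaces and that every other component listens on loopback. As a hedged sketch, the filter below reads `ss -ltnH` output (the same listing without the header or process column) and flags any of the other ports bound to a non-loopback address. The helper name `check_binds` is hypothetical, and the port list is copied from the table on this page.

```sh
# Read `ss -ltnH` output on stdin and report listeners on the stack's
# loopback-only ports that are bound to anything other than loopback.
# Exits non-zero if at least one unexpected bind is found.
check_binds() {
  awk '
    BEGIN { bad = 0 }
    {
      local_addr = $4                      # e.g. 127.0.0.1:8010 or [::1]:5432
      n = split(local_addr, parts, ":")
      port = parts[n]                      # text after the last colon
      addr = substr(local_addr, 1, length(local_addr) - length(port) - 1)
      loopback = (addr == "127.0.0.1" || addr == "[::1]")
      if (port ~ /^(8010|8011|8012|4100|5432)$/ && !loopback) {
        print "unexpected bind: " local_addr
        bad = 1
      }
    }
    END { exit bad }
  '
}
```

Run it as `ss -ltnH | check_binds`. Port 4000 is skipped on purpose because its 0.0.0.0 bind is expected.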
Check the ingestion timer:
systemctl list-timers localai-rag-ingest.timer --no-pager