Service Inventory and Port Map

Last updated: 2026-05-31

Systemd Units

The local AI stack is managed by system-level systemd units.

Unit                         Purpose                                                  Enabled state verified  Active state verified
localai-qwen-coder.service   llama.cpp server for local/qwen-coder                    yes                     yes
localai-qwen-vision.service  llama.cpp server for local/qwen-vision-fast              yes                     yes
localai-embed.service        llama.cpp embeddings server for local/embed-engineering  yes                     yes
localai-litellm.service      LiteLLM OpenAI-compatible proxy                          yes                     yes
localai-rag-api.service      FastAPI RAG API and chat UI                              yes                     yes
localai-rag-ingest.service   one-shot RAG ingestion worker                            timer-triggered         on demand
localai-rag-ingest.timer     runs ingestion every two minutes                         yes                     yes
postgresql.service           local PostgreSQL + pgvector database                     yes                     yes

Port Map

Component                      Bind address    Port  External access                                    Notes
Qwen coder llama.cpp backend   127.0.0.1       8010  no                                                 OpenAI-compatible backend for LiteLLM
Qwen vision llama.cpp backend  127.0.0.1       8011  no                                                 Uses model GGUF plus mmproj-F16.gguf
Embedding llama.cpp backend    127.0.0.1       8012  no                                                 Embeddings only
LiteLLM proxy                  0.0.0.0         4000  private Tailscale and Cloudflare backend endpoint  Main client API
RAG API/chat UI                127.0.0.1       4100  Cloudflare backend endpoint only                   Local API/UI and Knowledge API
PostgreSQL                     127.0.0.1, ::1  5432  no                                                 Local database only
Cloudflare Tunnel              outbound        n/a   knowledge.rapiddraft.ai, localai.rapiddraft.ai     Routes only to RAG API and LiteLLM

Verified listener summary:

0.0.0.0:4000      LiteLLM
127.0.0.1:4100    RAG API/chat UI
127.0.0.1:8010    local/qwen-coder backend
127.0.0.1:8011    local/qwen-vision-fast backend
127.0.0.1:8012    local/embed-engineering backend
127.0.0.1:5432    PostgreSQL
::1:5432          PostgreSQL
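
The summary above can also be checked mechanically. The following is a minimal sketch, not part of the deployed stack: `audit_listeners` is a hypothetical helper that reads `ss -ltnH` output on stdin and flags any port from the port map that should be loopback-only but is bound to another address. Port 4000 is left out on purpose because LiteLLM binds 0.0.0.0 by design.

```shell
# audit_listeners: hypothetical helper, introduced here for illustration only.
# Reads `ss -ltnH` lines on stdin (column 4 is "Local Address:Port") and
# prints "EXPOSED <addr>" for every local-only port bound outside loopback.
# Returns 1 if anything was flagged, 0 otherwise.
audit_listeners() {
  awk '
    BEGIN {
      # Ports that the port map lists as having no external access.
      n = split("4100 5432 8010 8011 8012", ports, " ")
      for (i = 1; i <= n; i++) local_only[ports[i]] = 1
    }
    {
      addr = $4
      k = split(addr, parts, ":")
      port = parts[k]                  # text after the last colon
      host = substr(addr, 1, length(addr) - length(port) - 1)
      if ((port in local_only) && host != "127.0.0.1" && host != "[::1]") {
        print "EXPOSED " addr
        found = 1
      }
    }
    END { exit (found ? 1 : 0) }
  '
}
```

Typical use would be `ss -ltnH | audit_listeners`. No output and exit status 0 mean every local-only port is bound to a loopback address.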

Cloudflare Tunnel is managed by cloudflared-rapiddraft-localai.service and does not require opening inbound firewall ports.

Service Dependencies

localai-qwen-coder.service, localai-qwen-vision.service, and localai-embed.service start after network-online.target.

localai-litellm.service starts after:

network-online.target
localai-qwen-coder.service
localai-qwen-vision.service
localai-embed.service

localai-rag-api.service starts after:

network-online.target
postgresql.service
localai-litellm.service

localai-rag-ingest.service starts after:

postgresql.service
localai-litellm.service
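
One way to confirm this ordering on a running host is to compare each unit's `After=` property with the lists above. The sketch below assumes the ordering is expressed through `After=` directives; `missing_after` and the `SYSTEMCTL` override variable are hypothetical names introduced here for illustration.

```shell
# missing_after: hypothetical helper, not part of the deployed stack.
# $1 is the unit to inspect; the remaining arguments are the units it is
# expected to be ordered after. Prints one line per missing dependency.
# SYSTEMCTL is an assumed override hook so the function can be exercised
# on a machine without systemd; it defaults to the real systemctl.
missing_after() {
  unit=$1
  shift
  # `systemctl show --property=After --value` prints a space-separated list.
  after=$("${SYSTEMCTL:-systemctl}" show --property=After --value "$unit")
  for dep in "$@"; do
    case " $after " in
      *" $dep "*) ;;                                  # present, nothing to report
      *) echo "$unit is not ordered after $dep" ;;
    esac
  done
}
```

For example, `missing_after localai-rag-api.service network-online.target postgresql.service localai-litellm.service` should print nothing if the unit file matches this section.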

Backend Process Settings

Unit                         Alias                    Context  Parallel  GPU                Metrics  UI
localai-qwen-coder.service   local/qwen-coder         32768    1         --gpu-layers auto  enabled  disabled
localai-qwen-vision.service  local/qwen-vision-fast   8192     1         --gpu-layers auto  enabled  disabled
localai-embed.service        local/embed-engineering  8192     2         --gpu-layers auto  enabled  disabled

The coder and vision services use --flash-attn auto and --jinja. The embedding service uses --embedding.
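
To make the table concrete, here is a rough sketch of how the coder row might translate into a llama-server command line. It is not copied from the unit file. The model path is a placeholder because the real GGUF path is not given here, and mapping the Context, Parallel, Metrics, and UI columns to `--ctx-size`, `--parallel`, `--metrics`, and `--no-webui` is an assumption. The function name `coder_cmdline` and the `MODEL_PATH` variable are hypothetical.

```shell
# coder_cmdline: hypothetical helper that prints (does not run) a
# llama-server command line for the local/qwen-coder row.
# MODEL_PATH is a placeholder; the actual model location is not documented here.
coder_cmdline() {
  model_path=${MODEL_PATH:-/path/to/model.gguf}
  # Bind address and port come from the port map; the remaining flags come
  # from the settings table, using the assumed column-to-flag mapping.
  printf '%s ' llama-server \
    --model "$model_path" \
    --alias local/qwen-coder \
    --host 127.0.0.1 --port 8010 \
    --ctx-size 32768 --parallel 1 \
    --gpu-layers auto --flash-attn auto --jinja \
    --metrics --no-webui
  printf '\n'
}
```

Comparing this output with the `ExecStart=` line shown by `systemctl cat localai-qwen-coder.service` is a quick way to spot drift between the table and the deployed unit.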

Timer Schedule

localai-rag-ingest.timer runs:

OnBootSec=2min
OnUnitActiveSec=2min

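The first run happens two minutes after boot, and each later run is scheduled two minutes after the ingestion service was last started. New files dropped into /srv/localai/documents/inbox are therefore picked up within about two minutes, or sooner if ingestion is triggered manually.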
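
A manual trigger amounts to starting the one-shot service directly instead of waiting for the timer. The sketch below is one possible wrapper; `ingest_now` and the `SYSTEMCTL` override variable are hypothetical names used here for illustration, and the service is assumed to be a oneshot unit, so the start call returns once the ingestion run has finished.

```shell
# ingest_now: hypothetical helper, not part of the deployed stack.
# Starts the ingestion service immediately, then shows when the timer
# will fire next. SYSTEMCTL is an assumed override hook for dry runs;
# it defaults to the real systemctl.
ingest_now() {
  sc=${SYSTEMCTL:-systemctl}
  "$sc" start localai-rag-ingest.service &&
    "$sc" list-timers localai-rag-ingest.timer --no-pager
}
```

Starting a system-level unit normally needs root, so on a real host this would be called from a root shell. Setting `SYSTEMCTL=echo` first prints the two commands without running them.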

Inventory Commands

Check enabled state:

systemctl is-enabled \
  localai-qwen-coder.service \
  localai-qwen-vision.service \
  localai-embed.service \
  localai-litellm.service \
  localai-rag-api.service \
  localai-rag-ingest.timer \
  postgresql.service

Check active state:

systemctl is-active \
  localai-qwen-coder.service \
  localai-qwen-vision.service \
  localai-embed.service \
  localai-litellm.service \
  localai-rag-api.service \
  localai-rag-ingest.timer \
  postgresql.service

Check listeners:

ss -ltnp 'sport = :8010 or sport = :8011 or sport = :8012 or sport = :4000 or sport = :4100 or sport = :5432'

Check the ingestion timer:

systemctl list-timers localai-rag-ingest.timer --no-pager