Setup and Runtime Layout

Last updated: 2026-05-28

Filesystem Layout

The local AI runtime root is:

/srv/localai

Current layout:

/srv/localai/
  .cache/
  .npm/
  config/
  documents/
    inbox/
    processing/
    archive/
    failed/
  llama.cpp/
  logs/
  models/
    qwen3-coder-next-q4_k_m/
    qwen3-embedding-0.6b-q8_0/
    qwen3.5-9b-vlm-q4_k_m/
  rag/
    app/
    static/
    templates/
  venv/

The stack runs as the localai user and group. Runtime-owned files under /srv/localai are owned by localai:localai.
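The layout above can be checked mechanically. The following is a minimal sketch, assuming the directory names listed in this section; the helper name check_layout is hypothetical and not part of the deployed stack. It reports missing directories and does not create or modify anything.

```shell
# Hypothetical helper: report expected runtime directories that are missing.
# The directory list mirrors the layout documented above (caches omitted).
check_layout() {
  local root="$1" missing=0 dir
  for dir in config documents/inbox documents/processing documents/archive \
             documents/failed llama.cpp logs models rag venv; do
    if [ ! -d "$root/$dir" ]; then
      echo "missing: $root/$dir"
      missing=1
    fi
  done
  # Exit status 0 means every directory exists, 1 means at least one is missing.
  return "$missing"
}
```

Typical use would be `check_layout /srv/localai`, run as a user that can read the runtime root.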

Repository Copies

The server repository keeps deployable source/config copies:

config/litellm.yaml
rag/
systemd/

The deployed runtime copies are:

/etc/localai/litellm.yaml
/srv/localai/rag
/etc/systemd/system/localai-*.service
/etc/systemd/system/localai-rag-ingest.timer

Keep the repository copies updated when changing runtime behavior. The wiki should document the deployed state, not only the plan.
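One way to notice when a repository copy and its deployed copy have diverged is a byte-for-byte comparison. This is a sketch only; check_drift is a hypothetical helper name. It prints file names but never file contents, so it should be safe to run against files that hold secrets.

```shell
# Hypothetical helper: compare a repository copy with its deployed copy.
# Prints only paths, never file contents.
check_drift() {
  local repo_copy="$1" deployed_copy="$2"
  # cmp -s is silent; it exits 0 only when both files are readable and identical.
  if cmp -s "$repo_copy" "$deployed_copy"; then
    echo "in sync: $deployed_copy"
  else
    echo "DRIFT (or unreadable): $repo_copy vs $deployed_copy"
    return 1
  fi
}
```

For example, `check_drift config/litellm.yaml /etc/localai/litellm.yaml` from the repository root. Reading files under /etc/localai requires root or membership in the localai group. For the rag/ directory, `diff -rq rag /srv/localai/rag` gives a comparable per-file report.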

Runtime Config

Runtime config lives under:

/etc/localai

Files:

/etc/localai/localai.env
/etc/localai/litellm.yaml

Permissions verified:

/etc/localai              root:localai  drwxr-x---
/etc/localai/localai.env  root:localai  -rw-r-----
/etc/localai/litellm.yaml root:localai  -rw-r-----

Do not print or document secret values from these files.
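The permissions listed above can be re-verified without opening the files. The sketch below assumes GNU stat (as shipped with Fedora); check_perm is a hypothetical helper name. In octal, drwxr-x--- is mode 750 and -rw-r----- is mode 640.

```shell
# Hypothetical helper: compare owner:group and octal mode against expectations.
# Uses GNU stat's -c format option; it reads metadata only, not file contents.
check_perm() {
  local path="$1" want_owner="$2" want_mode="$3" got
  got="$(stat -c '%U:%G %a' "$path")" || return 2
  if [ "$got" = "$want_owner $want_mode" ]; then
    echo "ok: $path"
  else
    echo "MISMATCH: $path is $got, expected $want_owner $want_mode"
    return 1
  fi
}
```

Usage might look like `check_perm /etc/localai root:localai 750` followed by `check_perm /etc/localai/localai.env root:localai 640` and the same call for litellm.yaml.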

Environment Variables

The environment file defines these variable names:

LITELLM_API_KEY
LITELLM_BASE_URL
LITELLM_MASTER_KEY
LITELLM_SALT_KEY
LOCALAI_CHAT_MODEL
LOCALAI_DOCUMENT_ROOT
LOCALAI_EMBED_MODEL
LOCALAI_RAG_DATABASE_URL
LOCALAI_VISION_MODEL

Expected non-secret defaults and meanings:

Variable                  Purpose
LITELLM_API_KEY           Bearer key used by local services when calling LiteLLM
LITELLM_BASE_URL          Base URL for LiteLLM, normally http://127.0.0.1:4000/v1
LITELLM_MASTER_KEY        LiteLLM master key
LITELLM_SALT_KEY          LiteLLM salt key
LOCALAI_CHAT_MODEL        Default RAG chat model, normally local/qwen-coder
LOCALAI_DOCUMENT_ROOT     Document root, normally /srv/localai/documents
LOCALAI_EMBED_MODEL       Embedding model, normally local/embed-engineering
LOCALAI_RAG_DATABASE_URL  PostgreSQL connection string
LOCALAI_VISION_MODEL      Vision model, normally local/qwen-vision-fast
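To confirm that an environment file defines every variable name without printing any value, a name-only check is enough. This sketch assumes the file uses plain NAME=value lines, one per variable, which is an assumption about the deployed file's format; check_env_names is a hypothetical helper name.

```shell
# Hypothetical helper: verify that each expected variable name is defined.
# Assumes plain NAME=value lines; only names are printed, never values.
check_env_names() {
  local env_file="$1" name missing=0
  for name in LITELLM_API_KEY LITELLM_BASE_URL LITELLM_MASTER_KEY \
              LITELLM_SALT_KEY LOCALAI_CHAT_MODEL LOCALAI_DOCUMENT_ROOT \
              LOCALAI_EMBED_MODEL LOCALAI_RAG_DATABASE_URL LOCALAI_VISION_MODEL; do
    # grep -q suppresses output, so matching lines (and their values) stay hidden.
    if ! grep -q "^${name}=" "$env_file"; then
      echo "not defined: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

For the deployed file this would be `sudo sh -c '. ./helpers.sh; check_env_names /etc/localai/localai.env'` or an equivalent invocation by a member of the localai group; helpers.sh is a placeholder for wherever the function is kept.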

LiteLLM Config Shape

LiteLLM is configured with three model routes:

model_list:
  - model_name: local/qwen-coder
    litellm_params:
      model: openai/local/qwen-coder
      api_base: http://127.0.0.1:8010/v1
      api_key: os.environ/LITELLM_API_KEY

  - model_name: local/qwen-vision-fast
    litellm_params:
      model: openai/local/qwen-vision-fast
      api_base: http://127.0.0.1:8011/v1
      api_key: os.environ/LITELLM_API_KEY

  - model_name: local/embed-engineering
    litellm_params:
      model: openai/local/embed-engineering
      api_base: http://127.0.0.1:8012/v1
      api_key: os.environ/LITELLM_API_KEY

litellm_settings:
  drop_params: true
  request_timeout: 600
  num_retries: 1

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY

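The block above shows the shape of the config; the deployed file may differ in detail. Keys are referenced through LiteLLM's environment interpolation syntax (os.environ/VARIABLE_NAME). Do not replace environment references with literal secret values.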
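A quick text-level check can confirm that a config file still defines the three documented routes. This is a rough sketch that matches "- model_name:" lines with sed rather than parsing YAML, so it assumes the one-entry-per-line shape shown above; list_routes and check_routes are hypothetical helper names.

```shell
# Hypothetical helpers: list and verify model routes in a LiteLLM config.
# Text matching only; assumes each route starts with a "- model_name:" line.
list_routes() {
  sed -n 's/^[[:space:]]*-[[:space:]]*model_name:[[:space:]]*//p' "$1"
}

check_routes() {
  local config="$1" route missing=0
  for route in local/qwen-coder local/qwen-vision-fast local/embed-engineering; do
    # -x matches the whole line, -F treats the route name as a fixed string.
    if ! list_routes "$config" | grep -qxF "$route"; then
      echo "route not found: $route"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_routes config/litellm.yaml` against the repository copy avoids the need for elevated permissions; neither helper prints key material because the documented config holds only environment references.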

SELinux Contexts

Persistent SELinux file contexts were added for these runtime paths:

/srv/localai/documents(/.*)?              var_lib_t
/srv/localai/llama.cpp/build/bin(/.*)?    bin_t
/srv/localai/logs(/.*)?                   var_log_t
/srv/localai/models(/.*)?                 var_lib_t
/srv/localai/rag(/.*)?                    usr_t
/srv/localai/venv/bin(/.*)?               bin_t
/srv/localai/venv/lib(/.*)?               lib_t
/srv/localai/venv/lib64(/.*)?             lib_t
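Persistent contexts of this kind are normally registered with semanage fcontext. The sketch below only prints the commands that would register the table above, so they can be reviewed before anything runs as root; it is an illustration of the mapping, not a record of the commands actually used on this server. print_fcontext_cmds is a hypothetical helper name.

```shell
# Hypothetical helper: print (not run) semanage commands for the documented contexts.
print_fcontext_cmds() {
  local pattern setype
  while read -r pattern setype; do
    [ -n "$pattern" ] || continue
    printf "semanage fcontext -a -t %s '%s'\n" "$setype" "$pattern"
  done <<'EOF'
/srv/localai/documents(/.*)?            var_lib_t
/srv/localai/llama.cpp/build/bin(/.*)?  bin_t
/srv/localai/logs(/.*)?                 var_log_t
/srv/localai/models(/.*)?               var_lib_t
/srv/localai/rag(/.*)?                  usr_t
/srv/localai/venv/bin(/.*)?             bin_t
/srv/localai/venv/lib(/.*)?             lib_t
/srv/localai/venv/lib64(/.*)?           lib_t
EOF
}
```

The -a option adds a new rule and fails if one already exists for the same pattern; -m modifies an existing rule instead. Registering a rule does not relabel existing files, so a restorecon run is still needed afterwards.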

If files are moved into these paths manually, restore contexts:

sudo restorecon -Rv /srv/localai /etc/localai

If a service starts manually but fails under systemd, check SELinux denials before changing service logic:

sudo journalctl -t setroubleshoot --since "30 minutes ago"
sudo ausearch -m avc -ts recent

Build and Hardware Checks

The llama.cpp build is under:

/srv/localai/llama.cpp

Useful checks:

/srv/localai/llama.cpp/build/bin/llama-cli --list-devices

Verified Vulkan device output included:

Vulkan0: AMD Radeon 8060S Graphics (RADV STRIX_HALO)
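For scripted health checks, the device listing can be reduced to a yes/no answer. This sketch assumes that device lines begin with a name such as Vulkan0 followed by a colon, as in the verified output above; the exact output format may differ between llama.cpp builds. has_vulkan_device is a hypothetical helper name and reads the listing from standard input.

```shell
# Hypothetical helper: succeed if stdin contains at least one Vulkan device line.
# Assumes lines of the form "VulkanN: <device name>", as in the verified output.
has_vulkan_device() {
  grep -Eq '^[[:space:]]*Vulkan[0-9]+:'
}

# Example invocation (commented out so that sourcing this file has no side effects):
# /srv/localai/llama.cpp/build/bin/llama-cli --list-devices 2>&1 | has_vulkan_device \
#   && echo "Vulkan device present"
```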

As of 2026-05-28, Fedora reported about 62 GiB of system memory. llama.cpp device enumeration reported the Radeon Vulkan device with about 97495 MiB total and about 37612 MiB free at the time of the check. Treat those values as operational snapshots, not guaranteed capacity.

Database Layout

The local RAG database is:

localai_rag

Required PostgreSQL extensions:

vector
pg_trgm
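Whether both extensions are installed can be checked from the pg_extension catalog. The sketch below takes the list of installed extension names on standard input, one per line, and reports which required ones are absent; missing_extensions is a hypothetical helper name.

```shell
# Hypothetical helper: read installed extension names from stdin (one per line)
# and report any required extension that is absent.
missing_extensions() {
  local installed ext missing=0
  installed="$(cat)"
  for ext in vector pg_trgm; do
    if ! printf '%s\n' "$installed" | grep -qxF "$ext"; then
      echo "missing extension: $ext"
      missing=1
    fi
  done
  return "$missing"
}
```

One possible invocation is `psql -d localai_rag -Atc 'SELECT extname FROM pg_extension' | missing_extensions`, run as a role that can connect to the database. The -A and -t options make psql print bare values, one per line.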

Primary tables:

rag_documents
rag_chunks

The schema is stored in the repository copy and in the deployed copy:

rag/schema.sql
/srv/localai/rag/schema.sql