RAG Ingestion and Search Guide

Last updated: 2026-05-28

Purpose

The local RAG system indexes engineering and manufacturing documents into PostgreSQL + pgvector, then uses local model routes for search and cited answers.

The current RAG API and chat UI run at:

http://127.0.0.1:4100

The service is local-only: it binds to localhost by design. Protected routes on the live server now require bearer authentication. For now, the supported access paths are a browser on the Fedora machine itself and an SSH tunnel from a trusted client.

Document Drop Workflow

Document root:

/srv/localai/documents

Subfolders:

/srv/localai/documents/inbox       new files to ingest
/srv/localai/documents/processing  temporary ingestion state
/srv/localai/documents/archive     successfully indexed source files
/srv/localai/documents/failed      failed source files

Basic workflow:

  1. Copy files or nested folder trees into /srv/localai/documents/inbox.
  2. Wait for localai-rag-ingest.timer, or trigger ingestion manually.
  3. The worker computes a checksum and skips duplicates.
  4. Text is extracted from the source file.
  5. Text is chunked.
  6. Chunks are embedded through local/embed-engineering.
  7. Metadata, text, and vectors are stored in PostgreSQL.
  8. The original file moves to archive/ or failed/.
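The duplicate check in step 3 can be sketched as follows. This is a minimal illustration, not the worker's actual code: the guide does not name the hash algorithm, so SHA-256 is an assumption, and stream_checksum and is_duplicate are hypothetical helper names.

```python
import hashlib
import io


def stream_checksum(stream, block_size=1024 * 1024):
    """Hash a binary stream in blocks so large files are never fully in memory.

    SHA-256 is an assumption; the guide only says "a checksum".
    """
    digest = hashlib.sha256()
    for block in iter(lambda: stream.read(block_size), b""):
        digest.update(block)
    return digest.hexdigest()


def is_duplicate(checksum, known_checksums):
    """True when the checksum was already recorded for an earlier document."""
    return checksum in known_checksums


if __name__ == "__main__":
    seen = set()
    for payload in (b"first file", b"second file", b"first file"):
        checksum = stream_checksum(io.BytesIO(payload))
        print(checksum[:12], "duplicate" if is_duplicate(checksum, seen) else "new")
        seen.add(checksum)
```

For a file on disk, pass an object opened with open(path, "rb") instead of the in-memory stream.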

The worker now walks the inbox recursively. Relative folder structure is preserved when files move through processing/, archive/, and failed/.

Ignored paths:

  • hidden files and folders such as .DS_Store
  • __MACOSX
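The recursive walk and the ignore rules above might look roughly like this. It is a sketch under stated assumptions: is_ignored, iter_inbox_files, and archive_target are hypothetical names, and the rule "any path component that starts with a dot is hidden" is an interpretation of the list above, not confirmed behavior of ingest.py.

```python
from pathlib import Path

# Folder name taken from the guide; hidden names are detected by the leading dot.
IGNORED_NAMES = {"__MACOSX"}


def is_ignored(relative: Path) -> bool:
    """True if any component of the relative path is hidden or is __MACOSX."""
    return any(
        part.startswith(".") or part in IGNORED_NAMES
        for part in relative.parts
    )


def iter_inbox_files(inbox: Path):
    """Yield (absolute, relative) pairs for every non-ignored file under inbox."""
    for path in sorted(inbox.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(inbox)
        if not is_ignored(relative):
            yield path, relative


def archive_target(archive_root: Path, relative: Path) -> Path:
    """Keep the inbox-relative folder structure when a file is moved."""
    return archive_root / relative
```

Because the target is built from the inbox-relative path, a file dropped at inbox/textcad/01_Standards/ would land under archive/textcad/01_Standards/.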

Copy a file into the inbox:

sudo install -o localai -g localai -m 0644 ./example.pdf /srv/localai/documents/inbox/

Copy a nested folder tree:

sudo mkdir -p /srv/localai/documents/inbox/textcad
sudo cp -R ./01_Standards /srv/localai/documents/inbox/textcad/
sudo chown -R localai:localai /srv/localai/documents/inbox/textcad

Trigger ingestion immediately:

curl -fsS -X POST http://127.0.0.1:4100/documents/ingest \
  -H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}"

Run the ingestion worker directly:

sudo systemctl start localai-rag-ingest.service

Watch ingestion logs:

journalctl -u localai-rag-ingest.service -f

Supported File Types

The current ingestion worker supports:

Type              Extensions                              Extraction path
PDF               .pdf                                    pypdf text extraction
DOCX              .docx                                   python-docx paragraph extraction
CSV               .csv                                    row text extraction
Text              .txt                                    UTF-8 text read with ignored errors
Markdown          .md, .markdown                          UTF-8 text read with ignored errors
Images            .png, .jpg, .jpeg, .webp, .tif, .tiff   Tesseract OCR
Legacy documents  .doc, .odt, .rtf                        pandoc -t plain when extractable

Image ingestion stores OCR text with source_kind set to ocr.

Chunking

Defaults:

LOCALAI_RAG_CHUNK_WORDS=750
LOCALAI_RAG_CHUNK_OVERLAP=120

If these variables are not set, the worker uses those defaults internally.
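To show how the two settings interact, here is a minimal sliding-window sketch. It assumes the overlap is measured in words and that chunks are cut on whitespace; the real worker may split differently. split_into_chunks is a hypothetical name.

```python
import os


def split_into_chunks(text, size=None, overlap=None):
    """Split text into word windows of `size` words sharing `overlap` words.

    Falls back to the documented environment variables and defaults.
    Assumption: overlap is counted in words, like the chunk size.
    """
    if size is None:
        size = int(os.environ.get("LOCALAI_RAG_CHUNK_WORDS", "750"))
    if overlap is None:
        overlap = int(os.environ.get("LOCALAI_RAG_CHUNK_OVERLAP", "120"))
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")

    words = text.split()
    step = size - overlap  # each window starts this many words after the last
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the final window already reached the end of the text
    return chunks
```

With the defaults, each window starts 630 words after the previous one, so a 1000-word document yields two chunks: words 0 to 749 and words 630 to 999.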

Chunks are stored in rag_chunks with:

  • document ID
  • chunk index
  • page number when available
  • section title field
  • source kind
  • content text
  • generated full-text search vector
  • token count approximation
  • metadata JSONB
  • 1024-dimensional embedding vector

Ingestion States

Document-level states and stages include:

queued
processing
extracting
chunking
embedding
indexed
failed

Progress fields include:

  • chunk_count
  • embedded_chunk_count
  • failed_chunk_count
  • ingest_progress
  • last_error
  • ingest_started_at
  • ingest_finished_at

Status Endpoints

Get ingestion status:

curl -fsS http://127.0.0.1:4100/ingestion/status \
  -H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}"

Stream ingestion status events:

curl -N http://127.0.0.1:4100/ingestion/events \
  -H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}"

List indexed documents:

curl -fsS http://127.0.0.1:4100/documents \
  -H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}"

On the live server, the first real corpus pass on 2026-05-28 produced:

  • 9 indexed standards PDFs under textcad/01_Standards/...
  • 21 indexed CVAT job 75 images under cvat/job_75/...
  • 1 failed standards PDF: textcad/01_Standards/DS ISO 1101.pdf

Database Schema Shape

Required extensions:

CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

Primary tables:

rag_documents
rag_chunks

Important indexes:

rag_documents_status_idx
rag_documents_metadata_gin
rag_chunks_document_idx
rag_chunks_embedding_hnsw
rag_chunks_content_tsv_gin
rag_chunks_content_trgm_gin
rag_chunks_metadata_gin

The embedding index uses HNSW over vector_cosine_ops.
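With vector_cosine_ops, pgvector ranks rows by cosine distance (the <=> operator), which is 1 minus the cosine similarity. The short sketch below reproduces that quantity in plain Python for illustration only; cosine_distance is a hypothetical helper, and rejecting zero-length vectors is a choice made for this example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity: 0 for same direction, 2 for opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)
```

The distance depends only on direction, not magnitude, so scaling an embedding does not change its ranking.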

Search Modes

The RAG API supports these search modes:

Mode      Behavior
hybrid    combines semantic vector search and PostgreSQL full-text search
semantic  embeds the query and searches by vector distance
keyword   uses PostgreSQL full-text search with websearch_to_tsquery
boolean   same implementation path as keyword
phrase    uses case-insensitive substring matching plus trigram similarity score
regex     uses PostgreSQL case-insensitive regex matching
metadata  filters by metadata/document fields without query text
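This guide does not say how hybrid mode merges the semantic and full-text result lists. One common way to combine two rankings is reciprocal rank fusion, sketched below purely as an illustration of the idea; it is not a description of what app.py does. The function name and the constant k=60 (a conventional choice) are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of chunk IDs into one ranking.

    Each list contributes 1 / (k + rank) per item, so items that rank well
    in more than one list rise to the top. Ties are broken by chunk ID.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


if __name__ == "__main__":
    semantic_hits = ["a", "b", "c"]   # hypothetical chunk IDs, best first
    keyword_hits = ["b", "d", "a"]
    print(reciprocal_rank_fusion([semantic_hits, keyword_hits]))
```

In the example, "b" and "a" appear in both lists and therefore outrank "d" and "c", which each appear in only one.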

Similar chunk search is available through /search/similar.

Neighbor expansion can include adjacent chunks around matches. It is enabled by default for /search requests unless expand_neighbors is set to false.
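Neighbor expansion can be pictured as widening each match by a small window of chunk indexes within the same document. The sketch below assumes zero-based chunk indexes and a window of one chunk on each side; neither detail is stated in this guide, and expand_neighbors here is a hypothetical helper, not the server function.

```python
def expand_neighbors(matches, chunk_counts, window=1):
    """Add adjacent chunks around each match, without duplicates.

    matches: list of (document_id, chunk_index) in result order.
    chunk_counts: dict mapping document_id to its number of chunks.
    Assumption: chunk indexes start at 0.
    """
    seen = set()
    expanded = []
    for document_id, index in matches:
        low = max(0, index - window)
        high = min(chunk_counts[document_id] - 1, index + window)
        for neighbor in range(low, high + 1):
            key = (document_id, neighbor)
            if key not in seen:
                seen.add(key)
                expanded.append(key)
    return expanded
```

Neighbors never cross document boundaries, and a chunk reached from two different matches is returned only once.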

Search Filters

Supported filter keys:

{
  "document_id": "uuid",
  "source_kind": "text",
  "metadata": {
    "key": "value"
  }
}

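The metadata filter uses JSONB containment: a row matches when its stored metadata contains every key/value pair given in the filter.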
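The containment rule can be approximated in Python as below. This is a simplified model of PostgreSQL's @> operator for objects, arrays, and scalars; it omits the special case where an array contains a bare scalar. The metadata keys in the example are hypothetical.

```python
def jsonb_contains(document, wanted):
    """Approximate `document @> wanted` for JSON-like Python values."""
    if isinstance(wanted, dict):
        # Every wanted key must exist and its value must be contained.
        return isinstance(document, dict) and all(
            key in document and jsonb_contains(document[key], value)
            for key, value in wanted.items()
        )
    if isinstance(wanted, list):
        # Every wanted element must be contained in some document element.
        return isinstance(document, list) and all(
            any(jsonb_contains(item, want) for item in document)
            for want in wanted
        )
    if isinstance(document, bool) or isinstance(wanted, bool):
        # Keep JSON booleans distinct from the numbers 0 and 1.
        return isinstance(document, bool) and isinstance(wanted, bool) and document == wanted
    return not isinstance(document, (dict, list)) and document == wanted


if __name__ == "__main__":
    stored = {"project": "textcad", "tags": ["iso", "gdt"]}  # hypothetical metadata
    print(jsonb_contains(stored, {"project": "textcad"}))
    print(jsonb_contains(stored, {"project": "cvat"}))
```

Extra keys in the stored metadata do not prevent a match; only the keys named in the filter are compared.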

Search Examples

Hybrid search:

curl -fsS http://127.0.0.1:4100/search \
  -H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "fixture clamp torque",
    "mode": "hybrid",
    "limit": 8
  }'

Phrase search:

curl -fsS http://127.0.0.1:4100/search \
  -H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "fixture clamp torque",
    "mode": "phrase",
    "limit": 5,
    "expand_neighbors": false
  }'

Metadata search:

curl -fsS http://127.0.0.1:4100/search \
  -H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}" \
  -H 'Content-Type: application/json' \
  -d '{
    "mode": "metadata",
    "limit": 10,
    "filters": {
      "source_kind": "ocr"
    }
  }'

Similar chunk search:

curl -fsS http://127.0.0.1:4100/search/similar \
  -H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}" \
  -H 'Content-Type: application/json' \
  -d '{
    "chunk_id": "00000000-0000-0000-0000-000000000000",
    "limit": 8
  }'

Replace the placeholder UUID with a real chunk ID from a search result.
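The same /search requests can be issued from Python with only the standard library. This is a minimal sketch: build_search_request and search are hypothetical helper names, the 30-second timeout is arbitrary, and the response is returned as parsed JSON without assuming any particular field names, since this guide does not document the response shape.

```python
import json
import os
import urllib.request

BASE_URL = "http://127.0.0.1:4100"


def build_search_request(query=None, mode="hybrid", limit=8,
                         expand_neighbors=None, filters=None, api_key=None):
    """Build a POST /search request using the fields shown in the curl examples."""
    if api_key is None:
        api_key = os.environ.get("LOCALAI_RAG_API_KEY", "")
    payload = {"mode": mode, "limit": limit}
    if query is not None:
        payload["query"] = query
    if expand_neighbors is not None:
        payload["expand_neighbors"] = expand_neighbors
    if filters is not None:
        payload["filters"] = filters
    return urllib.request.Request(
        BASE_URL + "/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def search(**kwargs):
    """Send the request and return the decoded JSON body."""
    request = build_search_request(**kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires the local RAG service and LOCALAI_RAG_API_KEY in the environment.
    print(json.dumps(search(query="fixture clamp torque", mode="hybrid"), indent=2))
```

Keeping request construction separate from the network call makes it possible to inspect the payload without a running server.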

Frontend Check On The Fedora Machine

To test the indexed corpus through the local browser UI:

  1. Open http://127.0.0.1:4100.
  2. Paste the saved RAG key into the RAG API key field.
  3. Click Use Key.
  4. Ask a targeted question about the ingested standards or images.

If a broad summarization question fails, narrow the question to a specific standard, topic, or folder. The server now trims retrieved context, but specific prompts still perform better than corpus-wide open-ended prompts.

Sources

  • /Users/adeelyj/code/local ai server setup/local-ai-stack-repo/rag/app/ingest.py
  • /Users/adeelyj/code/local ai server setup/local-ai-stack-repo/rag/app/app.py
  • /Users/adeelyj/code/local ai server setup/local-ai-stack-repo/HANDOFF.md