RAG Ingestion and Search Guide

Last updated: 2026-05-28

Purpose

The local RAG system indexes engineering and manufacturing documents into PostgreSQL + pgvector, then uses local model routes for search and cited answers.

The current RAG API and chat UI run at:

http://127.0.0.1:4100

The service is local-only: it binds to localhost by design. Bearer authentication is enabled on the live server for protected routes. For now, the supported access paths are a browser on the Fedora machine itself or an SSH tunnel from a trusted client.

Document Drop Workflow

Document root:

/srv/localai/documents

Subfolders:

/srv/localai/documents/inbox       new files to ingest
/srv/localai/documents/processing  temporary ingestion state
/srv/localai/documents/archive     successfully indexed source files
/srv/localai/documents/failed      failed source files

Basic workflow:

1. Copy files or nested folder trees into /srv/localai/documents/inbox.
2. Wait for localai-rag-ingest.timer, or trigger ingestion manually.
3. The worker computes a checksum and skips duplicates.
4. Text is extracted from the source file.
5. Text is chunked.
6. Chunks are embedded through local/embed-engineering.
7. Metadata, text, and vectors are stored in PostgreSQL.
8. The original file moves to archive/ or failed/.
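The checksum-and-skip step in the workflow above can be sketched as follows. This is a minimal illustration, not the worker's code: the guide does not name the hash algorithm, so SHA-256 is an assumption, and the in-memory `seen` set is a hypothetical stand-in for whatever lookup the worker performs against PostgreSQL.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, block_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size blocks."""
    digest = hashlib.sha256()  # assumption: the real worker may use another hash
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def is_duplicate(path: Path, seen: set[str]) -> bool:
    """Return True if this content was already seen; otherwise record it."""
    checksum = file_checksum(path)
    if checksum in seen:
        return True
    seen.add(checksum)
    return False
```

Hashing the content rather than the file name means a renamed copy of an already indexed file is still recognized as a duplicate.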

The worker now walks the inbox recursively. Relative folder structure is preserved when files move through processing/, archive/, and failed/.

Ignored paths:

hidden files and folders such as .DS_Store
__MACOSX
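A recursive inbox walk that applies these ignore rules and preserves relative paths might look like the sketch below. The function names are hypothetical, and treating every dot-prefixed path component as hidden is an assumption about how the worker interprets "hidden files and folders".

```python
from pathlib import Path

IGNORED_NAMES = {"__MACOSX"}


def is_ignored(relative: Path) -> bool:
    """True if any path component is dot-prefixed or named __MACOSX."""
    return any(
        part.startswith(".") or part in IGNORED_NAMES
        for part in relative.parts
    )


def iter_inbox_files(inbox: Path):
    """Yield (absolute_path, relative_path) for each ingestible file."""
    for path in sorted(inbox.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(inbox)
        if not is_ignored(relative):
            yield path, relative


def destination(root: Path, state: str, relative: Path) -> Path:
    """Map a relative inbox path onto processing/, archive/, or failed/."""
    return root / state / relative
```

For example, `destination(Path("/srv/localai/documents"), "archive", Path("textcad/01_Standards/a.pdf"))` keeps the `textcad/01_Standards/` prefix under `archive/`.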

Copy a file into the inbox:

sudo install -o localai -g localai -m 0644 ./example.pdf /srv/localai/documents/inbox/

Copy a nested folder tree:

sudo mkdir -p /srv/localai/documents/inbox/textcad
sudo cp -R ./01_Standards /srv/localai/documents/inbox/textcad/
sudo chown -R localai:localai /srv/localai/documents/inbox/textcad

Trigger ingestion immediately:

curl -fsS -X POST http://127.0.0.1:4100/documents/ingest \
  -H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}"

Run the ingestion worker directly:

sudo systemctl start localai-rag-ingest.service

Watch ingestion logs:

journalctl -u localai-rag-ingest.service -f

Supported File Types

The current ingestion worker supports:

Type	Extensions	Extraction path
PDF	`.pdf`	`pypdf` text extraction
DOCX	`.docx`	`python-docx` paragraph extraction
CSV	`.csv`	row text extraction
Text	`.txt`	read as UTF-8; undecodable bytes are ignored
Markdown	`.md`, `.markdown`	read as UTF-8; undecodable bytes are ignored
Images	`.png`, `.jpg`, `.jpeg`, `.webp`, `.tif`, `.tiff`	Tesseract OCR
Legacy documents	`.doc`, `.odt`, `.rtf`	`pandoc -t plain` when extractable

Image ingestion stores OCR text with source_kind set to ocr.
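The table above amounts to a dispatch on file extension. The sketch below restates it in code under stated assumptions: the labels are descriptive strings invented for this example, not identifiers from the worker, and case-insensitive suffix matching is assumed rather than confirmed. `read_plain_text` shows one way to read UTF-8 text while ignoring decode errors.

```python
from pathlib import Path
from typing import Optional

# Extension -> extraction path, transcribed from the table above.
EXTRACTION_PATHS = {
    ".pdf": "pypdf",
    ".docx": "python-docx",
    ".csv": "csv rows",
    ".txt": "utf-8 text",
    ".md": "utf-8 text",
    ".markdown": "utf-8 text",
    ".doc": "pandoc",
    ".odt": "pandoc",
    ".rtf": "pandoc",
}
OCR_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".tif", ".tiff"}


def extraction_path(path: Path) -> Optional[str]:
    """Return a label for the extraction path, or None if unsupported."""
    suffix = path.suffix.lower()  # assumption: matching ignores case
    if suffix in OCR_EXTENSIONS:
        return "tesseract ocr"
    return EXTRACTION_PATHS.get(suffix)


def read_plain_text(path: Path) -> str:
    """Read a text or Markdown file as UTF-8, dropping undecodable bytes."""
    return path.read_text(encoding="utf-8", errors="ignore")
```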

Chunking

Defaults:

LOCALAI_RAG_CHUNK_WORDS=750
LOCALAI_RAG_CHUNK_OVERLAP=120

If these variables are not set, the worker uses those defaults internally.
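A sliding word window with these defaults can be sketched as below. This is an illustration, not the worker's implementation: it assumes the overlap is measured in words and that chunks are split on whitespace, neither of which the guide states explicitly. The function names are hypothetical.

```python
def settings_from_env(environ) -> tuple[int, int]:
    """Read chunk size and overlap, falling back to the documented defaults."""
    chunk_words = int(environ.get("LOCALAI_RAG_CHUNK_WORDS", "750"))
    overlap = int(environ.get("LOCALAI_RAG_CHUNK_OVERLAP", "120"))
    return chunk_words, overlap


def split_into_chunks(text: str, chunk_words: int = 750, overlap: int = 120) -> list[str]:
    """Split text into word windows where consecutive windows share `overlap` words."""
    if chunk_words <= 0 or not 0 <= overlap < chunk_words:
        raise ValueError("need chunk_words > 0 and 0 <= overlap < chunk_words")
    words = text.split()
    step = chunk_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

With the defaults, each new chunk starts 630 words after the previous one, so a 1000-word document yields two chunks: words 0 to 749 and words 630 to 999. Pass `os.environ` to `settings_from_env` to pick up overrides.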

Chunks are stored in rag_chunks with:

document ID
chunk index
page number when available
section title field
source kind
content text
generated full-text search vector
token count approximation
metadata JSONB
1024-dimensional embedding vector

Ingestion States

Document-level states and stages include:

queued
processing
extracting
chunking
embedding
indexed
failed

Progress fields include:

chunk_count
embedded_chunk_count
failed_chunk_count
ingest_progress
last_error
ingest_started_at
ingest_finished_at

Status Endpoints

Get ingestion status:

curl -fsS http://127.0.0.1:4100/ingestion/status \
  -H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}"

Stream ingestion status events:

curl -N http://127.0.0.1:4100/ingestion/events \
  -H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}"

List indexed documents:

curl -fsS http://127.0.0.1:4100/documents \
  -H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}"
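The same calls can be made from Python with only the standard library. The sketch below mirrors the curl commands above; it assumes the endpoints return JSON bodies, which the guide does not state, and the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:4100"


def build_request(path: str, api_key: str, payload=None) -> urllib.request.Request:
    """Build a bearer-authenticated request; sends POST when a payload is given."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(f"{BASE_URL}{path}", data=data, headers=headers)


def fetch_json(path: str, api_key: str, payload=None, timeout: float = 10.0):
    """Send the request and decode the body as JSON (assumed response format)."""
    request = build_request(path, api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_json("/ingestion/status", os.environ["LOCALAI_RAG_API_KEY"])` corresponds to the first curl command in this section.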

On the live server, the first real corpus pass on 2026-05-28 produced:

9 indexed standards PDFs under textcad/01_Standards/...
21 indexed CVAT job 75 images under cvat/job_75/...
1 failed standards PDF: textcad/01_Standards/DS ISO 1101.pdf

Database Schema Shape

Required extensions:

CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

Primary tables:

rag_documents
rag_chunks

Important indexes:

rag_documents_status_idx
rag_documents_metadata_gin
rag_chunks_document_idx
rag_chunks_embedding_hnsw
rag_chunks_content_tsv_gin
rag_chunks_content_trgm_gin
rag_chunks_metadata_gin

The embedding index uses HNSW over vector_cosine_ops.
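In pgvector, `vector_cosine_ops` is the operator class for cosine distance, which the `<=>` operator computes as one minus the cosine similarity. The pure-Python sketch below shows that quantity for intuition only; the database computes it natively, and raising on zero vectors is a choice made for this example.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; 0 means same direction, 2 means opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)
```

Because the measure depends only on direction, scaling a vector does not change its distance to other vectors.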

Search Modes

The RAG API supports these search modes:

Mode	Behavior
`hybrid`	combines semantic vector search and PostgreSQL full-text search
`semantic`	embeds the query and searches by vector distance
`keyword`	uses PostgreSQL full-text search with `websearch_to_tsquery`
`boolean`	same implementation path as `keyword`
`phrase`	uses case-insensitive substring matching plus a trigram similarity score
`regex`	uses PostgreSQL case-insensitive regex matching
`metadata`	filters by metadata/document fields without query text
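The guide does not say how `hybrid` mode merges the semantic and full-text result lists. One common technique for combining two rankings is reciprocal rank fusion (RRF), sketched below purely as an illustration. It is not confirmed to be what the RAG API does, and the function name and the constant `k=60` are assumptions of this example.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists; each appearance contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))
```

An ID that appears in both lists accumulates two contributions, so chunks matched by both the vector search and the full-text search tend to rise to the top.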

Similar chunk search is available through /search/similar.

Neighbor expansion can include adjacent chunks around matches. It is enabled by default for /search requests unless expand_neighbors is set to false.
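Conceptually, neighbor expansion widens each match to include the chunks next to it in the same document. The sketch below assumes zero-based, contiguous chunk indexes and a hypothetical `window` of one chunk on each side; the actual window size used by the server is not documented here.

```python
def expand_neighbors(matched: list[int], chunk_count: int, window: int = 1) -> list[int]:
    """Return sorted chunk indexes covering each match plus its neighbors."""
    expanded: set[int] = set()
    for index in matched:
        low = max(0, index - window)
        high = min(chunk_count - 1, index + window)
        expanded.update(range(low, high + 1))
    return sorted(expanded)
```

Overlapping neighborhoods are merged, so two adjacent matches do not return the same chunk twice.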

Supported filter keys:

{
  "document_id": "uuid",
  "source_kind": "text",
  "metadata": {
    "key": "value"
  }
}

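The metadata filter uses JSONB containment: a row matches when its stored metadata contains every key/value pair given in the filter.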
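For intuition, JSONB containment (the PostgreSQL `@>` operator) can be approximated in Python as below. This is a simplified model, not the database's implementation: it omits PostgreSQL's special case where a top-level array contains a bare primitive, and the function name is hypothetical.

```python
def jsonb_contains(stored, wanted) -> bool:
    """Approximate `stored @> wanted` for decoded JSON values."""
    if isinstance(wanted, dict):
        # Every wanted key must exist and its value must be contained.
        return isinstance(stored, dict) and all(
            key in stored and jsonb_contains(stored[key], value)
            for key, value in wanted.items()
        )
    if isinstance(wanted, list):
        # Every wanted element must be contained in some stored element.
        return isinstance(stored, list) and all(
            any(jsonb_contains(item, element) for item in stored)
            for element in wanted
        )
    if isinstance(stored, bool) or isinstance(wanted, bool):
        return stored is wanted  # keep JSON true/false distinct from 1/0
    return stored == wanted
```

A filter such as `{"metadata": {"key": "value"}}` therefore matches rows whose metadata has at least that pair, regardless of any other keys present. Containment is checked level by level, so a nested pair does not match a top-level filter key.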

Search Examples

Hybrid search:

curl -fsS http://127.0.0.1:4100/search \
  -H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "fixture clamp torque",
    "mode": "hybrid",
    "limit": 8
  }'

Phrase search:

curl -fsS http://127.0.0.1:4100/search \
  -H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "fixture clamp torque",
    "mode": "phrase",
    "limit": 5,
    "expand_neighbors": false
  }'

Metadata search:

curl -fsS http://127.0.0.1:4100/search \
  -H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}" \
  -H 'Content-Type: application/json' \
  -d '{
    "mode": "metadata",
    "limit": 10,
    "filters": {
      "source_kind": "ocr"
    }
  }'

Similar chunk search:

curl -fsS http://127.0.0.1:4100/search/similar \
  -H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}" \
  -H 'Content-Type: application/json' \
  -d '{
    "chunk_id": "00000000-0000-0000-0000-000000000000",
    "limit": 8
  }'

Replace the placeholder UUID with a real chunk ID from a search result.
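The request bodies in this section share a small set of fields. A client-side builder such as the sketch below can catch typos before a request is sent. The validation rules are assumptions inferred from the mode table and the examples (for instance, that only `metadata` mode may omit `query`); the server's own validation may differ, and the function name is hypothetical.

```python
SEARCH_MODES = {"hybrid", "semantic", "keyword", "boolean", "phrase", "regex", "metadata"}


def build_search_payload(mode, query=None, limit=8, filters=None, expand_neighbors=None):
    """Build a JSON-serializable body for POST /search."""
    if mode not in SEARCH_MODES:
        raise ValueError(f"unknown search mode: {mode!r}")
    if mode != "metadata" and not query:
        # Assumption: every mode except metadata needs query text.
        raise ValueError(f"mode {mode!r} requires a query")
    payload = {"mode": mode, "limit": limit}
    if query:
        payload["query"] = query
    if filters:
        payload["filters"] = filters
    if expand_neighbors is not None:
        # Omit the key to keep the server default (expansion enabled).
        payload["expand_neighbors"] = expand_neighbors
    return payload
```

For example, `build_search_payload("phrase", "fixture clamp torque", limit=5, expand_neighbors=False)` reproduces the phrase search body shown earlier in this section.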

Frontend Check On The Fedora Machine

To test the indexed corpus through the local browser UI:

1. Open http://127.0.0.1:4100.
2. Paste the saved RAG key into the RAG API key field.
3. Click Use Key.
4. Ask a targeted question about the ingested standards or images.

If a broad summarization question fails, narrow the question to a specific standard, topic, or folder. The server now trims retrieved context, but specific prompts still perform better than corpus-wide open-ended prompts.

Sources

/Users/adeelyj/code/local ai server setup/local-ai-stack-repo/rag/app/ingest.py
/Users/adeelyj/code/local ai server setup/local-ai-stack-repo/rag/app/app.py
/Users/adeelyj/code/local ai server setup/local-ai-stack-repo/HANDOFF.md