RAG Ingestion and Search Guide
Last updated: 2026-05-28
Purpose
The local RAG system indexes engineering and manufacturing documents into PostgreSQL + pgvector, then uses local model routes for search and cited answers.
The current RAG API and chat UI run at:
http://127.0.0.1:4100
The service is local-only. Bearer authentication is now enabled on the live server for protected routes, but the server still binds to localhost only, by design. For now, the supported access paths are a browser on the Fedora machine and an SSH tunnel from a trusted client.
Document Drop Workflow
Document root:
/srv/localai/documents
Subfolders:
- /srv/localai/documents/inbox: new files to ingest
- /srv/localai/documents/processing: temporary ingestion state
- /srv/localai/documents/archive: successfully indexed source files
- /srv/localai/documents/failed: failed source files
Basic workflow:
- Copy files or nested folder trees into /srv/localai/documents/inbox.
- Wait for localai-rag-ingest.timer, or trigger ingestion manually.
- The worker computes a checksum and skips duplicates.
- Text is extracted from the source file.
- Text is chunked.
- Chunks are embedded through local/embed-engineering.
- Metadata, text, and vectors are stored in PostgreSQL.
- The original file moves to archive/ or failed/.
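The checksum step in the workflow above can be sketched as follows. This is a minimal illustration, assuming SHA-256 over the raw file bytes and an in-memory set of known digests; the guide does not say which hash the worker uses or where it stores checksums, and `file_checksum` and `is_duplicate` are hypothetical names.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, block_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def is_duplicate(path: Path, known_checksums: set) -> bool:
    """Return True if the file's checksum was already seen; otherwise record it."""
    checksum = file_checksum(path)
    if checksum in known_checksums:
        return True
    known_checksums.add(checksum)
    return False
```

Hashing content rather than file names means a renamed copy of an already indexed file is still detected as a duplicate.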
The worker now walks the inbox recursively. Relative folder structure is preserved when files move through processing/, archive/, and failed/.
Ignored paths:
- hidden files and folders such as .DS_Store
- __MACOSX
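The recursive walk, the ignore rules, and the preserved folder structure can be sketched together. This is an illustration only: treating any dot-prefixed path component as hidden is an assumption, and `is_ignored`, `iter_inbox_files`, and `destination_for` are hypothetical names, not functions from the worker.

```python
from pathlib import Path

# Non-hidden names that are skipped explicitly.
IGNORED_NAMES = {"__MACOSX"}


def is_ignored(relative: Path) -> bool:
    """True if any component of the path is hidden or explicitly ignored."""
    return any(
        part.startswith(".") or part in IGNORED_NAMES
        for part in relative.parts
    )


def iter_inbox_files(inbox: Path):
    """Yield (source, relative) pairs for each file under inbox that is not ignored."""
    for source in sorted(inbox.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(inbox)
        if not is_ignored(relative):
            yield source, relative


def destination_for(relative: Path, stage_root: Path) -> Path:
    """Map an inbox-relative path into another stage folder, keeping subfolders."""
    return stage_root / relative
```

With this mapping, inbox/textcad/01_Standards/a.pdf would land at archive/textcad/01_Standards/a.pdf after a successful run.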
Copy a file into the inbox:
sudo install -o localai -g localai -m 0644 ./example.pdf /srv/localai/documents/inbox/
Copy a nested folder tree:
sudo mkdir -p /srv/localai/documents/inbox/textcad
sudo cp -R ./01_Standards /srv/localai/documents/inbox/textcad/
sudo chown -R localai:localai /srv/localai/documents/inbox/textcad
Trigger ingestion immediately:
curl -fsS -X POST http://127.0.0.1:4100/documents/ingest \
-H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}"
Run the ingestion worker directly:
sudo systemctl start localai-rag-ingest.service
Watch ingestion logs:
journalctl -u localai-rag-ingest.service -f
Supported File Types
The current ingestion worker supports:
| Type | Extensions | Extraction path |
|---|---|---|
| PDF | .pdf | pypdf text extraction |
| DOCX | .docx | python-docx paragraph extraction |
| CSV | .csv | row text extraction |
| Text | .txt | UTF-8 text read with ignored errors |
| Markdown | .md, .markdown | UTF-8 text read with ignored errors |
| Images | .png, .jpg, .jpeg, .webp, .tif, .tiff | Tesseract OCR |
| Legacy documents | .doc, .odt, .rtf | pandoc -t plain when extractable |
Image ingestion stores OCR text with source_kind set to ocr.
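A dispatch on file extension, as implied by the table, might look like the sketch below. The extractor labels are illustrative tags rather than the worker's internal names, lowercasing the suffix is an assumption, and `pick_extractor` and `read_text_lenient` are hypothetical helpers.

```python
from pathlib import Path
from typing import Optional

# Extension-to-extractor mapping copied from the table above.
# The labels on the right are illustrative tags only.
EXTRACTORS = {
    ".pdf": "pypdf",
    ".docx": "python-docx",
    ".csv": "csv rows",
    ".txt": "utf-8 text",
    ".md": "utf-8 text",
    ".markdown": "utf-8 text",
    ".png": "tesseract ocr",
    ".jpg": "tesseract ocr",
    ".jpeg": "tesseract ocr",
    ".webp": "tesseract ocr",
    ".tif": "tesseract ocr",
    ".tiff": "tesseract ocr",
    ".doc": "pandoc plain",
    ".odt": "pandoc plain",
    ".rtf": "pandoc plain",
}


def pick_extractor(path: Path) -> Optional[str]:
    """Return the extractor tag for a file, or None if the type is unsupported."""
    # Assumption: extensions are matched case-insensitively.
    return EXTRACTORS.get(path.suffix.lower())


def read_text_lenient(path: Path) -> str:
    """Read a text or Markdown file as UTF-8, dropping bytes that do not decode."""
    return path.read_text(encoding="utf-8", errors="ignore")
```

Reading with `errors="ignore"` keeps ingestion going on files with stray non-UTF-8 bytes, at the cost of silently losing those bytes.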
Chunking
Defaults:
LOCALAI_RAG_CHUNK_WORDS=750
LOCALAI_RAG_CHUNK_OVERLAP=120
If these variables are not set, the worker uses those defaults internally.
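Word-based chunking with overlap can be sketched as below. This assumes whitespace-separated words and a sliding window that advances by the chunk size minus the overlap; the worker's actual splitting rules may differ, and `load_chunk_settings` and `split_into_chunks` are hypothetical names.

```python
import os


def load_chunk_settings(env=os.environ):
    """Read chunk size and overlap, falling back to the documented defaults."""
    size = int(env.get("LOCALAI_RAG_CHUNK_WORDS", "750"))
    overlap = int(env.get("LOCALAI_RAG_CHUNK_OVERLAP", "120"))
    return size, overlap


def split_into_chunks(text: str, size: int = 750, overlap: int = 120) -> list:
    """Split text into chunks of up to `size` words sharing `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

With the defaults, a 1000-word document yields two chunks: words 0 to 749 and words 630 to 999, so the last 120 words of the first chunk repeat at the start of the second.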
Chunks are stored in rag_chunks with:
- document ID
- chunk index
- page number when available
- section title field
- source kind
- content text
- generated full-text search vector
- token count approximation
- metadata JSONB
- 1024-dimensional embedding vector
Ingestion States
Document-level states and stages include:
queued
processing
extracting
chunking
embedding
indexed
failed
Progress fields include:
- chunk_count
- embedded_chunk_count
- failed_chunk_count
- ingest_progress
- last_error
- ingest_started_at
- ingest_finished_at
Status Endpoints
Get ingestion status:
curl -fsS http://127.0.0.1:4100/ingestion/status \
-H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}"
Stream ingestion status events:
curl -N http://127.0.0.1:4100/ingestion/events \
-H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}"
List indexed documents:
curl -fsS http://127.0.0.1:4100/documents \
-H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}"
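The same calls can be made from Python with only the standard library. This is a sketch, assuming the endpoints return JSON bodies (the guide does not show the response format); `build_request` and `call_api` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:4100"


def build_request(path: str, api_key: str, payload=None) -> urllib.request.Request:
    """Build an authenticated request: GET without a payload, POST with one."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers)


def call_api(path: str, api_key: str, payload=None, timeout: float = 30.0):
    """Send the request and decode the body, assuming it is JSON."""
    request = build_request(path, api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example usage on the server, with the key taken from the environment:
#   import os
#   status = call_api("/ingestion/status", os.environ["LOCALAI_RAG_API_KEY"])
#   documents = call_api("/documents", os.environ["LOCALAI_RAG_API_KEY"])
```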
On the live server, the first real corpus pass on 2026-05-28 produced:
- 9 indexed standards PDFs under textcad/01_Standards/...
- 21 indexed CVAT job 75 images under cvat/job_75/...
- 1 failed standards PDF: textcad/01_Standards/DS ISO 1101.pdf
Database Schema Shape
Required extensions:
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
Primary tables:
rag_documents
rag_chunks
Important indexes:
rag_documents_status_idx
rag_documents_metadata_gin
rag_chunks_document_idx
rag_chunks_embedding_hnsw
rag_chunks_content_tsv_gin
rag_chunks_content_trgm_gin
rag_chunks_metadata_gin
The embedding index uses HNSW over vector_cosine_ops.
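In pgvector, vector_cosine_ops is the operator class for the cosine distance operator `<=>`, which is one minus the cosine similarity. The sketch below computes that quantity in plain Python to show what the index orders by; `cosine_distance` is a hypothetical helper, and raising on zero vectors is a choice made for this sketch.

```python
import math


def cosine_distance(a, b) -> float:
    """Return 1 - cosine similarity for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Cosine distance is undefined for zero vectors; this sketch raises.
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)
```

The value ranges from 0 for vectors pointing the same way to 2 for opposite vectors, so smaller distances mean closer matches.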
Search Modes
The RAG API supports these search modes:
| Mode | Behavior |
|---|---|
| hybrid | combines semantic vector search and PostgreSQL full-text search |
| semantic | embeds the query and searches by vector distance |
| keyword | uses PostgreSQL full-text search with websearch_to_tsquery |
| boolean | same implementation path as keyword |
| phrase | uses case-insensitive substring matching plus trigram similarity score |
| regex | uses PostgreSQL case-insensitive regex matching |
| metadata | filters by metadata/document fields without query text |
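This guide does not say how hybrid mode merges its semantic and full-text result lists. One common technique for that kind of merge is reciprocal rank fusion (RRF), sketched below purely as an illustration. It is not confirmed to be what the RAG API does, and `reciprocal_rank_fusion` and the constant `k = 60` are assumptions.

```python
def reciprocal_rank_fusion(rankings, k: int = 60) -> list:
    """Merge ranked lists of IDs; each list adds 1 / (k + rank) to an ID's score."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))
```

For example, fusing a semantic ranking `["c1", "c2", "c3"]` with a keyword ranking `["c3", "c1", "c4"]` puts `c1` and `c3` ahead of the chunks that appear in only one list.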
Similar chunk search is available through /search/similar.
Neighbor expansion can include adjacent chunks around matches. It is enabled by default for /search requests unless expand_neighbors is set to false.
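Neighbor expansion can be pictured as widening each match to the chunks next to it in the same document. The sketch below assumes chunk indexes start at 0 and a window of one chunk on each side; neither detail is stated in this guide, and `expand_neighbors` here is a hypothetical function, not the server's code.

```python
def expand_neighbors(matches, chunk_counts, window: int = 1) -> list:
    """Expand (document_id, chunk_index) matches to include adjacent chunks.

    chunk_counts maps each document_id to its total number of chunks.
    Results keep first-seen order and contain no duplicates.
    """
    expanded = []
    seen = set()
    for document_id, chunk_index in matches:
        last_index = chunk_counts[document_id] - 1
        low = max(0, chunk_index - window)
        high = min(last_index, chunk_index + window)
        for index in range(low, high + 1):
            key = (document_id, index)
            if key not in seen:
                seen.add(key)
                expanded.append(key)
    return expanded
```

Clamping to the first and last chunk index keeps the expansion inside the matched document.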
Search Filters
Supported filter keys:
{
  "document_id": "uuid",
  "source_kind": "text",
  "metadata": {
    "key": "value"
  }
}
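The metadata filter uses JSONB containment: a row matches when its stored metadata contains every key/value pair given in the filter.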
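The containment rule can be approximated in Python to predict which metadata filters will match. This is a simplified sketch of the behavior of PostgreSQL's `@>` operator on jsonb values: it omits PostgreSQL's special case in which an array contains a bare primitive, and `jsonb_contains` is a hypothetical helper.

```python
def jsonb_contains(stored, wanted) -> bool:
    """Approximate `stored @> wanted` for JSON-like Python values."""
    if isinstance(wanted, dict):
        # Every wanted key must exist, and its value must be contained.
        return isinstance(stored, dict) and all(
            key in stored and jsonb_contains(stored[key], value)
            for key, value in wanted.items()
        )
    if isinstance(wanted, list):
        # Every wanted element must be contained in some stored element.
        return isinstance(stored, list) and all(
            any(jsonb_contains(item, element) for item in stored)
            for element in wanted
        )
    if isinstance(stored, bool) or isinstance(wanted, bool):
        # JSON true/false are not numbers, unlike Python's bool.
        return isinstance(stored, bool) and isinstance(wanted, bool) and stored == wanted
    return stored == wanted
```

A filter such as `{"project": "textcad"}` therefore matches metadata that has other keys as well, while `{"project": "cvat"}` does not match it.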
Search Examples
Hybrid search:
curl -fsS http://127.0.0.1:4100/search \
-H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}" \
-H 'Content-Type: application/json' \
-d '{
  "query": "fixture clamp torque",
  "mode": "hybrid",
  "limit": 8
}'
Phrase search:
curl -fsS http://127.0.0.1:4100/search \
-H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}" \
-H 'Content-Type: application/json' \
-d '{
  "query": "fixture clamp torque",
  "mode": "phrase",
  "limit": 5,
  "expand_neighbors": false
}'
Metadata search:
curl -fsS http://127.0.0.1:4100/search \
-H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}" \
-H 'Content-Type: application/json' \
-d '{
  "mode": "metadata",
  "limit": 10,
  "filters": {
    "source_kind": "ocr"
  }
}'
Similar chunk search:
curl -fsS http://127.0.0.1:4100/search/similar \
-H "Authorization: Bearer ${LOCALAI_RAG_API_KEY}" \
-H 'Content-Type: application/json' \
-d '{
  "chunk_id": "00000000-0000-0000-0000-000000000000",
  "limit": 8
}'
Replace the placeholder UUID with a real chunk ID from a search result.
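When scripting searches, it can help to build the /search request body in one place and reject unknown modes before sending anything. The sketch below uses only the fields shown in the examples above. Requiring a query for every mode except metadata is an assumption, and `build_search_payload` is a hypothetical helper.

```python
from typing import Optional

SEARCH_MODES = {
    "hybrid", "semantic", "keyword", "boolean", "phrase", "regex", "metadata",
}


def build_search_payload(
    mode: str,
    query: Optional[str] = None,
    limit: int = 8,
    filters: Optional[dict] = None,
    expand_neighbors: Optional[bool] = None,
) -> dict:
    """Return a JSON-serializable body for POST /search."""
    if mode not in SEARCH_MODES:
        raise ValueError(f"unknown search mode: {mode!r}")
    if mode != "metadata" and not query:
        # Assumption: every mode except metadata needs query text.
        raise ValueError(f"mode {mode!r} requires a query")
    payload = {"mode": mode, "limit": limit}
    if query:
        payload["query"] = query
    if filters:
        payload["filters"] = filters
    if expand_neighbors is not None:
        # Omit the key to leave the server's default behavior in place.
        payload["expand_neighbors"] = expand_neighbors
    return payload
```

Calling `build_search_payload("metadata", limit=10, filters={"source_kind": "ocr"})` reproduces the body of the metadata search example.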
Frontend Check On The Fedora Machine
To test the indexed corpus through the local browser UI:
- Open http://127.0.0.1:4100.
- Paste the saved RAG key into the RAG API key field.
- Click Use Key.
- Ask a targeted question about the ingested standards or images.
If a broad summarization question fails, narrow the question to a specific standard, topic, or folder. The server now trims retrieved context, but specific prompts still perform better than corpus-wide open-ended prompts.
Sources
- /Users/adeelyj/code/local ai server setup/local-ai-stack-repo/rag/app/ingest.py
- /Users/adeelyj/code/local ai server setup/local-ai-stack-repo/rag/app/app.py
- /Users/adeelyj/code/local ai server setup/local-ai-stack-repo/HANDOFF.md