Local RAG Guide: CAE Knowledge Base¶

Source files: Architechture & Research/Infrastructure Research/Knowledge Base & RAG/Local RAG Guide.md Last synthesized: March 2026

Overview¶

This guide describes how to set up a local Retrieval-Augmented Generation (RAG) system over a 14,000-file CAE toolbox. The system runs entirely on-premise (no cloud), searches 14,200 text-searchable technical documents, and uses a local LLM to answer engineering questions.

Goal: Enable RapidDraft and Autonomous CAE to reference past solutions, standards, and automation scripts without cloud API calls or data exposure.

Your Collection at a Glance¶

Category	File Count	Notes
PDFs	4,191	Handbooks, standards, tutorials — highest value
PowerPoint (PPT/PPTX)	708	Training presentations, procedure docs
HTML/HTM	5,166	NX/ANSYS training web pages, documentation
Scripts (.py, .tcl, .sh, .inp)	1,618	Automation scripts, solver inputs, CAM macros
Text/CSV/TXT	1,565	Notes, data, logs, parameters
CAE Binary (.prt, .fem, .h3d, .sim)	6,243	Not text-searchable — skip
Images (.gif, .png, .jpg)	17,045	Not searchable — skip
Other (.js, .css, .otf, fonts)	~18,000	Web assets from training — skip
Total text-searchable	~14,200	This is what gets indexed

Architecture¶

┌─────────────────────────────────────────────────────┐
│                  Your Windows Machine                │
│                                                     │
│  ┌──────────────┐    ┌──────────────┐   ┌────────┐ │
│  │  113_CAE_    │    │  File Watcher│   │ ChromaDB
│  │  Toolbox     │───>│  Service     │──>│ (vectors)
│  │  (OneDrive)  │    │  (watchdog)  │   │ (local) │
│  └──────────────┘    └──────────────┘   └───┬────┘ │
│                                              │      │
│  ┌──────────────┐    ┌──────────────┐     │      │
│  │  LM Studio   │<───│  Chat UI     │<────┘      │
│  │  (Qwen 32B)  │    │  (Gradio)    │            │
│  │  port 1234   │    │  port 7860   │            │
│  └──────────────┘    └──────────────┘            │
│                                                     │
└─────────────────────────────────────────────────────┘

How It Works¶

File Watcher runs as background service, monitors CAE Toolbox folder
New/changed files get parsed, chunked, embedded, and stored in ChromaDB
Already-indexed files tracked by hash — no re-processing
Chat UI takes your question, retrieves relevant chunks from ChromaDB
LM Studio (Qwen 32B) generates answer from question + retrieved context

Setup (Windows — Step by Step)¶

Prerequisites¶

Python 3.11+ (you have this)
LM Studio running with Qwen 32B or similar (you have this; listens on port 1234)
~4GB RAM for embeddings + vector DB (you have 128GB, so no issue)

Step 1: Create Project Directory¶

Open PowerShell:

mkdir C:\Users\adeel\cae_rag_service
cd C:\Users\adeel\cae_rag_service
python -m venv venv
.\venv\Scripts\Activate.ps1

Step 2: Install Dependencies¶

pip install chromadb sentence-transformers watchdog pymupdf python-pptx
pip install beautifulsoup4 gradio openai chardet tiktoken
pip install python-docx openpyxl lxml

Why each package: - chromadb — Local vector database, persistent storage, no server needed - sentence-transformers — Embedding model (runs locally, no API calls) - watchdog — Filesystem monitoring (detects new files) - pymupdf (fitz) — Fastest PDF parser; handles scanned PDFs - python-pptx — PowerPoint text extraction - beautifulsoup4 — HTML parsing for training pages - python-docx — Word document extraction - chardet — Encoding detection for old text files

Step 3: Configuration¶

Create config.py:

"""Configuration for CAE Toolbox RAG Service."""
import os

# === PATHS ===
TOOLBOX_ROOT = r"C:\Users\adeel\OneDrive\100_Knowledge\113_CAE_Toolbox"
CHROMA_DB_PATH = r"C:\Users\adeel\cae_rag_service\chroma_db"
INDEX_STATE_PATH = r"C:\Users\adeel\cae_rag_service\index_state.json"
LOG_PATH = r"C:\Users\adeel\cae_rag_service\logs"

# === EMBEDDING MODEL ===
# Runs locally; ~400MB download first time
EMBEDDING_MODEL = "BAAI/bge-base-en-v1.5"

# === LM STUDIO ===
LM_STUDIO_URL = "http://localhost:1234/v1"
LM_STUDIO_MODEL = "qwen2.5-coder-32b"

# === CHUNKING ===
CHUNK_SIZE = 800        # tokens per chunk
CHUNK_OVERLAP = 100     # overlap between chunks
MAX_FILE_SIZE_MB = 100  # skip larger files

# === RETRIEVAL ===
TOP_K = 8               # chunks to retrieve per query
SIMILARITY_THRESHOLD = 0.25  # minimum relevance (0-1)

# === FILE TYPES TO INDEX ===
TEXT_EXTENSIONS = {
    # Documents
    '.pdf', '.doc', '.docx', '.ppt', '.pptx', '.rtf',
    '.xls', '.xlsx',
    # Web content
    '.html', '.htm',
    # Plain text
    '.txt', '.md', '.csv', '.tsv', '.log', '.out',
    # Code & scripts
    '.py', '.tcl', '.sh', '.bat', '.m', '.cfg',
    # Solver inputs
    '.inp', '.dat', '.k', '.key', '.bdf', '.nas', '.fem',
    # Other
    '.xml', '.json', '.tex',
}

# Skip these
SKIP_EXTENSIONS = {
    '.gif', '.png', '.jpg', '.jpeg', '.bmp', '.tif', '.tiff', '.svg',
    '.avi', '.mp4', '.wmv', '.mov',
    '.js', '.css', '.otf', '.woff', '.eot', '.ttf',
    '.pyc', '.pyo', '.class',
    '.zip', '.rar', '.7z', '.gz', '.tar',
    '.exe', '.dll', '.so', '.msi',
    '.h3d', '.op2', '.sim', '.prt', '.stp', '.step', '.iges',
    '.stl', '.d3plot', '.binout', '.rst', '.rth', '.db', '.cdb',
    '.catpart', '.catproduct', '.sldprt', '.sldasm',
    '.x_t', '.x_b', '.jt', '.3dm', '.dwg', '.dxf',
    '.stat', '.mvw', '.hm', '.rad',
}

SKIP_FOLDERS = {
    '__pycache__', '.git', 'node_modules', '.ipynb_checkpoints',
}

os.makedirs(CHROMA_DB_PATH, exist_ok=True)
os.makedirs(LOG_PATH, exist_ok=True)

Step 4: File Parser Module¶

Create parsers.py:

"""Document parsers for different file types."""
import os, chardet, logging

logger = logging.getLogger("cae_rag")

def parse_pdf(filepath: str) -> str:
    """Extract text from PDF using PyMuPDF."""
    import fitz
    try:
        doc = fitz.open(filepath)
        texts = [page.get_text() for page in doc]
        doc.close()
        return "\n".join(texts)
    except Exception as e:
        logger.warning(f"PDF parse failed: {filepath} — {e}")
        return ""

def parse_pptx(filepath: str) -> str:
    """Extract text from PowerPoint."""
    from pptx import Presentation
    try:
        prs = Presentation(filepath)
        texts = []
        for slide_num, slide in enumerate(prs.slides, 1):
            slide_text = []
            for shape in slide.shapes:
                if shape.has_text_frame:
                    for para in shape.text_frame.paragraphs:
                        text = para.text.strip()
                        if text:
                            slide_text.append(text)
            if slide_text:
                texts.append(f"[Slide {slide_num}]\n" + "\n".join(slide_text))
        return "\n\n".join(texts)
    except Exception as e:
        logger.warning(f"PPTX parse failed: {filepath} — {e}")
        return ""

def parse_docx(filepath: str) -> str:
    """Extract text from Word documents."""
    from docx import Document
    try:
        doc = Document(filepath)
        return "\n".join(para.text for para in doc.paragraphs if para.text.strip())
    except Exception as e:
        logger.warning(f"DOCX parse failed: {filepath} — {e}")
        return ""

def parse_html(filepath: str) -> str:
    """Extract text from HTML files."""
    from bs4 import BeautifulSoup
    try:
        raw = open(filepath, 'rb').read()
        encoding = chardet.detect(raw[:10000])['encoding'] or 'utf-8'
        html = raw.decode(encoding, errors='replace')
        soup = BeautifulSoup(html, 'lxml')
        for tag in soup(['script', 'style', 'nav', 'footer']):
            tag.decompose()
        text = soup.get_text(separator='\n', strip=True)
        return text
    except Exception as e:
        logger.warning(f"HTML parse failed: {filepath} — {e}")
        return ""

def parse_text(filepath: str) -> str:
    """Read plain text files with encoding detection."""
    try:
        raw = open(filepath, 'rb').read()
        if not raw:
            return ""
        encoding = chardet.detect(raw[:10000])['encoding'] or 'utf-8'
        return raw.decode(encoding, errors='replace')
    except Exception as e:
        logger.warning(f"Text parse failed: {filepath} — {e}")
        return ""

Step 5: RAG Service¶

Create rag_service.py:

"""Local RAG service: index documents + retrieve + generate answers."""
import chromadb, hashlib, json, logging
from sentence_transformers import SentenceTransformer
from pathlib import Path
from config import *
from parsers import parse_pdf, parse_pptx, parse_docx, parse_html, parse_text
import tiktoken

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("cae_rag")

# Initialize ChromaDB and embedding model
client = chromadb.PersistentClient(path=CHROMA_DB_PATH)
collection = client.get_or_create_collection(name="cae_toolbox")
embedding_model = SentenceTransformer(EMBEDDING_MODEL)
encoder = tiktoken.get_encoding("cl100k_base")

def get_file_hash(filepath: str) -> str:
    """Compute SHA256 hash of file."""
    sha256 = hashlib.sha256()
    with open(filepath, 'rb') as f:
        sha256.update(f.read())
    return sha256.hexdigest()

def index_document(filepath: str, text: str):
    """Chunk, embed, and store document."""
    if not text.strip():
        logger.info(f"Skipping empty document: {filepath}")
        return

    # Chunk by token count
    tokens = encoder.encode(text)
    chunks = []
    for i in range(0, len(tokens), CHUNK_SIZE - CHUNK_OVERLAP):
        chunk_tokens = tokens[i:i + CHUNK_SIZE]
        chunk_text = encoder.decode(chunk_tokens)
        chunks.append(chunk_text)

    # Embed and store chunks
    for idx, chunk in enumerate(chunks):
        try:
            embedding = embedding_model.encode(chunk)
            collection.add(
                ids=[f"{filepath}_{idx}"],
                embeddings=[embedding],
                metadatas=[{"source": filepath, "chunk": idx}],
                documents=[chunk]
            )
        except Exception as e:
            logger.error(f"Failed to embed chunk: {e}")

    logger.info(f"Indexed {len(chunks)} chunks from {filepath}")

def parse_file(filepath: str) -> str:
    """Parse file based on extension."""
    ext = Path(filepath).suffix.lower()
    if ext == '.pdf':
        return parse_pdf(filepath)
    elif ext in ['.pptx', '.ppt']:
        return parse_pptx(filepath)
    elif ext in ['.docx', '.doc']:
        return parse_docx(filepath)
    elif ext in ['.html', '.htm']:
        return parse_html(filepath)
    else:
        return parse_text(filepath)

def scan_and_index():
    """Scan toolbox, index new files."""
    indexed = 0
    for root, dirs, files in os.walk(TOOLBOX_ROOT):
        # Skip unwanted folders
        dirs[:] = [d for d in dirs if d not in SKIP_FOLDERS]

        for filename in files:
            ext = Path(filename).suffix.lower()
            if ext not in TEXT_EXTENSIONS or ext in SKIP_EXTENSIONS:
                continue

            filepath = os.path.join(root, filename)
            size_mb = os.path.getsize(filepath) / (1024 * 1024)
            if size_mb > MAX_FILE_SIZE_MB:
                logger.warning(f"Skipping large file: {filepath} ({size_mb:.1f} MB)")
                continue

            try:
                text = parse_file(filepath)
                if text.strip():
                    index_document(filepath, text)
                    indexed += 1
            except Exception as e:
                logger.error(f"Error processing {filepath}: {e}")

    logger.info(f"Total documents indexed: {indexed}")

def retrieve(query: str, top_k: int = TOP_K) -> list:
    """Retrieve relevant chunks for a query."""
    query_embedding = embedding_model.encode(query)
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=top_k,
        where={"distance": {"$lte": 1 - SIMILARITY_THRESHOLD}}
    )

    documents = []
    if results and results['documents']:
        for doc, metadata, distance in zip(
            results['documents'][0],
            results['metadatas'][0],
            results['distances'][0]
        ):
            documents.append({
                'text': doc,
                'source': metadata['source'],
                'relevance': 1 - distance
            })

    return documents

def generate_answer(query: str) -> str:
    """Retrieve context + generate answer using LM Studio."""
    import requests

    # Retrieve context
    context_docs = retrieve(query)
    context = "\n\n".join([f"Source: {d['source']}\n{d['text']}" for d in context_docs])

    # Generate answer
    prompt = f"""You are a CAE engineering expert. Answer the question using the provided context.

Question: {query}

Context:
{context}

Answer:"""

    try:
        response = requests.post(
            f"{LM_STUDIO_URL}/chat/completions",
            json={
                "model": LM_STUDIO_MODEL,
                "messages": [{"role": "user", "content": prompt}],
                "temperature": 0.7,
                "max_tokens": 500
            },
            timeout=60
        )
        response.raise_for_status()
        return response.json()['choices'][0]['message']['content']
    except Exception as e:
        logger.error(f"LM Studio error: {e}")
        return f"Error generating answer: {e}"

if __name__ == "__main__":
    logger.info("Starting CAE RAG indexing...")
    scan_and_index()
    logger.info("Indexing complete.")

Step 6: Chat UI (Gradio)¶

Create chat_ui.py:

"""Gradio-based chat interface."""
import gradio as gr
from rag_service import generate_answer, retrieve

def chat(message, history):
    """Chat interface."""
    answer = generate_answer(message)
    return answer

def show_sources(query):
    """Show retrieved sources for a query."""
    docs = retrieve(query, top_k=5)
    sources = "\n\n".join([
        f"**{d['source']}** (relevance: {d['relevance']:.2f})\n{d['text'][:200]}..."
        for d in docs
    ])
    return sources

# Build interface
with gr.Blocks(title="CAE Knowledge Base") as app:
    gr.Markdown("# CAE Knowledge Base RAG")

    with gr.Tabs():
        with gr.TabItem("Chat"):
            chatbot = gr.ChatInterface(chat)

        with gr.TabItem("Search Sources"):
            query_input = gr.Textbox(label="Query", placeholder="Search CAE Toolbox...")
            sources_output = gr.Markdown()
            search_btn = gr.Button("Search")
            search_btn.click(show_sources, inputs=[query_input], outputs=[sources_output])

if __name__ == "__main__":
    app.launch(server_name="127.0.0.1", server_port=7860)

Step 7: Run the Service¶

In PowerShell:

.\venv\Scripts\Activate.ps1

# Index documents (run once)
python rag_service.py

# Start chat UI
python chat_ui.py

# Open browser: http://127.0.0.1:7860

Query Examples¶

Example 1: DFM Rule Lookup¶

Query: "What is the minimum internal corner radius for CNC milling?" Expected context: Protolabs guide, internal standards, supplier docs Answer: System retrieves relevant sections and synthesizes answer

Example 2: Script Search¶

Query: "How do I automate NX CAM programming?" Expected context: NX macro examples, TCL scripts, CAM playbooks Answer: System suggests relevant automation scripts from toolbox

Example 3: Solver Parameter Question¶

Query: "What mesh size should I use for NASTRAN stress analysis?" Expected context: Analysis best practices, tutorials, solver documentation Answer: System provides guidance with examples from past analyses

Performance Characteristics¶

Metric	Value	Notes
Indexing time	~2-3 hours	First run for 14K documents
Vector DB size	~2-3 GB	Depends on chunking strategy
Query latency	<1 second	Vector search only; excludes LLM time
LLM generation time	10-30 seconds	Qwen 32B on GPU; varies by answer length
Total Q&A latency	15-40 seconds	Retrieval + generation combined

Optimization Tips¶

1. Incremental Indexing¶

Once initial index is built, only re-index changed files:

def incremental_index():
    """Only index files modified since last run."""
    index_state = load_index_state()  # JSON file tracking file hashes
    for filepath in walk_toolbox():
        current_hash = get_file_hash(filepath)
        if filepath not in index_state or index_state[filepath] != current_hash:
            parse_and_index(filepath)
            index_state[filepath] = current_hash
    save_index_state(index_state)

2. Chunk Size Tuning¶

Smaller chunks (400 tokens): Better precision, more retrieval overhead
Larger chunks (1000 tokens): Faster retrieval, more noise in context
Sweet spot: 800 tokens with 100-token overlap

3. Top-K Parameter¶

Top-K=5: Fast, lower noise
Top-K=8: Default, good balance
Top-K=15: More context, slower generation, more hallucinations possible

Maintenance¶

Update Toolbox¶

Add new files to CAE Toolbox folder; re-run scan_and_index() periodically.

Reset Index¶

# Delete old database
import shutil
shutil.rmtree(CHROMA_DB_PATH)

# Rebuild
scan_and_index()

Monitor Logs¶

Check logs/ directory for parsing errors and indexing issues.

Limitations & Future Work¶

Limitation	Workaround
No image search	Use OCR to extract text from screenshots
Table extraction	Tables become flattened text (suboptimal)
Multi-document reasoning	Retrieve multiple docs; let LLM synthesize
No real-time updates	Re-index manually when toolbox changes
Hallucinations in LLM	Use relevance threshold; ask for sources

Next Steps¶

Index your toolbox: Run scan_and_index() (2-3 hours first time)
Test retrieval: Query for simple topics to validate setup
Tune parameters: Adjust CHUNK_SIZE, TOP_K based on results
Integrate with RapidDraft: Link RAG to DFM findings for justification
Monitor performance: Log query times and answer quality

Quick Checklist¶

Python 3.11+ installed
Dependencies installed: pip install -r requirements.txt
config.py configured with correct paths
LM Studio running (port 1234) with Qwen 32B
rag_service.py executed to build initial index
chat_ui.py running (port 7860)
Tested sample queries
Documented CAE Toolbox location and structure