Skip to content

Engineering Local AI Platform Architecture

Last updated: 2026-06-08

Purpose

This page turns the deep local AI architecture research into a product architecture RapidDraft can explain to engineering customers. It connects:

  • the current Fedora local AI server implementation,
  • the Theegarten-Pactec demo and follow-up page,
  • and the repeatable target architecture for customer-controlled deployments.

The main conclusion from the research is that RapidDraft should not be positioned as a one-off "LLM box" or generic RAG chatbot. RapidDraft should be positioned as a local engineering AI platform: RapidDraft owns the engineering workflow, evidence model, agent behavior, audit trail, and human approval path; the inference layer can run on customer-approved private infrastructure.

Architecture Decision

Separate the engineering control plane from the inference plane.

Plane Owns Why it matters
Engineering control plane Review workflows, artifacts, prompts, agent templates, approvals, audit events, reports Keeps RapidDraft behavior stable across customers and hardware choices
Knowledge plane Indexed documents, chunks, embeddings, metadata, citations, document versions Makes answers evidence-linked instead of free-form chat
Inference plane Text model, vision model, embedding model, reranker/OCR models, GPU runtime Lets customers choose DGX Spark, GPU servers, NVIDIA NIM, vLLM, or enterprise AI platforms
Connector plane CAD, drawing, BOM, PLM/PDM, EPLAN, SharePoint/file-share, ERP/MES connectors Keeps customer-specific integration outside the core product
Governance plane SSO, RBAC, project ACLs, prompt/workflow versions, telemetry policy, secrets, backups, signed updates Makes the deployment acceptable to industrial IT/security teams

The product boundary should stay stable even when the inference stack changes. A pilot can use the current Fedora local stack or a DGX Spark-style appliance. A production customer may later choose NVIDIA AI Enterprise, OpenShift AI, VMware Private AI Foundation, Dell AI Factory, HPE Private Cloud AI, or an open vLLM/Qdrant/Postgres stack. RapidDraft's customer-visible workflow should not be rewritten for each infrastructure choice.

Current Implemented Reference Stack

The current local AI server is a working reference implementation, not the final enterprise architecture. It proves the core pattern:

flowchart LR
  RD["RapidDraft backend / Agent tools"]
  GW["LiteLLM gateway"]
  TXT["Local text model"]
  VIS["Local vision model"]
  EMB["Local embedding model"]
  RAG["Knowledge / RAG API"]
  DB["PostgreSQL + pgvector"]
  DOC["File-drop ingestion + OCR/extraction"]

  RD --> RAG
  RD --> GW
  RAG --> DB
  RAG --> GW
  GW --> TXT
  GW --> VIS
  GW --> EMB
  DOC --> DB

Implemented reference components:

  • local model backends for text, vision, and embeddings,
  • LiteLLM as the model gateway,
  • Knowledge/RAG API with search, answer, chat, inventory, and citations,
  • PostgreSQL + pgvector for document/chunk storage,
  • file-drop ingestion with extraction/OCR paths,
  • backend-mediated access for RapidDraft product code,
  • raw model services kept behind the local server boundary.

Use this stack as the internal proof point and fast pilot path. Do not present every internal port, host, key path, or service name on customer-facing pages.

Target Product Architecture

The research recommends this layered structure for a customer-ready local RapidDraft deployment:

flowchart TB
  subgraph Customer["Customer-controlled environment"]
    IdP["Customer IdP / SSO<br/>Keycloak, Authentik, AD, or customer IdP"]
    RDUI["RapidDraft web UI"]
    API["RapidDraft API + engineering control plane"]
    WF["Workflow / agent orchestration<br/>LangGraph-ready, Temporal-ready"]
    K["Knowledge plane<br/>Postgres/pgvector, Qdrant, object store"]
    INF["Inference plane<br/>NIM / vLLM / llama.cpp fallback"]
    CON["Connector plane<br/>CAD, PLM, EPLAN, file shares"]
    AUD["Audit + governance<br/>versions, approvals, logs, backups"]
  end

  IdP --> RDUI
  RDUI --> API
  API --> WF
  WF --> K
  WF --> INF
  WF --> CON
  API --> AUD
  K --> AUD
  CON --> AUD

Recommended pilot stack:

  • Docker Compose or a simple VM/appliance deployment for the first controlled pilot.
  • RapidDraft web/backend plus Knowledge/RAG.
  • PostgreSQL + pgvector for a small pilot; Qdrant added when retrieval scale or multimodal search requires it.
  • LiteLLM or a RapidDraft model gateway in front of model-serving runtimes.
  • NVIDIA NIM where the customer has NVIDIA AI Enterprise or wants vendor-supported inference.
  • vLLM as the broad open serving fallback.
  • Human approval and audit logging as mandatory behavior.

Recommended production stack:

  • Helm/Kubernetes or a customer-approved enterprise AI platform when IT requires it.
  • SSO/RBAC integrated with customer identity.
  • Separate transactional metadata, vector retrieval, and object storage.
  • Signed release packages, SBOMs, vulnerability scanning, backup/restore, and offline update paths.
  • Versioned prompt packs, agent templates, workflow templates, model routing, and retrieval dataset snapshots.

Hardware Positioning

The research disagrees on how far DGX Spark can stretch, so the safe product position is:

Profile Use Customer message
DGX Spark or compact appliance Internal validation, executive demo, isolated workcell, small pilot Good reference appliance and pilot profile; do not promise department-scale concurrency from a single box
RTX PRO / L40S style GPU server First serious shared customer pilot or small department Stronger shared inference profile when multiple engineers use the system
H100/H200 or customer enterprise AI platform Larger enterprise rollout, heavier multimodal workloads, higher concurrency Use when the customer already knows they need central shared capacity
Existing customer AI platform Conservative IT, standardized private AI infrastructure RapidDraft should run on approved infrastructure instead of forcing a bespoke box

The page-level message should be: RapidDraft can run locally or privately; final sizing depends on the customer deployment boundary, workload, model mix, and concurrency.

Theegarten-Pactec Pilot Interpretation

For Theegarten, the right public explanation is:

  1. RapidDraft receives a selected release package: drawing, BOM, metadata, and optional EPLAN/CIM Database context.
  2. RapidDraft runs deterministic checks first where structured evidence exists.
  3. The Knowledge plane retrieves project documents and returns cited answers.
  4. The inference plane supports drafting, summarization, visual interpretation, and language assistance.
  5. The engineer reviews every finding and release summary before any PLM action.
  6. CIM Database remains the system of record.

This avoids three traps:

  • sounding like a public AI chatbot,
  • implying automatic PLM release/write-back,
  • or promising that Theegarten data is already in a production customer-controlled system before the pilot boundary is agreed.

What The Public Theegarten Page Can Safely Say

Safe public claim Internal detail Avoid saying
RapidDraft supports a controlled local/private AI route Current reference stack uses local inference, LiteLLM, Knowledge/RAG, Postgres + pgvector "Theegarten data is already fully on-prem in production"
The Agent calls backend tools, not browser model endpoints Browser-to-model calls are not the product path endpoint names, raw ports, tokens, hostnames
Findings are evidence-linked Sources may be drawing zones, BOM rows, DFM rules, or cited documents "The AI sees everything"
Engineer approval stays mandatory Human-in-the-loop is a product and liability boundary "fully automatic release"
No silent training Project indexing can be retrieval-only inside a boundary "we train on your drawings"
DGX Spark is a pilot/reference appliance Larger shared teams may need GPU servers or customer AI platforms "DGX Spark is the default department production server"

Design Implications For Theegarten Page

The page should show one professional architecture visual with these layers:

flowchart LR
  A["Theegarten release package<br/>Drawing, BOM, metadata, EPLAN/CIM context"] -->
  B["RapidDraft review workspace<br/>deterministic checks + agent tools"] -->
  C["Controlled AI / Knowledge runtime<br/>private model route + cited retrieval"] -->
  D["Evidence-linked findings<br/>source, action, confidence, owner"] -->
  E["Engineer approval<br/>CIM Database remains authoritative"]

Visual language:

  • engineering workflow first, infrastructure second,
  • visible customer/private boundary,
  • clear human approval gate,
  • no secret endpoints or internal server names,
  • no vendor logo pile,
  • avoid a generic "AI cloud" diagram.

Roadmap Boundaries

Near-term pilot:

  • one selected release workflow,
  • one sample package,
  • a small rule/evidence checklist,
  • controlled local/private architecture review,
  • cited findings and review report,
  • human approval.

Later enterprise:

  • deeper PLM/PDM connector governance,
  • wider EPLAN/CIM Database sync,
  • Qdrant/object-store scale-out,
  • Temporal-backed durable workflows,
  • customer SSO/RBAC hardening,
  • signed offline updates,
  • enterprise GPU scheduling.

Do not sell later enterprise capabilities as if they are required for the first Theegarten follow-up page.

Sources

  • Local research report: /Users/adeelyj/code/local ai server setup/architecture research reports/Local AI architecture chathgpt.md
  • Local research report: /Users/adeelyj/code/local ai server setup/architecture research reports/Local AI architecture perplexity.md
  • Current reference implementation: /Users/adeelyj/code/local ai server setup/local-ai-stack-repo/docs/local-ai-wiki/overview-architecture.md
  • Current reference implementation: /Users/adeelyj/code/local ai server setup/local-ai-stack-repo/docs/local-ai-wiki/service-inventory-port-map.md
  • Current reference implementation: /Users/adeelyj/code/local ai server setup/local-ai-stack-repo/docs/local-ai-wiki/rag-ingestion-search-guide.md
  • Current validation notes: /Users/adeelyj/code/local ai server setup/local-ai-stack-repo/docs/validation/260603_eplan_rag_knowledge_upgrade/README.md
  • Theegarten pilot architecture page: ../../docs_theegarten/local-ai-agent-flow.md

Open Questions

  1. Should Theegarten's preferred pilot boundary be on-prem hardware, dedicated private EU hosting, or customer-approved enterprise AI infrastructure?
  2. Which sample Theegarten release package can be used to validate retrieval, drawing/BOM checks, and EPLAN context?
  3. Should the public customer page mention DGX Spark visually, or keep the visual abstract to avoid anchoring on hardware too early?
  4. Which enterprise architecture tier should RapidDraft prepare first: Compose pilot, single-node appliance, or Helm/Kubernetes?