Engineering Local AI Platform Architecture¶

Last updated: 2026-06-08

Purpose¶

This page turns the deep local AI architecture research into a product architecture RapidDraft can explain to engineering customers. It connects:

the current Fedora local AI server implementation,
the Theegarten-Pactec demo and follow-up page,
and the repeatable target architecture for customer-controlled deployments.

The main conclusion from the research is that RapidDraft should not be positioned as a one-off "LLM box" or generic RAG chatbot. RapidDraft should be positioned as a local engineering AI platform: RapidDraft owns the engineering workflow, evidence model, agent behavior, audit trail, and human approval path; the inference layer can run on customer-approved private infrastructure.

Architecture Decision¶

Separate the engineering control plane from the inference plane.

Plane	Owns	Why it matters
Engineering control plane	Review workflows, artifacts, prompts, agent templates, approvals, audit events, reports	Keeps RapidDraft behavior stable across customers and hardware choices
Knowledge plane	Indexed documents, chunks, embeddings, metadata, citations, document versions	Makes answers evidence-linked instead of free-form chat
Inference plane	Text model, vision model, embedding model, reranker/OCR models, GPU runtime	Lets customers choose DGX Spark, GPU servers, NVIDIA NIM, vLLM, or enterprise AI platforms
Connector plane	CAD, drawing, BOM, PLM/PDM, EPLAN, SharePoint/file-share, ERP/MES connectors	Keeps customer-specific integration outside the core product
Governance plane	SSO, RBAC, project ACLs, prompt/workflow versions, telemetry policy, secrets, backups, signed updates	Makes the deployment acceptable to industrial IT/security teams

The product boundary should stay stable even when the inference stack changes. A pilot can use the current Fedora local stack or a DGX Spark-style appliance. A production customer may later choose NVIDIA AI Enterprise, OpenShift AI, VMware Private AI Foundation, Dell AI Factory, HPE Private Cloud AI, or an open vLLM/Qdrant/Postgres stack. RapidDraft's customer-visible workflow should not be rewritten for each infrastructure choice.

Current Implemented Reference Stack¶

The current local AI server is a working reference implementation, not the final enterprise architecture. It proves the core pattern:

flowchart LR
  RD["RapidDraft backend / Agent tools"]
  GW["LiteLLM gateway"]
  TXT["Local text model"]
  VIS["Local vision model"]
  EMB["Local embedding model"]
  RAG["Knowledge / RAG API"]
  DB["PostgreSQL + pgvector"]
  DOC["File-drop ingestion + OCR/extraction"]

  RD --> RAG
  RD --> GW
  RAG --> DB
  RAG --> GW
  GW --> TXT
  GW --> VIS
  GW --> EMB
  DOC --> DB

Implemented reference components:

local model backends for text, vision, and embeddings,
LiteLLM as the model gateway,
Knowledge/RAG API with search, answer, chat, inventory, and citations,
PostgreSQL + pgvector for document/chunk storage,
file-drop ingestion with extraction/OCR paths,
backend-mediated access for RapidDraft product code,
raw model services kept behind the local server boundary.

Use this stack as the internal proof point and fast pilot path. Do not present every internal port, host, key path, or service name on customer-facing pages.

Target Product Architecture¶

The research recommends this layered structure for a customer-ready local RapidDraft deployment:

flowchart TB
  subgraph Customer["Customer-controlled environment"]
    IdP["Customer IdP / SSO<br/>Keycloak, Authentik, AD, or customer IdP"]
    RDUI["RapidDraft web UI"]
    API["RapidDraft API + engineering control plane"]
    WF["Workflow / agent orchestration<br/>LangGraph-ready, Temporal-ready"]
    K["Knowledge plane<br/>Postgres/pgvector, Qdrant, object store"]
    INF["Inference plane<br/>NIM / vLLM / llama.cpp fallback"]
    CON["Connector plane<br/>CAD, PLM, EPLAN, file shares"]
    AUD["Audit + governance<br/>versions, approvals, logs, backups"]
  end

  IdP --> RDUI
  RDUI --> API
  API --> WF
  WF --> K
  WF --> INF
  WF --> CON
  API --> AUD
  K --> AUD
  CON --> AUD

Recommended pilot stack:

Docker Compose or a simple VM/appliance deployment for the first controlled pilot.
RapidDraft web/backend plus Knowledge/RAG.
PostgreSQL + pgvector for a small pilot; Qdrant added when retrieval scale or multimodal search requires it.
LiteLLM or a RapidDraft model gateway in front of model-serving runtimes.
NVIDIA NIM where the customer has NVIDIA AI Enterprise or wants vendor-supported inference.
vLLM as the broad open serving fallback.
Human approval and audit logging as mandatory behavior.

Recommended production stack:

Helm/Kubernetes or a customer-approved enterprise AI platform when IT requires it.
SSO/RBAC integrated with customer identity.
Separate transactional metadata, vector retrieval, and object storage.
Signed release packages, SBOMs, vulnerability scanning, backup/restore, and offline update paths.
Versioned prompt packs, agent templates, workflow templates, model routing, and retrieval dataset snapshots.

Hardware Positioning¶

The research disagrees on how far DGX Spark can stretch, so the safe product position is:

Profile	Use	Customer message
DGX Spark or compact appliance	Internal validation, executive demo, isolated workcell, small pilot	Good reference appliance and pilot profile; do not promise department-scale concurrency from a single box
RTX PRO / L40S style GPU server	First serious shared customer pilot or small department	Stronger shared inference profile when multiple engineers use the system
H100/H200 or customer enterprise AI platform	Larger enterprise rollout, heavier multimodal workloads, higher concurrency	Use when the customer already knows they need central shared capacity
Existing customer AI platform	Conservative IT, standardized private AI infrastructure	RapidDraft should run on approved infrastructure instead of forcing a bespoke box

The page-level message should be: RapidDraft can run locally or privately; final sizing depends on the customer deployment boundary, workload, model mix, and concurrency.

Theegarten-Pactec Pilot Interpretation¶

For Theegarten, the right public explanation is:

RapidDraft receives a selected release package: drawing, BOM, metadata, and optional EPLAN/CIM Database context.
RapidDraft runs deterministic checks first where structured evidence exists.
The Knowledge plane retrieves project documents and returns cited answers.
The inference plane supports drafting, summarization, visual interpretation, and language assistance.
The engineer reviews every finding and release summary before any PLM action.
CIM Database remains the system of record.

This avoids three traps:

sounding like a public AI chatbot,
implying automatic PLM release/write-back,
or promising that Theegarten data is already in a production customer-controlled system before the pilot boundary is agreed.

What The Public Theegarten Page Can Safely Say¶

Safe public claim	Internal detail	Avoid saying
RapidDraft supports a controlled local/private AI route	Current reference stack uses local inference, LiteLLM, Knowledge/RAG, Postgres + pgvector	"Theegarten data is already fully on-prem in production"
The Agent calls backend tools, not browser model endpoints	Browser-to-model calls are not the product path	endpoint names, raw ports, tokens, hostnames
Findings are evidence-linked	Sources may be drawing zones, BOM rows, DFM rules, or cited documents	"The AI sees everything"
Engineer approval stays mandatory	Human-in-the-loop is a product and liability boundary	"fully automatic release"
No silent training	Project indexing can be retrieval-only inside a boundary	"we train on your drawings"
DGX Spark is a pilot/reference appliance	Larger shared teams may need GPU servers or customer AI platforms	"DGX Spark is the default department production server"

Design Implications For Theegarten Page¶

The page should show one professional architecture visual with these layers:

flowchart LR
  A["Theegarten release package<br/>Drawing, BOM, metadata, EPLAN/CIM context"] -->
  B["RapidDraft review workspace<br/>deterministic checks + agent tools"] -->
  C["Controlled AI / Knowledge runtime<br/>private model route + cited retrieval"] -->
  D["Evidence-linked findings<br/>source, action, confidence, owner"] -->
  E["Engineer approval<br/>CIM Database remains authoritative"]

Visual language:

engineering workflow first, infrastructure second,
visible customer/private boundary,
clear human approval gate,
no secret endpoints or internal server names,
no vendor logo pile,
avoid a generic "AI cloud" diagram.

Roadmap Boundaries¶

Near-term pilot:

one selected release workflow,
one sample package,
a small rule/evidence checklist,
controlled local/private architecture review,
cited findings and review report,
human approval.

Later enterprise:

deeper PLM/PDM connector governance,
wider EPLAN/CIM Database sync,
Qdrant/object-store scale-out,
Temporal-backed durable workflows,
customer SSO/RBAC hardening,
signed offline updates,
enterprise GPU scheduling.

Do not sell later enterprise capabilities as if they are required for the first Theegarten follow-up page.

Sources¶

Local research report: /Users/adeelyj/code/local ai server setup/architecture research reports/Local AI architecture chathgpt.md
Local research report: /Users/adeelyj/code/local ai server setup/architecture research reports/Local AI architecture perplexity.md
Current reference implementation: /Users/adeelyj/code/local ai server setup/local-ai-stack-repo/docs/local-ai-wiki/overview-architecture.md
Current reference implementation: /Users/adeelyj/code/local ai server setup/local-ai-stack-repo/docs/local-ai-wiki/service-inventory-port-map.md
Current reference implementation: /Users/adeelyj/code/local ai server setup/local-ai-stack-repo/docs/local-ai-wiki/rag-ingestion-search-guide.md
Current validation notes: /Users/adeelyj/code/local ai server setup/local-ai-stack-repo/docs/validation/260603_eplan_rag_knowledge_upgrade/README.md
Theegarten pilot architecture page: ../../docs_theegarten/local-ai-agent-flow.md

Open Questions¶

Should Theegarten's preferred pilot boundary be on-prem hardware, dedicated private EU hosting, or customer-approved enterprise AI infrastructure?
Which sample Theegarten release package can be used to validate retrieval, drawing/BOM checks, and EPLAN context?
Should the public customer page mention DGX Spark visually, or keep the visual abstract to avoid anchoring on hardware too early?
Which enterprise architecture tier should RapidDraft prepare first: Compose pilot, single-node appliance, or Helm/Kubernetes?