Vision Model Integration in RapidDraft
Source files:
- Architechture & Research/RapidDraft/Technical Architecture/Vision Model Integration Notes.md
Last synthesized: March 2026
Overview
RapidDraft gives vision models a narrow, well-defined role in its review and validation pipeline: reading drawings and validating drawing compliance. They are not relied upon for true 3D manufacturability assessment. This document describes what vision models do well, where they fall short, and how they integrate with RapidDraft's deterministic CAD-based checks.
What Vision Models Do Well
1. Drawing and Specification Compliance (Pack A Checks)
Vision models and OCR reliably extract structured information from drawing images:
Capability: Extract callouts and specifications
- Read dimension values, tolerance symbols, and surface finish annotations
- Identify material specifications, heat treatment callouts, and plating requirements
- Parse thread specifications (pitch, profile, fit class)
- Recognize standard symbols (welding, GD&T callout syntax)
- Extract title block information (drawing number, revision, date, approvals)
Inputs:
- Drawing PDF or screenshot (2D raster image)
- Trained OCR model for technical drawing text
- Vision model trained on engineering drawing layouts
Outputs:
- Extracted dimension values and callout text
- Normalized specifications (e.g., "M10 x 1.5 – 6H" → diameter=10, pitch=1.5, tolerance_class=6H)
- Structured drawing metadata
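The normalization step in the outputs above could look roughly like the following. This is a minimal sketch, not RapidDraft's actual parser: the `ThreadSpec` type, the `normalize_thread_callout` function, and the regular expression are all hypothetical, and the pattern only covers simple metric callouts with a single tolerance class.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for metric thread callouts such as "M10 x 1.5 – 6H".
# Accepts "x" or "×" as the separator and a hyphen or en dash before the class.
_THREAD_RE = re.compile(
    r"^\s*M(?P<diameter>\d+(?:\.\d+)?)"
    r"(?:\s*[x×]\s*(?P<pitch>\d+(?:\.\d+)?))?"
    r"(?:\s*[-–]\s*(?P<tol>\d[A-Za-z]))?\s*$"
)


@dataclass(frozen=True)
class ThreadSpec:
    diameter: float
    pitch: Optional[float]
    tolerance_class: Optional[str]


def normalize_thread_callout(text: str) -> Optional[ThreadSpec]:
    """Return a structured spec, or None if the text is not a recognized callout."""
    match = _THREAD_RE.match(text)
    if match is None:
        return None
    pitch = match.group("pitch")
    return ThreadSpec(
        diameter=float(match.group("diameter")),
        pitch=float(pitch) if pitch else None,
        tolerance_class=match.group("tol"),
    )


spec = normalize_thread_callout("M10 x 1.5 – 6H")
```

Returning `None` for unrecognized text, rather than a best guess, keeps the parser from inventing specifications that the OCR step did not actually read.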
Example rule evaluation:
Rule: "All dimensions must be labeled with tolerance class"
Vision input: Extract all dimension text from drawing view
Check: For each dimension, confirm presence of tolerance indicator (e.g., ±0.1, 6H, +0.2/-0.1)
Output: List of missing tolerance labels (if any)
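A rule like this one can be approximated with a few regular expressions over the extracted dimension strings. The sketch below is illustrative only: the function names and patterns are assumptions, they cover just the three indicator styles named in the rule, and real drawings would need a broader grammar.

```python
import re
from typing import List

# Assumed indicator patterns, mirroring the examples in the rule above.
_TOLERANCE_PATTERNS = (
    re.compile(r"±\s*\d+(?:\.\d+)?"),                          # symmetric: ±0.1
    re.compile(r"\+\s*\d+(?:\.\d+)?\s*/\s*-\s*\d+(?:\.\d+)?"),  # limits: +0.2/-0.1
    # Fit class as a separate trailing token: "H7", "6H"
    re.compile(r"(?<=\s)(?:[A-Za-z]{1,2}\d{1,2}|\d{1,2}[A-Za-z]{1,2})\s*$"),
)


def has_tolerance(dimension_text: str) -> bool:
    """True if any known tolerance indicator appears in the text."""
    return any(p.search(dimension_text) for p in _TOLERANCE_PATTERNS)


def missing_tolerance_labels(dimensions: List[str]) -> List[str]:
    """Return the dimension strings that carry no tolerance indicator."""
    return [d for d in dimensions if not has_tolerance(d)]


extracted = ["25 ±0.1", "Ø12 H7", "40 +0.2/-0.1", "Ø10", "18"]
missing = missing_tolerance_labels(extracted)
```

The fit-class pattern is deliberately strict (it requires whitespace before the token), so a bare "M10" or "Ø10" is not mistaken for a tolerance class.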
2. DFM Intent Mismatch Detection (Pack C/D Bridge)
Vision models can flag high-level inconsistencies between drawing intent and CAD geometry:
Capability: Detect design–specification mismatch
- Drawing specifies "Ra 0.8 everywhere" but model shows a large milled plate (cost/process red flag)
- Tolerance specification implies high precision, but feature geometry is simple (over-specified)
- Surface finish requirement conflicts with indicated process (e.g., "Ø10 Ra 0.4" but process is casting)
Inputs:
- Extracted CAD geometry (feature types, material, process hint)
- Drawing callouts and specifications (from OCR or direct specification input)
- Manufacturing process context (material, typical capability)
Outputs:
- Mismatch flags with confidence score
- Recommendation to review specification with manufacturing
Example:
CAD feature: large flat milled pocket on aluminum
Drawing specification: Ra 0.8 finish everywhere
AI assessment: "High-cost finish specification for typical milling.
Typical mill can achieve Ra 1.6–3.2. Confirm intent:
Is Ra 0.8 functionally required, or can it be relaxed?"
Confidence: 75% (pattern-based, manufacturing heuristic)
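The heuristic in this example amounts to comparing the specified finish against a typical capability range for the process. A minimal sketch follows, assuming a hypothetical capability table and function name; only the milling range (Ra 1.6–3.2) is taken from the example, and the default confidence simply echoes the 75% shown there.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative capability table (Ra in µm). Only the milling range comes from
# the example above; a real table would be maintained with manufacturing.
TYPICAL_RA_RANGE_UM = {
    "milling": (1.6, 3.2),
}


@dataclass(frozen=True)
class MismatchFlag:
    message: str
    confidence: float


def check_surface_finish(process: str, specified_ra_um: float,
                         confidence: float = 0.75) -> Optional[MismatchFlag]:
    """Flag a finish spec that is tighter than the typical process range."""
    ra_range = TYPICAL_RA_RANGE_UM.get(process)
    if ra_range is None:
        return None  # no heuristic for this process, so make no claim
    finest_typical, coarsest_typical = ra_range
    if specified_ra_um < finest_typical:
        return MismatchFlag(
            message=(f"Ra {specified_ra_um} is tighter than typical {process} "
                     f"capability (Ra {finest_typical}-{coarsest_typical}). "
                     "Confirm intent with manufacturing."),
            confidence=confidence,
        )
    return None


flag = check_surface_finish("milling", 0.8)
```

The function stays silent for processes it has no data on, which matches the document's stance that heuristic output is a prompt for review rather than an assertion.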
What Vision Models Cannot Reliably Do
Vision models are not suitable for true manufacturability checks that require 3D geometry understanding:
1. No True 3D Geometric Analysis
Vision models see only 2D projections and cannot assess:
- Thin wall detection: Requires thickness measurement in 3D space (not visible in 2D drawing)
- Tool access and fixture fit: Requires understanding of 3D cavity depth, wall angles, and approach vectors
- Hole aspect ratio validation: Visible in orthographic projection but requires 3D model to assess drilling difficulty
- Undercut detection: May be hidden in certain views; only clear from full 3D model
- Feature adjacency and stacking: Multiple features' interaction is context-dependent
2. Assembly and Context Ambiguity
Vision models cannot reliably infer:
- Mating surface definitions – which faces actually mate, and which are secondary
- Clearance and interference – requires 3D assembly geometry and tolerance stack-up
- Tolerance type selection – GD&T datum choice depends on assembly role (vision cannot see assembly context)
- Critical vs. non-critical dimensions – importance depends on function, not visible in drawing
3. Parametric and Historical Intent
Vision models cannot capture:
- Why dimensions are what they are – parametric relationships and design rules (stored in CAD expressions)
- Prior design iterations and decisions – decision history is not visible in a single drawing image
- Tolerance stack-up logic – requires traceability through dependent features
Integration Pattern: Vision + Deterministic Checks
RapidDraft combines vision with deterministic CAD analysis in a two-layer approach:
Layer 1: Deterministic CAD-Based Checks (Packs B–E)
Run directly on the STEP file and NX model:
- Extract geometry (features, faces, edges)
- Build knowledge graph of part structure
- Run manufacturing feasibility checks (thin walls, aspect ratios, draft angles)
- Verify hole geometry against CAD B-Rep
- Analyze tolerance stack-up using parametric expressions
Why this works:
- No ambiguity – 3D geometry is precise
- Deterministic – same check on same model always produces same result
- Full context – all geometric relationships are available
Layer 2: Vision-Based Validation and Compliance (Packs A + C/D Bridge)
Run on drawing PDF or screenshot alongside deterministic checks:
- OCR extracts dimension values and callouts from drawing image
- Vision model identifies drawing layout and view arrangement
- Compare extracted dimensions to CAD feature parameters
- Flag specification–geometry mismatches
- Validate drawing compliance with stated standard
Why this works:
- Vision is good at reading text and recognizing patterns
- Extracted specifications are the "intent as stated in drawing"
- Mismatch detection highlights where drawing and CAD diverge (possible error)
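The comparison of extracted dimensions to CAD feature parameters can be sketched as a keyed lookup with a numeric tolerance. This is an illustration under a strong assumption: that each extracted dimension has already been associated with a feature identifier. The identifiers, the `Divergence` type, and the `find_divergences` function are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Divergence:
    feature_id: str
    drawing_value: float
    cad_value: float


def find_divergences(drawing_dims: Dict[str, float],
                     cad_params: Dict[str, float],
                     abs_tol: float = 1e-3) -> List[Divergence]:
    """List features whose drawn value differs from the CAD parameter.

    Both dicts map an assumed feature ID to a nominal value in mm.
    Dimensions with no CAD counterpart are skipped, not flagged.
    """
    divergences = []
    for feature_id, drawn in sorted(drawing_dims.items()):
        cad = cad_params.get(feature_id)
        if cad is None:
            continue
        if not math.isclose(drawn, cad, rel_tol=0.0, abs_tol=abs_tol):
            divergences.append(Divergence(feature_id, drawn, cad))
    return divergences


result = find_divergences(
    {"hole_1": 10.0, "slot_2": 25.0},
    {"hole_1": 10.0, "slot_2": 24.5},
)
```

Using an absolute tolerance avoids flagging differences that come only from rounding in the drawing annotation.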
Confidence Handling
Results are tagged with confidence levels:
| Confidence | Source | How to Handle |
|---|---|---|
| High (>95%) | Deterministic checks on CAD geometry (thin walls, undercuts, hole aspect ratio) | Present as findings; no user override needed |
| Medium (75–95%) | Pattern-based manufacturing heuristics; vision-extracted specifications | Present as suggestions; engineer should verify |
| Low (<75%) | Context-dependent checks; assembly relationships; AI-inferred intent | Present as questions, not assertions; engineer must validate |
Example:
Check: "Surface finish achievable with stated process"
CAD input: Feature type = milling, material = aluminum
Drawing input: Ra = 0.8 µm (extracted by OCR)
Heuristic: Typical aluminum milling achieves Ra 1.6–3.2 µm
Result: "Surface finish specification is tighter than typical process capability"
Confidence: 65% (depends on actual machine and operator skill)
Action: Present as a question (Low band, <75%); engineer reviews with manufacturing
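The banding in the table above can be expressed as a small mapping from score to presentation style. The sketch below is an assumption about how that might be coded, with hypothetical function names; the thresholds are the ones in the table, with confidence expressed as a fraction between 0 and 1.

```python
def confidence_band(confidence: float) -> str:
    """Map a 0-1 confidence score to the bands in the table above."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence > 0.95:
        return "high"
    if confidence >= 0.75:
        return "medium"
    return "low"


# How each band is presented to the engineer, per the table.
PRESENTATION = {
    "high": "finding",
    "medium": "suggestion",
    "low": "question",
}


def presentation_for(confidence: float) -> str:
    return PRESENTATION[confidence_band(confidence)]


style = presentation_for(0.65)
```

Under this mapping a score of 0.65 falls in the Low band and is phrased as a question, while exactly 0.75 and exactly 0.95 both fall in the Medium band.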
Output Combination
When both vision and deterministic checks are available, results are merged:
Drawing Validation Report
─────────────────────────
PACK A (Compliance) — Vision Input
✓ Title block complete
✓ Drawing standard stated (ISO 13715)
⚠ Revision tracking present but missing last update date
(Confidence: 85%, OCR parsed revision field)
PACK B (Dimensional Completeness) — CAD Input
✓ All features dimensioned
✓ Material specified
⚠ Surface finish incomplete: East face missing Ra specification
(Feature: Milling surface, face ID: 47)
PACK C/D (DFM) — Vision + CAD Combined
⚠ Specification mismatch on primary face:
- Drawing specifies: Ra 0.8
- Feature geometry: Large milled pocket (100 × 80 × 20 mm)
- Typical process capability: Ra 1.6–3.2 µm
- Recommendation: Confirm with manufacturing if Ra 0.8 is functional requirement
(Confidence: 70%, manufacturing heuristic)
PACK E (Assembly) — CAD Input
✓ No assembly references in part-level drawing (N/A)
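One way to produce a merged report like the one above is to have every check emit a common finding record and then group the records by pack. The sketch below assumes a hypothetical `Finding` record and helper functions; it is not the actual RapidDraft report schema, and it renders only a simplified form of the layout shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Finding:
    pack: str                      # e.g. "A", "B", "C/D", "E"
    source: str                    # "vision", "cad", or "vision+cad"
    passed: bool
    message: str
    confidence: Optional[float] = None  # None for deterministic checks


def merge_findings(*sources: Iterable[Finding]) -> Dict[str, List[Finding]]:
    """Group findings from any number of check layers by pack, in pack order."""
    by_pack = defaultdict(list)
    for findings in sources:
        for finding in findings:
            by_pack[finding.pack].append(finding)
    return {pack: by_pack[pack] for pack in sorted(by_pack)}


def render(report: Dict[str, List[Finding]]) -> str:
    """Render a plain-text report with pass/warn marks and confidence notes."""
    lines = []
    for pack, findings in report.items():
        lines.append(f"PACK {pack}")
        for f in findings:
            lines.append(f"{'✓' if f.passed else '⚠'} {f.message}")
            if f.confidence is not None:
                lines.append(f"(Confidence: {f.confidence:.0%})")
    return "\n".join(lines)


vision = [Finding("A", "vision", True, "Title block complete")]
cad = [Finding("B", "cad", True, "All features dimensioned")]
combined = [Finding("C/D", "vision+cad", False,
                    "Specification mismatch on primary face", 0.70)]
report_text = render(merge_findings(cad, vision, combined))
```

Keeping `source` on each record lets the report state which layer produced a finding, so the engineer can tell deterministic results from vision-derived ones.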
Vision Model Scope in RapidDraft
Included in MVP:
- OCR extraction of dimension values, tolerance callouts, and title block metadata
- Comparison of extracted specifications to CAD geometry (mismatch detection)
- Simple pattern recognition for common specification formats (ISO GD&T, ASME Y14.5)
- Confidence scoring for extracted data (high confidence for printed text, lower for handwritten notes)
Not included in MVP (deferred):
- Full assembly reasoning from multi-view drawing images
- Automatic hole depth or cavity depth inference from orthographic views
- Feature segmentation and classification from drawing images
- Tolerance stack-up calculation from drawing dimensions alone
Planned for future:
- Integration with CAD feature extraction to link drawn views to 3D features
- Learning from user corrections to improve OCR and pattern recognition
- Multi-view 3D reconstruction for parts not available as CAD models (e.g., supplier drawings)
User Trust and Explainability
Because vision models can fail or hallucinate, RapidDraft emphasizes explainability:
- Always show the image evidence – display the drawing region where data was extracted
- Tag confidence scores – make uncertainty visible to the engineer
- Provide alternatives – if OCR is uncertain, show candidate interpretations
- Allow manual override – engineer can correct extracted data (e.g., "OCR read '6' as '8'; I'm correcting to 6")
- Capture corrections as training data – corrections feed back to improve future extractions
Example UI feedback:
Vision extraction: Ø12 H7 (confidence: 92%)
[Image highlight showing extracted text "Ø12 H7" circled in red]
Alternative interpretations (sorted by confidence):
1. Ø12 H7 (92%)
2. Ø12 H8 (6%)
3. Ø10 H7 (2%)
[Engineer clicks radio button to confirm or select alternative]
[Confirmation is logged for model retraining]
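The interaction above needs two pieces of data handling: ranking candidate interpretations, and recording whether the engineer confirmed or corrected the top candidate. A minimal sketch, with hypothetical class names and an in-memory log standing in for whatever store would feed retraining:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Extraction:
    candidates: List[Tuple[str, float]]  # (interpreted text, confidence)

    def ranked(self) -> List[Tuple[str, float]]:
        """Candidates sorted by confidence, highest first."""
        return sorted(self.candidates, key=lambda c: c[1], reverse=True)

    def top(self) -> Tuple[str, float]:
        if not self.candidates:
            raise ValueError("extraction has no candidates")
        return self.ranked()[0]


@dataclass
class CorrectionLog:
    entries: List[Dict[str, object]] = field(default_factory=list)

    def record(self, extraction: Extraction, chosen_text: str) -> None:
        """Log the engineer's choice alongside the model's top prediction."""
        predicted, confidence = extraction.top()
        self.entries.append({
            "predicted": predicted,
            "predicted_confidence": confidence,
            "chosen": chosen_text,
            "was_corrected": predicted != chosen_text,
        })


extraction = Extraction([("Ø12 H8", 0.06), ("Ø12 H7", 0.92), ("Ø10 H7", 0.02)])
log = CorrectionLog()
log.record(extraction, "Ø12 H7")   # engineer confirms the top candidate
log.record(extraction, "Ø12 H8")   # engineer picks an alternative instead
```

Logging confirmations as well as corrections gives a later retraining step both positive and negative examples.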
Architecture: Where Vision Models Run
Vision model inference operates in the drawing validation stage:
CAD Model (.step, .prt)
↓
[NXOpen Geometry Extraction]
↓
Deterministic Checks (Packs B–E)
├─ Feature topology
├─ Thin walls, undercuts, aspect ratios
├─ Tolerance logic
├─ Manufacturing feasibility
└─ Assembly relationships
Drawing Image (PDF, screenshot)
↓
[Vision Model + OCR]
↓
Specification Extraction
├─ Dimension values
├─ Tolerance callouts
├─ Material, finish, process specs
└─ Drawing metadata
↓
[Merge + Confidence Scoring]
↓
Combined Report (Packs A + C/D Bridge)
├─ Drawing compliance (high confidence)
├─ Specification–geometry matches (medium confidence)
└─ Manufacturing heuristics (medium-to-low confidence)
Vision models can run:
- Locally (lightweight model on the client device) – good for privacy and instant feedback
- Cloud-based (higher-accuracy models) – better accuracy, but drawings leave the device, which raises data-sensitivity concerns
- Hybrid (draft on-device, fall back to cloud for low-confidence extractions)
RapidDraft defaults to local inference for sensitive industrial drawings; cloud offload is optional and requires explicit user consent.
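The hybrid policy with consent gating can be sketched as a small routing function. Everything here is an assumption for illustration: the `Extractor` callable type, the function name, and the 0.75 threshold (borrowed from the lower edge of the Medium confidence band) are not taken from RapidDraft's implementation.

```python
from typing import Callable, Tuple

# An extractor takes image bytes and returns (text, confidence in 0-1).
Extractor = Callable[[bytes], Tuple[str, float]]


def extract_with_fallback(image: bytes,
                          local: Extractor,
                          cloud: Extractor,
                          cloud_consent: bool = False,
                          min_confidence: float = 0.75) -> Tuple[str, float, str]:
    """Run local inference first; use the cloud only with explicit consent.

    Returns (text, confidence, origin), where origin is "local" or "cloud".
    """
    text, confidence = local(image)
    if confidence >= min_confidence or not cloud_consent:
        return text, confidence, "local"
    cloud_text, cloud_confidence = cloud(image)
    if cloud_confidence > confidence:
        return cloud_text, cloud_confidence, "cloud"
    return text, confidence, "local"


def fake_local(image: bytes) -> Tuple[str, float]:
    return ("Ø12 H7", 0.60)   # stand-in for an on-device model


def fake_cloud(image: bytes) -> Tuple[str, float]:
    return ("Ø12 H7", 0.93)   # stand-in for a hosted model


without_consent = extract_with_fallback(b"img", fake_local, fake_cloud)
with_consent = extract_with_fallback(b"img", fake_local, fake_cloud,
                                     cloud_consent=True)
```

Because consent defaults to `False`, the cloud extractor is never called unless the caller opts in, so the drawing stays on the device by default.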
Summary: Vision's Role in RapidDraft
Vision models are not the primary engine for manufacturing feasibility checking. Instead, they fill a specific niche:
- Good for: Reading drawing intent (dimensions, callouts, specifications) and flagging specification–geometry mismatches
- Not good for: True 3D manufacturability assessment (use CAD geometry for that)
- Value add: Fast feedback on drawing compliance without requiring a 3D model, plus intent validation where CAD geometry is available
- Trust model: Always show evidence, tag confidence, allow override, learn from corrections
This approach respects the reality that engineers must remain accountable for manufacturing decisions, while letting AI accelerate routine compliance checks and intent validation.