F. Product / Trust Gaps (wiki promises vs code)¶
Parent: Code Review Index
Priority: these are the claims the wiki and USP document already make; the code has to catch up
Surface affected: backend + frontend + data model
The Master Narrative, MVP v0 spec, and USP for CAD Experts all promise specific properties: deterministic rule-driven findings, evidence pointers, immutable activity log, snapshot versioning, carry-forward across revisions, categorized vision vs hard findings. The findings below track the gap between what the wiki claims and what the code delivers.
F1 — "Immutable activity log" does not exist¶
- Status: Open
- Impact: H | Complexity: M | Time: 1 W
- Files: new server/activity_log.py; touch server/review_store.py, server/dfm_review_jobs.py, server/main.py
- Wiki source: MVP v0 - Review Companion → "Immutable activity log for audit trail"
Finding. The wiki promises ActivityLog — immutable event stream for audit trail. Today, review/ticket state is a rewritable JSON blob. No events, no append-only guarantee, no history beyond "last-write-wins".
Action for Codex.
1. Create server/activity_log.py (sketched after this list) exposing:
- append(event: ActivityEvent) — append-only JSONL, fsync'd.
- read(filter=...) -> Iterator[ActivityEvent]
- Event shape: {event_id, occurred_at, actor_email, artifact_id, kind, payload}.
2. Hook into every state-changing operation: ticket CRUD, review create/update, DFM review run/complete, issue carry-forward, finding disposition.
3. Expose GET /api/models/{id}/activity (authed) returning paginated events.
4. UI reads it as a right-rail timeline on the Design Review workspace.
5. When C7 lands (SQLite/Postgres), migrate the activity log into a proper append-only table.
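A minimal sketch of what step 1 could look like. The event shape comes from this finding; the file location, helper names, and single-process-writer assumption are mine (multi-worker locking is B1's problem):

```python
# server/activity_log.py — sketch; single-process writer assumed (see B1 for atomic append).
import json
import os
import time
import uuid
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Iterator, Optional

LOG_PATH = Path("data/activity_log.jsonl")  # assumed location

@dataclass(frozen=True)
class ActivityEvent:
    event_id: str
    occurred_at: float
    actor_email: str
    artifact_id: str
    kind: str
    payload: dict

def new_event(actor_email: str, artifact_id: str, kind: str, payload: dict) -> ActivityEvent:
    return ActivityEvent(str(uuid.uuid4()), time.time(), actor_email, artifact_id, kind, payload)

def append(event: ActivityEvent) -> None:
    # Open in append mode so existing bytes are never rewritten; fsync so the event survives a crash.
    LOG_PATH.parent.mkdir(parents=True, exist_ok=True)
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(event)) + "\n")
        f.flush()
        os.fsync(f.fileno())

def read(artifact_id: Optional[str] = None) -> Iterator[ActivityEvent]:
    if not LOG_PATH.exists():
        return
    with open(LOG_PATH, encoding="utf-8") as f:
        for line in f:
            ev = ActivityEvent(**json.loads(line))
            if artifact_id is None or ev.artifact_id == artifact_id:
                yield ev
```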
Acceptance criteria.
- Creating a ticket emits exactly one event; updating it emits a second with the diff.
- Event file is append-only (no rewrites) — enforce with a test that attempts a rewrite and expects failure.
- /api/models/{id}/activity returns events in creation order.
Depends on: B1 (atomic append), A2 (auth to capture actor_email).
F2 — "Evidence pointers" are camera-relative, not entity-stable¶
- Status: Open
- Impact: H | Complexity: H | Time: 2–3 W
- Files: server/main.py:253-266 (PinPositionBody), server/review_store.py, server/canonical_scene_service.py, new docs/contracts/evidence_pointer.md
- Wiki source: MVP v0 spec — "EvidencePointer — hierarchical reference to exact entity"; Problems doc — Problem 2 (stable comment-to-feature linking across revisions)
Finding. Today's pin is {position, normal, cameraState} — camera-relative. This will not carry forward across revisions. The MVP v0 spec lists this as Problem 2 with a layered-linking mitigation; the code has not adopted the mitigation.
Action for Codex.
1. Define EvidencePointer in a new server/schemas/evidence_pointer.py (pydantic sketch after this list):
{
"target_kind": "component_node" | "face" | "edge" | "vertex" | "dim" | "note",
"component_node_name": str,
"face_id": int | None,
"edge_id": int | None,
"stable_hash": str, # hash of (component_node_name, face_id|edge_id, geometry_fingerprint)
"fallback": { # only used when stable_hash resolution fails
"position": [x,y,z],
"normal": [x,y,z],
"camera_state": {...}
},
"snapshot_hash": str,
"schema_version": "evidence-pointer-v1"
}
2. Implement a resolver that maps a pointer from one revision to the next: resolve stable_hash first, use the positional fallback only when that fails, and report a confidence score.
3. Document the contract in docs/contracts/evidence_pointer.md.
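A pydantic sketch of the contract above. Field names follow the JSON shape in step 1; the model details and the hash helper are assumptions:

```python
# server/schemas/evidence_pointer.py — sketch matching the JSON shape above.
import hashlib
from typing import Any, Literal, Optional

from pydantic import BaseModel

class PointerFallback(BaseModel):
    position: tuple[float, float, float]
    normal: tuple[float, float, float]
    camera_state: dict[str, Any]

class EvidencePointer(BaseModel):
    target_kind: Literal["component_node", "face", "edge", "vertex", "dim", "note"]
    component_node_name: str
    face_id: Optional[int] = None
    edge_id: Optional[int] = None
    stable_hash: str
    fallback: Optional[PointerFallback] = None  # only consulted when stable_hash resolution fails
    snapshot_hash: str
    schema_version: Literal["evidence-pointer-v1"] = "evidence-pointer-v1"

def stable_hash_for(component_node_name: str, entity_id: int, geometry_fingerprint: str) -> str:
    # Hash of (component_node_name, face_id|edge_id, geometry_fingerprint), per the schema comment.
    raw = f"{component_node_name}:{entity_id}:{geometry_fingerprint}".encode()
    return hashlib.sha256(raw).hexdigest()
```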
Acceptance criteria.
- Creating a pin on face F in Rev A, then running the resolver against a Rev B where only the view name changed, yields the same face with confidence=1.
- The ≥80% carry-forward success criterion in the v0 spec becomes measurable.
Depends on: F3 (snapshot versioning), C7 (schema change is cheaper after persistence move).
F3 — Snapshots do not carry a checked schemaVersion¶
- Status: Open
- Impact: H | Complexity: M | Time: 3–5 D
- Files: server/canonical_scene_service.py, server/part_facts.py, server/schemas/ (new)
- Wiki source: MVP v0 spec — "Snapshot versioning — versioned to allow tool evolution without breaking old reviews"
Finding. The backend produces canonical scenes and Part Facts payloads, but schemaVersion is not consistently attached nor checked on read. If the payload shape evolves, old reviews break silently.
Action for Codex.
1. Add a schema_version field (string, semver) to every persisted artifact: canonical scene, part facts, DFM review, pin position, issue, finding.
2. On read, validate with a pydantic model versioned by schema_version (sketched after this list). Unknown versions → return a clear "requires re-extraction" state, not a 500.
3. Keep a server/schemas/ registry documenting current + supported prior versions.
4. Emit schema_version in every response payload.
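A sketch of the versioned read in step 2, assuming a simple dict-based registry; the PartFactsV1 and LegacyArtifact names are hypothetical:

```python
# server/schemas/registry.py — sketch: dispatch on schema_version; unknown versions degrade, never 500.
from typing import Any

from pydantic import BaseModel, ValidationError

class PartFactsV1(BaseModel):
    schema_version: str
    # ... current Part Facts fields go here

SUPPORTED: dict[str, type[BaseModel]] = {
    "1.0.0": PartFactsV1,  # current + supported prior versions, per the server/schemas/ registry idea
}

class LegacyArtifact(BaseModel):
    """Recognized-but-unsupported payload; the UI renders a 'requires re-extraction' state."""
    schema_version: str
    raw: dict[str, Any]

def load_part_facts(payload: dict[str, Any]) -> BaseModel:
    version = str(payload.get("schema_version"))
    model = SUPPORTED.get(version)
    if model is None:
        return LegacyArtifact(schema_version=version, raw=payload)
    try:
        return model.model_validate(payload)
    except ValidationError:
        return LegacyArtifact(schema_version=version, raw=payload)
```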
Acceptance criteria.
- Test: reading a fixture with an older schema_version returns a recognized "legacy" flag, not an exception.
- Every persisted JSON contains schema_version.
Depends on: none. Prerequisite for F2.
F4 — Findings do not expose rule_id/severity/evidence consistently¶
- Status: Open
- Impact: H | Complexity: M | Time: 1 W
- Files: server/dfm_review_v2.py (finding construction; search for _FINDING/finding_record), response serializer, PDF builder (server/dfm_pdf_report.py)
- Wiki source: MVP v0 spec — "Finding — ruleId, severity, evidencePointer, message, parameters"; USP doc — "every annotation traces to an exact rule, feature, and context"
Finding. The rule engine produces rule_id, source_standard_clause, threshold internally (see server/dfm_review_v2.py:2205-2215), but the response payload and the PDF builder do not surface them consistently. This is the single most important trust artifact the tool can emit.
Action for Codex.
1. Audit every path that produces a finding. Ensure every finding includes: rule_id, rule_version, source_standard_clause, severity, title, description, recommended_action, threshold_expression, observed, target, evidence_pointer[], screenshot_keys[].
2. Define a strict pydantic model Finding (sketched after this list) and validate every produced finding against it.
3. Fail loudly on any producer that emits a finding missing rule_id (in dev; warn in prod with a log line).
4. Update the UI finding cards and the PDF report (see G-findings) to render the new fields.
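A sketch of the strict model from steps 2-3. Field names come from step 1; the severity levels, field types, and the dev/prod switch are assumptions:

```python
# server/schemas/finding.py — sketch; extra="forbid" rejects unknown fields, missing fields fail validation.
import logging
import os
from typing import Any, Literal, Optional

from pydantic import BaseModel, ConfigDict, ValidationError

from server.schemas.evidence_pointer import EvidencePointer  # the F2 sketch

log = logging.getLogger(__name__)

class Finding(BaseModel):
    model_config = ConfigDict(extra="forbid")

    rule_id: str
    rule_version: str
    source_standard_clause: Optional[str]
    severity: Literal["info", "warning", "critical"]  # assumed levels; align with the rule engine
    title: str
    description: str
    recommended_action: str
    threshold_expression: str
    observed: Any  # types unspecified in the wiki; pin down per rule
    target: Any
    evidence_pointer: list[EvidencePointer]
    screenshot_keys: list[str]

def validate_finding(raw: dict) -> Optional[Finding]:
    try:
        return Finding.model_validate(raw)
    except ValidationError as exc:
        if os.environ.get("APP_ENV") == "dev":  # assumed env switch: fail loudly in dev
            raise
        log.warning("finding failed schema validation: %s", exc)  # warn in prod
        return None
```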
Acceptance criteria.
- A schema test asserts every finding in every fixture validates.
- No code path can produce a finding without rule_id.
Depends on: F3 (schema versioning).
F5 — No continuous measurement of the v0 success criteria¶
- Status: Open
- Impact: H | Complexity: M | Time: 2 W
- Files: benchmark_data/, new scripts/quality/, new docs/validation/dfm_regression.md
- Wiki source: MVP v0 spec — "≥95% actionable findings; <5% false positives"; Problems doc — "test methods" per problem
Finding. The v0 spec is explicit about finding-quality targets. There is no harness in the repo that tracks them release-over-release. benchmark_data/ exists but the wiring is not obvious.
Action for Codex.
1. Curate a golden corpus of 20–30 parts with labeled expected findings (use benchmark_data/ as the starting point).
2. Add scripts/quality/run_dfm_regression.py that runs the full DFM review against every golden part and emits a per-rule precision/recall table (sketched after this list).
3. Commit the report as a versioned artifact in docs/validation/dfm_regression_report_<date>.md.
4. Fail CI if regression on any rule exceeds a threshold (e.g. precision drops > 5 pp).
5. Expose the summary on the wiki under 01_RapidDraft/07_Operations/.
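A sketch of the metric computation behind step 2. The corpus layout, label file name, and the review entry point are assumptions; run_dfm_review below is a stand-in to be wired to the real API:

```python
# scripts/quality/run_dfm_regression.py — sketch of the per-rule precision/recall table.
import json
from collections import Counter
from pathlib import Path

def run_dfm_review(part_path: Path) -> list[dict]:
    """Stand-in for the real entry point (somewhere in server/dfm_review_v2.py); wire up before use."""
    raise NotImplementedError

def per_rule_metrics(golden_dir: Path) -> dict[str, dict[str, float]]:
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for part_dir in sorted(p for p in golden_dir.iterdir() if p.is_dir()):
        # Assumed corpus layout: <part>/part.step + <part>/expected_findings.json with rule_id labels.
        expected = {f["rule_id"] for f in json.loads((part_dir / "expected_findings.json").read_text())}
        produced = {f["rule_id"] for f in run_dfm_review(part_dir / "part.step")}
        for rule in produced & expected:
            tp[rule] += 1
        for rule in produced - expected:
            fp[rule] += 1
        for rule in expected - produced:
            fn[rule] += 1
    return {
        rule: {
            "precision": tp[rule] / (tp[rule] + fp[rule]) if (tp[rule] + fp[rule]) else 0.0,
            "recall": tp[rule] / (tp[rule] + fn[rule]) if (tp[rule] + fn[rule]) else 0.0,
        }
        for rule in sorted(set(tp) | set(fp) | set(fn))
    }
```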
Acceptance criteria.
- A release PR shows the regression diff for every DFM rule.
- Report is reproducible from scripts/quality/run_dfm_regression.py.
Depends on: F4 (every finding has rule_id), E1 (CI slots).
F6 — Teamcenter path is aspirational; fake-TC fixture is missing¶
- Status: Open
- Impact: M | Complexity: M | Time: 1 W
- Files: new server/teamcenter/ package, new fixture under server/fixtures/teamcenter/
- Wiki source: MVP v0 spec — "Automation Track: NX plugin that exports…"; Problems doc Problem 6 — "fake Teamcenter folder mode for demos"
Finding. Wiki says the MVP is "read-only + export bundle" with a "fake Teamcenter" folder mode for demos. Neither a real connector nor a fake-TC shim appears in the repo. For a pilot conversation this undermines the product story.
Action for Codex.
1. Either scope Teamcenter explicitly out of v0 publicly (edit the wiki MVP v0 doc), or:
2. Ship a minimal fake-TC: a folder layout that mimics Item/ItemRev/Datasets, an fs_tc.py connector module that lists and reads from it (sketched after this list), and an end-to-end test that runs a review starting from Item/Rev.
3. Expose "Load from Teamcenter (fake-TC)" as a separate button in the launcher with a clear label.
Acceptance criteria.
- Test: creating a fake-TC folder and running the flow produces a review identical to the local-upload flow for the same file.
Depends on: none.
F7 — Vision vs deterministic findings are not clearly separated in UI/PDF¶
- Status: Open
- Impact: M | Complexity: L | Time: 2 D
- Files: web/src/components/DfmSidebar.tsx, web/src/components/VisionAnalysisSidebar.tsx, server/dfm_pdf_report.py
- Wiki source: Problems doc (v1) Problem 8 — "Mixing Deterministic DFM and Vision DFM Without Confusing the User"
Finding. Problem 8 mitigation is: label outputs clearly — "Hard Findings" for measured/rule-based, "Review Prompts / Potential Risks" for vision. Today the UI and PDF do not enforce this separation.
Action for Codex.
1. Add a kind: "deterministic" | "advisory" discriminator on every finding (see the sketch after this list).
2. In the UI, render them in two separate sections with distinct headers and iconography.
3. In the PDF, use two separate sections and distinct severity color palettes (solid for deterministic, hatched for advisory).
4. Disallow mixing in summary counts; show two separate count rows.
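A sketch of the discriminator and the grouping discipline from steps 1 and 4; the helper names are hypothetical:

```python
# Sketch: enforce the deterministic/advisory split for UI sections, PDF sections, and summary counts.
from typing import Literal

FindingKind = Literal["deterministic", "advisory"]

def split_by_kind(findings: list[dict]) -> dict[FindingKind, list[dict]]:
    groups: dict[FindingKind, list[dict]] = {"deterministic": [], "advisory": []}
    for finding in findings:
        groups[finding["kind"]].append(finding)  # KeyError on an unlabeled finding is deliberate
    return groups

def summary_counts(findings: list[dict]) -> dict[FindingKind, int]:
    # Two separate count rows — deterministic and advisory totals are never added together.
    return {kind: len(items) for kind, items in split_by_kind(findings).items()}
```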
Acceptance criteria.
- No visual path where a user sees a deterministic and an advisory finding in the same grouping.
- PDF appendix labels each group's source.
Depends on: F4 (Finding schema change in same PR).
F8 — Cost surface (v2) rollout needs an "implemented / validated / runtime-verified" status¶
- Status: Open
- Impact: L | Complexity: L | Time: 0.5 D (doc); larger scope is product-managed
- Files: server/dfm_costing.py, docs/contracts/dfm_cost.md (new), wiki 03_Product_Specs/MVP_v2_Cost_Estimation.md
- Wiki source: System Architecture — "three states: implemented / validated locally / runtime-verified on hosted path"
Finding. DFM_COST_ENABLED is a boolean env-var flag. The wiki wants an explicit rollout-state per feature. Cost estimation is a high-claim feature and must not ship "silently enabled".
Action for Codex.
1. Replace the boolean with an explicit rollout state — off | implemented | validated | runtime_verified, i.e. the wiki's three states plus off (sketched after this list).
2. In the UI, show the state to the user for any cost output ("Cost estimation: validated locally — not yet runtime-verified").
3. Update the MVP v2 wiki page to match.
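A sketch of step 1. Reusing the DFM_COST_ENABLED env var for the new state values is an assumption (a rename may be cleaner):

```python
# server/dfm_costing.py — sketch: explicit rollout state instead of a boolean flag.
import os
from enum import Enum

class RolloutState(str, Enum):
    OFF = "off"
    IMPLEMENTED = "implemented"
    VALIDATED = "validated"                  # validated locally
    RUNTIME_VERIFIED = "runtime_verified"    # runtime-verified on hosted path

def cost_rollout_state() -> RolloutState:
    raw = os.environ.get("DFM_COST_ENABLED", "off").lower()
    try:
        return RolloutState(raw)
    except ValueError:
        return RolloutState.OFF  # unknown values fail closed

# /api/config would include {"cost_estimation": cost_rollout_state().value} so the UI can render the banner.
```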
Acceptance criteria.
- Env var value reflected in /api/config response; UI renders banner.
- Wiki matches.
Depends on: none.