Skip to content

C. Backend Architecture

Parent: Code Review Index Priority: pays for itself by the third feature; several items are cheap wins Surface affected: backend (server/)

The services themselves are solid; the orchestration around them is a monolith. These findings are about separating concerns so the team can move faster.


C1 — server/main.py is a 3038-line route monolith

  • Status: Open
  • Impact: H | Complexity: M | Time: 1–2 W
  • Files: server/main.py (entire file, ~75 @app.* decorators)

Finding. Every FastAPI route, every Pydantic body model, every orchestration function lives in one file. Navigation requires ctrl-F; diffs are noisy; onboarding cost is high. No APIRouter in use.

Action for Codex. 1. Create server/routers/ with one module per resource group: - models.py — upload, list, delete, scene - reviews.py — tickets, reviews, replies - dfm.py — plan, review-v2, templates, candidate sessions, PDF export - vision.py — view sets, reports - fusion.py - cnc.py - draftlint.py - part_facts.py - health.py 2. Move the matching Pydantic *Body models into the same files; extract truly shared ones into server/schemas/. 3. Keep server/main.py as the app factory only: create FastAPI(), add middleware, include routers, emit startup logs. Target < 200 lines. 4. Preserve URL paths exactly — this is a mechanical move, not a redesign. 5. Update imports and test wiring.

Acceptance criteria. - server/main.py < 200 lines. - pytest -q passes. - pytest -q server/tests/test_*_endpoints_wiring.py continues to cover each router. - grep -rnE "@app\.(get|post|put|delete|patch)" server/main.py returns nothing.

Depends on: A2 (auth dependency lands first so routers can be wired with it).


C2 — dfm_review_v2.py (6029 LOC), cnc_geometry_occ.py (4742), part_facts.py (3377) are internally heavy

  • Status: Open
  • Impact: M | Complexity: M | Time: gradual (per-PR as you touch the area)
  • Files: the three modules above

Finding. Each of these modules is a small library squashed into a single file. Logic is correct but hard to navigate. This is a pay-as-you-go problem, not a must-fix — but the next reviewer of the DFM rule engine will need to carve it up.

Action for Codex (opportunistic). When a PR touches one of these modules: 1. Extract the relevant class/function cluster into a sibling file under a new package (e.g. server/dfm_review_v2/ becomes a package with rule_eval.py, standards.py, payloads.py, executor.py). 2. Keep the public import surface stable (from server.dfm_review_v2 import ...). 3. Add module-level docstrings explaining the invariants.

Acceptance criteria. - No single file in server/ exceeds 1500 LOC after a ~6-month refactor window.

Depends on: C1 (routers first — shows which service imports matter).


C3 — No FastAPI Depends(); services are module-global singletons

  • Status: Open
  • Impact: M | Complexity: M | Time: 3 D
  • Files: server/main.py:147-214 (service instantiations), all route handlers

Finding. cad_service, review_store, model_store, part_facts_service, etc. are created at module import time and referenced directly inside handlers. This blocks unit testing (can't inject mocks), blocks per-request scoping, and blocks multi-tenant rollouts later.

Action for Codex. 1. In each routers/*.py, define factory functions: def get_review_store() -> ReviewStore: ... that return the singleton (cached with @lru_cache). 2. Handlers take the store via Depends(get_review_store) instead of referencing the global. 3. Tests override dependencies via app.dependency_overrides. 4. Move the actual instantiation into a lifespan handler so startup is explicit and testable.

Acceptance criteria. - Test: a route handler with a mocked ReviewStore returns the mocked data. - server/main.py no longer holds the service globals.

Depends on: C1 (do together).


C4 — Config is scattered os.getenv(...) with inline parsing

  • Status: Open
  • Impact: M | Complexity: L | Time: 1 D
  • Files: server/main.py:115-127, various other modules

Finding. DFM_COST_ENABLED, PART_FACTS_INLINE_PREWARM_MAX_TRIANGLES, etc. are parsed inline with ad-hoc int/bool coercion. access_auth.py does the same. Easy to typo, no central documentation of what exists.

Action for Codex. 1. Create server/settings.py using pydantic-settings.BaseSettings. One class, all fields typed, validated, and documented. 2. Instantiate settings = Settings() once at startup; inject via Depends(get_settings). 3. Replace every os.getenv(...) in server/ with the equivalent settings.* attribute. 4. Add a .env.example at the repo root listing every setting with a comment.

Acceptance criteria. - grep -rn "os.getenv" server/ returns only settings.py itself (or empty). - Startup fails fast with a clear error if a required setting is missing.

Depends on: none. Paired with A1.


C5 — Exception → HTTP status mapping is manual and scattered

  • Status: Open
  • Impact: M | Complexity: L | Time: 1 D
  • Files: handlers in server/main.py, exception classes across server/*.py

Finding. Each handler has its own try/except block mapping custom exceptions to HTTP codes. Default for an unhandled exception is HTTPException(status_code=500, detail=f"Unexpected ...: {exc}") — leaky (see A6) and repetitive.

Action for Codex. 1. Register global @app.exception_handler for each custom exception family: - ModelNotFoundError, DfmTemplateNotFoundError, ... → 404 - ValidationError-like, *ValidationError, DfmBundleValidationError → 400 - AccessIdentityError → 403 - *RateLimitError → 429 - Exception → 500 with opaque body (logs full trace) 2. Delete the per-handler try/except blocks that now become redundant. 3. Pair with A6 (sanitized bodies, request_id on every response).

Acceptance criteria. - Per-handler except *Error blocks count drops from ~dozens to < 5. - Test: each exception class hits the correct HTTP status.

Depends on: A6 (same PR).


C6 — Typing style is mixed (List/Dict vs list/dict)

  • Status: Open
  • Impact: L | Complexity: L | Time: 0.5 D
  • Files: server/model_store.py:6, server/review_store.py:6, server/cad_service.py:12, others

Finding. Older modules import from typing import Dict, List, Optional; newer ones use built-in generics. Cosmetic, but trivial to standardize.

Action for Codex. 1. Run pyupgrade --py311-plus $(git ls-files 'server/*.py'). 2. Run ruff check --select UP --fix server/. 3. Add pyupgrade (or ruff) to CI (see E3).

Acceptance criteria. - ruff check --select UP server/ clean.

Depends on: E3 (linting in CI).


C7 — Persistence plan: JSON today, but v0 spec says SQLite/Postgres

  • Status: Open
  • Impact: H | Complexity: H | Time: 2–4 W
  • Files: all *_store.py modules; new server/db/ package

Finding. MVP v0 spec says "SQLite for pilot; Postgres for multi-user scaling". Today's reality is raw JSON files — worse than SQLite on both ergonomics (no query) and reliability (see B1). Before seat-#2 of any team pilot, this needs to move.

Action for Codex (phased). 1. Phase 1 — introduce SQLModel or SQLAlchemy 2.0 with SQLite locally, Postgres in prod. 2. Define entity models matching the MVP v0 data model: Artifact, Snapshot, EvidencePointer, ReviewSession, Finding, Issue, Comment, ActivityLog. 3. Keep the existing store classes as the API surface; swap the implementation from JSON to DB. 4. Add an Alembic migration setup; one-way migration script from existing JSON folders. 5. Phase 2 — new pilot deployments use Postgres; legacy JSON files read-only until migrated.

Acceptance criteria. - All store unit tests pass against both SQLite and Postgres (parametrized fixtures). - A migration script is idempotent on the existing models/ folder. - Rollback plan documented in docs/contracts/persistence.md.

Depends on: B1 (locking remains relevant during transition), F1 (activity log schema).