C. Backend Architecture¶
Parent: Code Review Index Priority: pays for itself by the third feature; several items are cheap wins Surface affected: backend (
server/)
The services themselves are solid; the orchestration around them is a monolith. These findings are about separating concerns so the team can move faster.
C1 — server/main.py is a 3038-line route monolith¶
- Status: Open
- Impact: H | Complexity: M | Time: 1–2 W
- Files:
server/main.py(entire file, ~75@app.*decorators)
Finding. Every FastAPI route, every Pydantic body model, every orchestration function lives in one file. Navigation requires ctrl-F; diffs are noisy; onboarding cost is high. No APIRouter in use.
Action for Codex.
1. Create server/routers/ with one module per resource group:
- models.py — upload, list, delete, scene
- reviews.py — tickets, reviews, replies
- dfm.py — plan, review-v2, templates, candidate sessions, PDF export
- vision.py — view sets, reports
- fusion.py
- cnc.py
- draftlint.py
- part_facts.py
- health.py
2. Move the matching Pydantic *Body models into the same files; extract truly shared ones into server/schemas/.
3. Keep server/main.py as the app factory only: create FastAPI(), add middleware, include routers, emit startup logs. Target < 200 lines.
4. Preserve URL paths exactly — this is a mechanical move, not a redesign.
5. Update imports and test wiring.
Acceptance criteria.
- server/main.py < 200 lines.
- pytest -q passes.
- pytest -q server/tests/test_*_endpoints_wiring.py continues to cover each router.
- grep -rnE "@app\.(get|post|put|delete|patch)" server/main.py returns nothing.
Depends on: A2 (auth dependency lands first so routers can be wired with it).
C2 — dfm_review_v2.py (6029 LOC), cnc_geometry_occ.py (4742), part_facts.py (3377) are internally heavy¶
- Status: Open
- Impact: M | Complexity: M | Time: gradual (per-PR as you touch the area)
- Files: the three modules above
Finding. Each of these modules is a small library squashed into a single file. Logic is correct but hard to navigate. This is a pay-as-you-go problem, not a must-fix — but the next reviewer of the DFM rule engine will need to carve it up.
Action for Codex (opportunistic).
When a PR touches one of these modules:
1. Extract the relevant class/function cluster into a sibling file under a new package (e.g. server/dfm_review_v2/ becomes a package with rule_eval.py, standards.py, payloads.py, executor.py).
2. Keep the public import surface stable (from server.dfm_review_v2 import ...).
3. Add module-level docstrings explaining the invariants.
Acceptance criteria.
- No single file in server/ exceeds 1500 LOC after a ~6-month refactor window.
Depends on: C1 (routers first — shows which service imports matter).
C3 — No FastAPI Depends(); services are module-global singletons¶
- Status: Open
- Impact: M | Complexity: M | Time: 3 D
- Files:
server/main.py:147-214(service instantiations), all route handlers
Finding. cad_service, review_store, model_store, part_facts_service, etc. are created at module import time and referenced directly inside handlers. This blocks unit testing (can't inject mocks), blocks per-request scoping, and blocks multi-tenant rollouts later.
Action for Codex.
1. In each routers/*.py, define factory functions: def get_review_store() -> ReviewStore: ... that return the singleton (cached with @lru_cache).
2. Handlers take the store via Depends(get_review_store) instead of referencing the global.
3. Tests override dependencies via app.dependency_overrides.
4. Move the actual instantiation into a lifespan handler so startup is explicit and testable.
Acceptance criteria.
- Test: a route handler with a mocked ReviewStore returns the mocked data.
- server/main.py no longer holds the service globals.
Depends on: C1 (do together).
C4 — Config is scattered os.getenv(...) with inline parsing¶
- Status: Open
- Impact: M | Complexity: L | Time: 1 D
- Files:
server/main.py:115-127, various other modules
Finding. DFM_COST_ENABLED, PART_FACTS_INLINE_PREWARM_MAX_TRIANGLES, etc. are parsed inline with ad-hoc int/bool coercion. access_auth.py does the same. Easy to typo, no central documentation of what exists.
Action for Codex.
1. Create server/settings.py using pydantic-settings.BaseSettings. One class, all fields typed, validated, and documented.
2. Instantiate settings = Settings() once at startup; inject via Depends(get_settings).
3. Replace every os.getenv(...) in server/ with the equivalent settings.* attribute.
4. Add a .env.example at the repo root listing every setting with a comment.
Acceptance criteria.
- grep -rn "os.getenv" server/ returns only settings.py itself (or empty).
- Startup fails fast with a clear error if a required setting is missing.
Depends on: none. Paired with A1.
C5 — Exception → HTTP status mapping is manual and scattered¶
- Status: Open
- Impact: M | Complexity: L | Time: 1 D
- Files: handlers in
server/main.py, exception classes acrossserver/*.py
Finding. Each handler has its own try/except block mapping custom exceptions to HTTP codes. Default for an unhandled exception is HTTPException(status_code=500, detail=f"Unexpected ...: {exc}") — leaky (see A6) and repetitive.
Action for Codex.
1. Register global @app.exception_handler for each custom exception family:
- ModelNotFoundError, DfmTemplateNotFoundError, ... → 404
- ValidationError-like, *ValidationError, DfmBundleValidationError → 400
- AccessIdentityError → 403
- *RateLimitError → 429
- Exception → 500 with opaque body (logs full trace)
2. Delete the per-handler try/except blocks that now become redundant.
3. Pair with A6 (sanitized bodies, request_id on every response).
Acceptance criteria.
- Per-handler except *Error blocks count drops from ~dozens to < 5.
- Test: each exception class hits the correct HTTP status.
Depends on: A6 (same PR).
C6 — Typing style is mixed (List/Dict vs list/dict)¶
- Status: Open
- Impact: L | Complexity: L | Time: 0.5 D
- Files:
server/model_store.py:6,server/review_store.py:6,server/cad_service.py:12, others
Finding. Older modules import from typing import Dict, List, Optional; newer ones use built-in generics. Cosmetic, but trivial to standardize.
Action for Codex.
1. Run pyupgrade --py311-plus $(git ls-files 'server/*.py').
2. Run ruff check --select UP --fix server/.
3. Add pyupgrade (or ruff) to CI (see E3).
Acceptance criteria.
- ruff check --select UP server/ clean.
Depends on: E3 (linting in CI).
C7 — Persistence plan: JSON today, but v0 spec says SQLite/Postgres¶
- Status: Open
- Impact: H | Complexity: H | Time: 2–4 W
- Files: all
*_store.pymodules; newserver/db/package
Finding. MVP v0 spec says "SQLite for pilot; Postgres for multi-user scaling". Today's reality is raw JSON files — worse than SQLite on both ergonomics (no query) and reliability (see B1). Before seat-#2 of any team pilot, this needs to move.
Action for Codex (phased).
1. Phase 1 — introduce SQLModel or SQLAlchemy 2.0 with SQLite locally, Postgres in prod.
2. Define entity models matching the MVP v0 data model: Artifact, Snapshot, EvidencePointer, ReviewSession, Finding, Issue, Comment, ActivityLog.
3. Keep the existing store classes as the API surface; swap the implementation from JSON to DB.
4. Add an Alembic migration setup; one-way migration script from existing JSON folders.
5. Phase 2 — new pilot deployments use Postgres; legacy JSON files read-only until migrated.
Acceptance criteria.
- All store unit tests pass against both SQLite and Postgres (parametrized fixtures).
- A migration script is idempotent on the existing models/ folder.
- Rollback plan documented in docs/contracts/persistence.md.
Depends on: B1 (locking remains relevant during transition), F1 (activity log schema).