CVAT, YOLO, Roboflow, Vast.ai, and W&B Training Operations
Operational source:
C:\Users\adeel\OneDrive\100_Knowledge\203_TextCAD\01_Product_Project_Management\00_Project_Management_n_skills\04_playbooks\DRAWING_MODEL_TRAINING_AND_CVAT_MODEL_SERVING.md
Why this page exists
This page captures the current operational workflow for drawing-model work so future prompts do not need to restate the full stack each time. It covers the shared CVAT serving surface, the YOLO training path, the Roboflow experiment path, Vast.ai usage, W&B run tracking, and the invariants that keep those systems from breaking each other.
Current system map
The current drawing-analysis workflow has five distinct layers:
- CVAT on the Fedora mini PC is the shared labeling and model-serving surface.
- Nuclio links trained models into CVAT so they appear in AI Tools.
- Vast.ai is the preferred remote GPU training surface for YOLO runs.
- W&B is the preferred dashboard and run-history surface for remote training.
- Roboflow remains a separate experiment path that must coexist with the local YOLO path without destabilizing the shared CVAT runtime.
Active repos and roles
The current repo split is:
- D:\02_Code\49_yolotraining_firstdataset for the local YOLO segmentation and detection workflow, dataset assembly, Vast.ai launch scripts, W&B publishing, and CVAT deployment helpers
- D:\02_Code\50_CVAT_RoboFlow for the Roboflow-focused branch/worktree and CVAT integration work that should stay operationally separate from the YOLO repo
- D:\02_Code\41_Training_drawing_roboflow_windowscode as older Roboflow/AMD training history that still matters as reference but is no longer the main operational source
Shared CVAT runtime and invariants
The shared CVAT surface is:
- host: https://cvat.adeelyj.com
- platform: self-hosted CVAT on a Fedora Linux mini PC
- model serving: Nuclio-backed functions exposed through CVAT AI Tools
The critical invariant is that both cvat_server and cvat_worker_annotation must run with:
DJANGO_SETTINGS_MODULE=settings
If either service falls back to the default production settings, detector requests will begin failing with trusted-origin or CSRF errors even if the main UI still loads.
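A quick way to confirm the invariant is to read each container's configured environment. The sketch below is a minimal example, assuming the containers are named exactly cvat_server and cvat_worker_annotation (as on this page) and that the docker CLI is available on the mini PC; the helper names are hypothetical and not part of the CVAT or YOLO repos.

```python
import json
import subprocess

# Assumption: container names match the ones named on this page.
REQUIRED_SERVICES = ("cvat_server", "cvat_worker_annotation")
REQUIRED_SETTING = "DJANGO_SETTINGS_MODULE=settings"


def container_env(container: str) -> list[str]:
    """Return a container's configured environment as KEY=VALUE strings."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{json .Config.Env}}", container],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def settings_violations(envs: dict[str, list[str]]) -> list[str]:
    """Describe every required service whose environment lacks the expected setting."""
    problems = []
    for service in REQUIRED_SERVICES:
        env = envs.get(service)
        if env is None:
            problems.append(f"{service}: container not inspected")
        elif REQUIRED_SETTING not in env:
            problems.append(f"{service}: missing {REQUIRED_SETTING}")
    return problems


# Usage on the mini PC (an empty list means the invariant holds):
# settings_violations({name: container_env(name) for name in REQUIRED_SERVICES})
```

Keeping the inspection and the check in separate functions means the check can run against saved docker inspect output as well as live containers.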
The safe compose stack for changes that touch serverless/model linking is:
- docker-compose.yml
- docker-compose.override.yml
- components/serverless/docker-compose.serverless.yml
The practical rule is simple: model workflows may redeploy functions, but they should avoid recreating the shared CVAT stack with a different compose-file combination.
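One way to make the rule hard to break by accident is to route every compose call through a small wrapper that always passes the same three files in the same order. This is a hypothetical helper, not a script from either repo; it assumes it runs from the root of the CVAT checkout and uses the Docker Compose v2 "docker compose" syntax.

```python
import shlex

# The file set and order listed on this page; later files override earlier ones.
SAFE_COMPOSE_FILES = (
    "docker-compose.yml",
    "docker-compose.override.yml",
    "components/serverless/docker-compose.serverless.yml",
)


def compose_command(*args: str) -> list[str]:
    """Build a docker compose invocation that always passes the full safe file set."""
    cmd = ["docker", "compose"]
    for path in SAFE_COMPOSE_FILES:
        cmd += ["-f", path]
    return cmd + list(args)


def is_safe_file_set(files) -> bool:
    """True only when exactly the safe files are used, in the same order."""
    return tuple(files) == SAFE_COMPOSE_FILES


def as_shell(cmd: list[str]) -> str:
    """Render a command for logs or copy-paste, quoting where needed."""
    return shlex.join(cmd)
```

For example, as_shell(compose_command("ps")) prints the full three-file command for review; the list itself can be passed to subprocess.run once it looks right.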
Current models already linked into CVAT
The currently known CVAT model names are:
- YOLO26s Drawing Seg v1 (CPU)
- Roboflow Technical Drawing Public v1
These names should be treated as stable user-facing identities. New models should be added with clear, explicit names instead of silently replacing one of the existing entries.
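Before registering a new function, a simple collision check against the names already in CVAT keeps an existing identity from being silently reused. The sketch below hard-codes the two names listed above for illustration; in practice the list would come from CVAT's function listing. The function name and the case/whitespace normalization are assumptions.

```python
# Model names this page lists as already linked into CVAT (illustrative, hard-coded).
EXISTING_CVAT_MODELS = (
    "YOLO26s Drawing Seg v1 (CPU)",
    "Roboflow Technical Drawing Public v1",
)


def check_new_model_name(new_name: str, existing=EXISTING_CVAT_MODELS) -> str:
    """Reject a name that matches an existing one, ignoring case and extra whitespace."""

    def normalize(name: str) -> str:
        return " ".join(name.split()).casefold()

    taken = {normalize(name) for name in existing}
    if normalize(new_name) in taken:
        raise ValueError(f"{new_name!r} collides with an existing CVAT model; use a new name or version")
    return new_name
```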
Current YOLO path
The current YOLO work is split into two related tracks.
The segmentation path uses YOLO26s-seg and currently has a combined labeled set of 59 frames.
That path has already been trained, published to W&B, and linked back into CVAT as the YOLO26s Drawing Seg v1 (CPU) model.
The detection path exists as a newer branch of work that reuses the same drawing-label vocabulary but builds a box-based dataset and training flow rather than a segmentation-only path. The intended user-facing CVAT name for that model is:
YOLO26s Drawing Detect v1 (CPU)
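If the box dataset is derived from the existing segmentation labels, each polygon can be collapsed to its axis-aligned bounding box. The sketch below assumes Ultralytics-style text labels (one object per line: a class index followed by normalized coordinates); it is one common conversion, not necessarily what the YOLO repo does.

```python
def seg_line_to_box_line(line: str) -> str:
    """Convert one YOLO segmentation label line to a YOLO detection label line.

    Input:  "<class> x1 y1 x2 y2 ... xn yn"  (normalized polygon vertices)
    Output: "<class> cx cy w h"              (normalized axis-aligned box)
    """
    parts = line.split()
    cls, coords = parts[0], [float(v) for v in parts[1:]]
    if len(coords) < 6 or len(coords) % 2:
        raise ValueError(f"expected at least 3 (x, y) pairs, got {len(coords)} values")
    xs, ys = coords[0::2], coords[1::2]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    return f"{cls} {cx:.6f} {cy:.6f} {x_max - x_min:.6f} {y_max - y_min:.6f}"
```

Because both formats are normalized to image size, the conversion needs no image dimensions.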
Current Roboflow path
Roboflow remains relevant as a parallel experiment path rather than a replacement for the local YOLO workflow.
The main operating rules are:
- keep the Roboflow work in its own branch or worktree
- do not assume Roboflow changes can recreate the shared CVAT stack safely by themselves
- validate the shared CVAT invariants after any Roboflow-side deployment change
- keep the public Roboflow-backed CVAT model clearly named so it is obvious when the output comes from Roboflow and not from a local YOLO model
Vast.ai operating role
Vast.ai is the preferred remote GPU training surface for YOLO work.
The practical reasons are:
- it avoids depending on the Fedora mini PC for training
- it avoids the complexity of supporting local AMD and Mac GPUs on the main run path
- it supports repeatable GPU-backed runs for both segmentation and detection experiments
The current YOLO repo already contains the remote launch and sync scripts used to run those jobs. The safe mental model is:
- prepare or normalize the dataset locally
- sync the dataset and scripts to Vast.ai
- train remotely
- publish or inspect the run in W&B
- copy the best artifacts back locally
- only then consider whether the resulting model should be linked into CVAT
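The sync, train, and copy-back steps can be sketched as three commands. This is a hypothetical outline, not the repo's launch scripts: the host, SSH port, remote directory layout, and a data.yaml inside the dataset folder are all assumptions. It relies on the Ultralytics yolo CLI (yolo TASK train key=value ...), which saves each run under project/name, and on scp/ssh from OpenSSH, which ships with current Windows.

```python
import shlex


def remote_training_plan(
    host: str,
    port: int,
    local_dataset: str,
    remote_root: str,
    task: str,   # Ultralytics task name, e.g. "segment" or "detect"
    model: str,  # weights file or model YAML to start from
    epochs: int,
    run_name: str,
) -> list[list[str]]:
    """Return the sync, train, and copy-back commands for one remote run (nothing is executed)."""
    target = f"root@{host}"  # assumption: the instance is reached as root over SSH
    remote_data = f"{remote_root}/dataset"  # assumption: this directory does not exist yet
    # Ultralytics appends a numeric suffix if run_name already exists, so use a fresh name per run.
    run_dir = f"{remote_root}/runs/{task}/{run_name}"
    train = (
        f"cd {shlex.quote(remote_root)} && "
        f"yolo {task} train data=dataset/data.yaml model={model} "
        f"epochs={epochs} project=runs/{task} name={run_name}"
    )
    return [
        # 1. Sync the prepared dataset to the instance.
        ["scp", "-P", str(port), "-r", local_dataset, f"{target}:{remote_data}"],
        # 2. Train remotely; outputs land in run_dir (project/name).
        ["ssh", "-p", str(port), target, train],
        # 3. Copy the best checkpoint back for local inspection.
        ["scp", "-P", str(port), f"{target}:{run_dir}/weights/best.pt", "."],
    ]
```

Returning argument lists instead of running them keeps the plan reviewable; each command can be passed to subprocess.run(cmd, check=True) once it looks right. Dataset preparation and the W&B review happen outside this sketch.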
W&B operating role
W&B is the preferred dashboard and run-history surface for the remote YOLO path.
Use it to answer:
- which run used which dataset
- which model variant was trained
- whether the smoke test and full baseline both succeeded
- which checkpoint is best
- whether a new run actually improved over the previous one
The local repo should still keep the critical artifacts and manifests, but W&B is the fastest way to understand what happened during a remote training run.
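These questions can also be answered programmatically through the W&B public API. The sketch below assumes a hypothetical "entity/project" path, that Ultralytics logged its training arguments (including data and model) into the run config, and that the metric key you compare on (for example metrics/mAP50-95(M) for segmentation masks) matches what your runs actually report; check the key names in the W&B UI first.

```python
def fetch_runs(path: str) -> list[dict]:
    """Collect name, state, dataset, model, and summary metrics for runs under "entity/project"."""
    import wandb  # imported lazily so the comparison helpers work without wandb installed

    rows = []
    for run in wandb.Api().runs(path):
        rows.append({
            **run.summary._json_dict,
            "name": run.name,
            "state": run.state,
            "data": run.config.get("data"),  # assumption: Ultralytics train args are in the config
            "model": run.config.get("model"),
        })
    return rows


def best_run(rows: list[dict], metric: str) -> dict:
    """Pick the finished run with the highest value of `metric`."""
    finished = [r for r in rows if r.get("state") == "finished" and metric in r]
    if not finished:
        raise ValueError(f"no finished run reports {metric!r}")
    return max(finished, key=lambda r: r[metric])


def improved(new: dict, previous: dict, metric: str, min_gain: float = 0.0) -> bool:
    """True if the new run beats the previous one on `metric` by more than `min_gain`."""
    return new[metric] - previous[metric] > min_gain
```

Filtering on the finished state keeps crashed or still-running jobs from being picked as "best" on a partial metric.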
Practical health checks
Before treating the training-and-serving stack as healthy, confirm:
- https://cvat.adeelyj.com/api/server/about returns successfully
- https://cvat.adeelyj.com/api/lambda/functions returns successfully
- the relevant model appears in CVAT AI Tools
- the model can run on at least one frame without CSRF or host.docker.internal errors
- the current training repo, dataset manifest, and W&B run identity all agree on which model was actually trained
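The two endpoint checks, plus whether a given model name is listed, can be scripted; the AI Tools and single-frame checks stay manual. This is a minimal sketch using only the Python standard library. It assumes /api/lambda/functions returns a JSON list of objects with a name field and that you supply whatever auth header your CVAT instance expects; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Optional

BASE_URL = "https://cvat.adeelyj.com"


def fetch_json(url: str, headers: Optional[dict] = None):
    """GET a URL and decode its JSON body; raises on network or HTTP errors."""
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request, timeout=15) as response:
        return json.load(response)


def health_report(model_name: str, fetch: Callable = fetch_json, base_url: str = BASE_URL) -> dict:
    """Check both CVAT endpoints and whether model_name is among the listed functions."""
    report = {}
    try:
        fetch(f"{base_url}/api/server/about")
        report["server_about"] = True
    except Exception:
        report["server_about"] = False

    try:
        functions = fetch(f"{base_url}/api/lambda/functions")
    except Exception:
        functions = None
    report["lambda_functions"] = functions is not None
    # Assumption: the endpoint returns a JSON list of objects that each carry a "name".
    names = {f.get("name") for f in functions or [] if isinstance(f, dict)}
    report["model_listed"] = model_name in names
    return report
```

For authenticated calls, pass fetch=lambda url: fetch_json(url, headers={...}) with your own header; injecting fetch also makes the report easy to test offline.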
Naming rule for future models
Every new model should make three things obvious:
- family: YOLO or Roboflow
- task: Seg or Detect
- runtime identity: version and CPU/GPU serving assumption
Examples:
- YOLO26s Drawing Seg v1 (CPU)
- YOLO26s Drawing Detect v1 (CPU)
- Roboflow Technical Drawing Public v1
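A small checker can flag which parts of the rule a candidate name leaves implicit. The patterns below are assumptions derived from the examples on this page, not a rule enforced by CVAT, and the function name is hypothetical.

```python
import re

# Patterns inferred from the example names on this page.
FAMILY = re.compile(r"^(YOLO\w*|Roboflow)\b")
TASK = re.compile(r"\b(Seg|Detect)\b")
VERSION = re.compile(r"\bv\d+\b")
RUNTIME = re.compile(r"\((CPU|GPU)\)$")


def missing_name_parts(name: str) -> list[str]:
    """Return which of family / task / version / CPU-GPU a model name fails to make obvious."""
    checks = {"family": FAMILY, "task": TASK, "version": VERSION, "runtime": RUNTIME}
    return [part for part, pattern in checks.items() if not pattern.search(name)]
```

Run against the existing names, the checker passes both YOLO26s names and reports ["task", "runtime"] for Roboflow Technical Drawing Public v1. That is why it is meant for new names: existing names stay stable.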
Current interpretation
The important shift is that drawing-model work is no longer just "some local experiments."
It is now a real operational capability with:
- a shared labeled-data surface in CVAT
- a repeatable GPU training path on Vast.ai
- a run-history layer in W&B
- a shared CVAT-serving path through Nuclio
- parallel YOLO and Roboflow model experiments that need to be coordinated rather than rediscovered
Combined-pass note
The current stack also includes a multi-model combined pass for CVAT, which has its own page.
That page documents how the best-performing component models were combined into one merged detector pass and which model currently owns which target label.
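Conceptually, label ownership turns the merged pass into a filter: each component model runs, and only detections whose label is owned by the model that produced them survive. The sketch below illustrates that idea only; the Detection type, the min_score threshold, and any label names in the mapping are hypothetical, and the actual label-to-model assignment lives on the combined-pass page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    model: str   # CVAT name of the component model that produced it
    label: str
    score: float
    box: tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels


def merge_by_owner(detections, owners: dict[str, str], min_score: float = 0.0) -> list[Detection]:
    """Keep a detection only if its model owns that label and its score clears min_score."""
    return [
        d for d in detections
        if owners.get(d.label) == d.model and d.score >= min_score
    ]
```

Labels with no owner in the mapping are dropped, so adding a label to the combined pass means explicitly assigning it to one model.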
Sources
- C:\Users\adeel\OneDrive\100_Knowledge\203_TextCAD\01_Product_Project_Management\00_Project_Management_n_skills\04_playbooks\DRAWING_MODEL_TRAINING_AND_CVAT_MODEL_SERVING.md
- D:\02_Code\49_yolotraining_firstdataset
- D:\02_Code\50_CVAT_RoboFlow