CVAT, YOLO, Roboflow, Vast.ai, and W&B Training Operations¶

Operational source:

C:\Users\adeel\OneDrive\100_Knowledge\203_TextCAD\01_Product_Project_Management\00_Project_Management_n_skills\04_playbooks\DRAWING_MODEL_TRAINING_AND_CVAT_MODEL_SERVING.md

Why this page exists¶

This page captures the current operational workflow for drawing-model work so future prompts do not need to restate the full stack each time. It covers the shared CVAT serving surface, the YOLO training path, the Roboflow experiment path, Vast.ai usage, W&B run tracking, and the invariants that keep those systems from breaking each other.

Current system map¶

The current drawing-analysis workflow has five distinct layers:

CVAT on the Fedora mini PC is the shared labeling and model-serving surface.
Nuclio links trained models into CVAT so they appear in AI Tools.
Vast.ai is the preferred remote GPU training surface for YOLO runs.
W&B is the preferred dashboard and run-history surface for remote training.
Roboflow remains a separate experiment path that must coexist with the local YOLO path without destabilizing the shared CVAT runtime.

Active repos and roles¶

The current repo split is:

D:\02_Code\49_yolotraining_firstdataset for the local YOLO segmentation and detection workflow, dataset assembly, Vast.ai launch scripts, W&B publishing, and CVAT deployment helpers
D:\02_Code\50_CVAT_RoboFlow for the Roboflow-focused branch/worktree and CVAT integration work that should stay operationally separate from the YOLO repo
D:\02_Code\41_Training_drawing_roboflow_windowscode as older Roboflow/AMD training history that still matters as reference, but is no longer the main operational source

Shared CVAT runtime and invariants¶

The shared CVAT surface is:

host: https://cvat.adeelyj.com
platform: self-hosted CVAT on a Fedora Linux mini PC
model serving: Nuclio-backed functions exposed through CVAT AI Tools

The critical invariant is that both cvat_server and cvat_worker_annotation must run with:

DJANGO_SETTINGS_MODULE=settings

If either service falls back to the default production settings, detector requests will begin failing with trusted-origin or CSRF errors even if the main UI still loads.
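
The invariant can be checked mechanically rather than by waiting for a detector request to fail. The following is a minimal sketch, assuming the containers are named cvat_server and cvat_worker_annotation exactly like the services above and that the Docker CLI is available on the host; the helper names are illustrative, not part of any existing repo.

```python
import json
import subprocess

REQUIRED = "DJANGO_SETTINGS_MODULE=settings"
# Assumption: container names match the service names used on this page.
CONTAINERS = ("cvat_server", "cvat_worker_annotation")


def has_required_settings(env_entries):
    """Return True if the env list contains the exact required entry."""
    return REQUIRED in env_entries


def container_env(name):
    """Read a container's configured environment via `docker inspect`."""
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{json .Config.Env}}", name],
        check=True, capture_output=True, text=True,
    ).stdout
    # Docker prints `null` when no env is configured; treat that as empty.
    return json.loads(out) or []


def check_invariant():
    """Map each container name to whether it carries the required setting."""
    return {name: has_required_settings(container_env(name)) for name in CONTAINERS}
```

Running check_invariant() on the CVAT host after any redeploy should report True for both containers; a False entry points at the service that fell back to other settings.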

The safe compose stack for changes that touch serverless/model linking is:

docker-compose.yml
docker-compose.override.yml
components/serverless/docker-compose.serverless.yml

The practical rule is simple: model workflows may redeploy functions, but they should avoid recreating the shared CVAT stack with a different compose-file combination.
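
One way to make that rule hard to break is to build every compose command from a single constant. This is a sketch only: it assumes the host uses the `docker compose` plugin form (older installs use `docker-compose`) and that commands run from the CVAT checkout root; the function names are hypothetical.

```python
SAFE_COMPOSE_FILES = (
    "docker-compose.yml",
    "docker-compose.override.yml",
    "components/serverless/docker-compose.serverless.yml",
)


def compose_command(*action):
    """Build a `docker compose` argv that always uses the full safe file set."""
    argv = ["docker", "compose"]
    for path in SAFE_COMPOSE_FILES:
        argv += ["-f", path]
    return argv + list(action)


def is_safe_file_set(files):
    """True only if the files match the safe set exactly, in the same order."""
    return tuple(files) == SAFE_COMPOSE_FILES
```

A script that wants to inspect the stack would call compose_command("ps"), and a deployment helper could refuse to proceed when is_safe_file_set() returns False for the files it was asked to use.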

Current models already linked into CVAT¶

The currently known CVAT model names are:

YOLO26s Drawing Seg v1 (CPU)
Roboflow Technical Drawing Public v1

These names should be treated as stable user-facing identities. New models should be added with clear, explicit names instead of silently replacing one of the existing entries.
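
A deployment helper can enforce this with a simple name-collision guard. The sketch below hard-codes the two names listed above; the function name and the case-insensitive comparison are assumptions made for illustration.

```python
EXISTING_MODEL_NAMES = {
    "YOLO26s Drawing Seg v1 (CPU)",
    "Roboflow Technical Drawing Public v1",
}


def can_register(name, existing=EXISTING_MODEL_NAMES):
    """A new model may be registered only under a non-empty, unused name."""
    normalized = name.strip().casefold()
    taken = {n.casefold() for n in existing}
    return bool(normalized) and normalized not in taken
```

Comparing case-insensitively is a deliberate choice here: two entries that differ only in capitalization would be hard to tell apart in the AI Tools list.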

Current YOLO path¶

The current YOLO work is split into two related tracks.

The segmentation path uses YOLO26s-seg and currently has a combined labeled set of 59 frames. That path has already been trained, published to W&B, and linked back into CVAT as the YOLO26s Drawing Seg v1 (CPU) model.

The detection path is a newer track that reuses the same drawing-label vocabulary but builds a box-based dataset and training flow instead of a segmentation one. The intended user-facing CVAT name for that model is:

YOLO26s Drawing Detect v1 (CPU)

Current Roboflow path¶

Roboflow remains relevant as a parallel experiment path rather than a replacement for the local YOLO workflow.

The main operating rules are:

keep the Roboflow work in its own branch or worktree
do not assume Roboflow changes can recreate the shared CVAT stack safely by themselves
validate the shared CVAT invariants after any Roboflow-side deployment change
keep the public Roboflow-backed CVAT model clearly named so it is obvious when the output comes from Roboflow and not from a local YOLO model

Vast.ai operating role¶

Vast.ai is the preferred remote GPU training surface for YOLO work.

The practical reasons are:

it avoids depending on the Fedora mini PC for training
it avoids local AMD and Mac support complexity for the main run path
it supports repeatable GPU-backed runs for both segmentation and detection experiments

The current YOLO repo already contains the remote launch and sync scripts used to run those jobs. The safe mental model is:

prepare or normalize the dataset locally
sync the dataset and scripts to Vast.ai
train remotely
publish or inspect the run in W&B
copy the best artifacts back locally
only then consider whether the resulting model should be linked into CVAT
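
The ordering above matters more than the individual tools, so it can help to encode it explicitly. This is a hypothetical sketch: the stage identifiers are made up for illustration and do not correspond to script names in the YOLO repo.

```python
# Illustrative stage identifiers, in the order described above.
STAGES = (
    "prepare_dataset",
    "sync_to_vast",
    "train_remote",
    "review_in_wandb",
    "copy_artifacts_back",
    "consider_cvat_link",
)


def next_stage(completed):
    """Return the next stage to run, or None when all stages are done.

    Raises ValueError if `completed` skips or reorders a stage.
    """
    completed = list(completed)
    if completed != list(STAGES[: len(completed)]):
        raise ValueError(f"stages out of order: {completed}")
    return STAGES[len(completed)] if len(completed) < len(STAGES) else None
```

The point of the guard is the last step: a model is only considered for CVAT linking once every earlier stage, including the W&B review, has been recorded as complete.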

W&B operating role¶

W&B is the preferred dashboard and run-history surface for the remote YOLO path.

Use it to answer:

which run used which dataset
which model variant was trained
whether the smoke test and full baseline both succeeded
which checkpoint is best
whether a new run actually improved over the previous one

The local repo should still keep the critical artifacts and manifests, but W&B is the fastest way to understand what happened during a remote training run.
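
Those questions are only answerable if each run records the relevant facts when it starts. A minimal sketch, assuming the `wandb` package is installed and authenticated on the training machine; the config keys, the project name passed in, and both helper names are illustrative rather than taken from the existing publishing scripts.

```python
def build_run_config(dataset_id, model_variant, task, run_kind):
    """Collect the facts a later reader will want to filter runs by."""
    if run_kind not in ("smoke", "baseline"):
        raise ValueError("run_kind must be 'smoke' or 'baseline'")
    return {
        "dataset_id": dataset_id,
        "model_variant": model_variant,
        "task": task,
        "run_kind": run_kind,
    }


def start_run(project, config):
    """Open a W&B run carrying the config; needs `wandb` and a valid login."""
    import wandb  # imported lazily so build_run_config works without wandb

    return wandb.init(project=project, config=config, job_type=config["run_kind"])
```

Storing the run kind as the W&B job type makes it straightforward to separate smoke tests from full baselines in the dashboard.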

Practical health checks¶

Before treating the training-and-serving stack as healthy, confirm:

https://cvat.adeelyj.com/api/server/about returns successfully
https://cvat.adeelyj.com/api/lambda/functions returns successfully
the relevant model appears in CVAT AI Tools
the model can run on at least one frame without CSRF or host.docker.internal errors
the current training repo, dataset manifest, and W&B run identity all agree on which model was actually trained
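
The two URL checks can be scripted. The sketch below uses only the Python standard library and the two endpoints listed above; it assumes an unauthenticated GET is enough to tell "up" from "down", which may not hold if the instance requires a login for these routes (in that case a 401 or 403 would show up as a failed check). Function names are illustrative.

```python
import urllib.error
import urllib.request

BASE_URL = "https://cvat.adeelyj.com"
ENDPOINTS = ("/api/server/about", "/api/lambda/functions")


def http_status(url, timeout=10):
    """Return the HTTP status code for a GET, including 4xx/5xx responses.

    Connection-level failures (DNS, refused, timeout) are left to propagate.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def check_endpoints(fetch=http_status, base_url=BASE_URL):
    """Map each endpoint path to True when it answers with a 2xx status."""
    return {path: 200 <= fetch(base_url + path) < 300 for path in ENDPOINTS}
```

The remaining checks (model visible in AI Tools, a successful run on one frame, agreement between repo, manifest, and W&B run) still need a human or a CVAT-authenticated script.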

Naming rule for future models¶

Every new model should make three things obvious:

family: YOLO or Roboflow
task: Seg or Detect
runtime identity: version and CPU/GPU serving assumption

Examples:

YOLO26s Drawing Seg v1 (CPU)
YOLO26s Drawing Detect v1 (CPU)
Roboflow Technical Drawing Public v1
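
For the YOLO-style names, the rule can be expressed as a small builder. This is a sketch with a hypothetical function name; it covers only the `<family> <subject> <task> v<N> (<runtime>)` shape of the first two examples, since the Roboflow example carries no task or runtime token.

```python
def build_model_name(family, subject, task, version, runtime):
    """Compose a CVAT model name as `<family> <subject> <task> v<N> (<runtime>)`."""
    if task not in ("Seg", "Detect"):
        raise ValueError("task must be 'Seg' or 'Detect'")
    if runtime not in ("CPU", "GPU"):
        raise ValueError("runtime must be 'CPU' or 'GPU'")
    return f"{family} {subject} {task} v{version} ({runtime})"
```

Calling build_model_name("YOLO26s", "Drawing", "Detect", 1, "CPU") reproduces the intended detection model name exactly.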

Current interpretation¶

The important shift is that drawing-model work is no longer just "some local experiments."

It is now a real operational capability with:

a shared labeled-data surface in CVAT
a repeatable GPU training path on Vast.ai
a run-history layer in W&B
a shared CVAT-serving path through Nuclio
parallel YOLO and Roboflow model experiments that need to be coordinated rather than rediscovered

Combined-pass note¶

The current stack now also includes a multi-model combined pass for CVAT:

Drawing Ensemble v1

The Drawing Ensemble v1 page documents how the best-performing component models were combined into one merged detector pass and which model currently owns which target label.
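
The actual merge logic is not reproduced on this page. As one plausible reading of "owns a target label", the sketch below keeps a detection only when it comes from the model assigned to that label; the function name, the data shapes, and any labels or model names passed in are assumptions for illustration only.

```python
def merge_by_owner(detections_by_model, label_owner):
    """Keep each detection only if it came from the model that owns its label.

    detections_by_model: {model_name: [{"label": str, ...}, ...]}
    label_owner: {label: model_name}
    """
    merged = []
    for model_name, detections in detections_by_model.items():
        for det in detections:
            if label_owner.get(det["label"]) == model_name:
                # Tag the detection so the merged output stays traceable.
                merged.append({**det, "source_model": model_name})
    return merged
```

Labels with no assigned owner are dropped under this reading, which keeps the combined pass from emitting duplicates when two component models predict the same label.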

Sources¶

C:\Users\adeel\OneDrive\100_Knowledge\203_TextCAD\01_Product_Project_Management\00_Project_Management_n_skills\04_playbooks\DRAWING_MODEL_TRAINING_AND_CVAT_MODEL_SERVING.md
D:\02_Code\49_yolotraining_firstdataset
D:\02_Code\50_CVAT_RoboFlow