
CVAT, YOLO, Roboflow, Vast.ai, and W&B Training Operations

Operational source:

C:\Users\adeel\OneDrive\100_Knowledge\203_TextCAD\01_Product_Project_Management\00_Project_Management_n_skills\04_playbooks\DRAWING_MODEL_TRAINING_AND_CVAT_MODEL_SERVING.md

Why this page exists

This page captures the current operational workflow for drawing-model work so future prompts do not need to restate the full stack each time. It covers the shared CVAT serving surface, the YOLO training path, the Roboflow experiment path, Vast.ai usage, W&B run tracking, and the invariants that keep those systems from breaking each other.

Current system map

The current drawing-analysis workflow has five distinct layers:

  1. CVAT on the Fedora mini PC is the shared labeling and model-serving surface.
  2. Nuclio links trained models into CVAT so they appear in AI Tools.
  3. Vast.ai is the preferred remote GPU training surface for YOLO runs.
  4. W&B is the preferred dashboard and run-history surface for remote training.
  5. Roboflow remains a separate experiment path that must coexist with the local YOLO path without destabilizing the shared CVAT runtime.

Active repos and roles

The current repo split is:

  • D:\02_Code\49_yolotraining_firstdataset for the local YOLO segmentation and detection workflow, dataset assembly, Vast.ai launch scripts, W&B publishing, and CVAT deployment helpers
  • D:\02_Code\50_CVAT_RoboFlow for the Roboflow-focused branch/worktree and CVAT integration work that should stay operationally separate from the YOLO repo
  • D:\02_Code\41_Training_drawing_roboflow_windowscode as older Roboflow/AMD training history that still matters as reference, but is no longer the main operational source

Shared CVAT runtime and invariants

The shared CVAT surface is:

  • host: https://cvat.adeelyj.com
  • platform: self-hosted CVAT on a Fedora Linux mini PC
  • model serving: Nuclio-backed functions exposed through CVAT AI Tools

The critical invariant is that both cvat_server and cvat_worker_annotation must run with:

DJANGO_SETTINGS_MODULE=settings

If either service falls back to the default production settings, detector requests will begin failing with trusted-origin or CSRF errors even if the main UI still loads.
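One way to verify this invariant without opening the containers is to read each service's configured environment with `docker inspect`. The sketch below assumes the container names match the service names (`cvat_server`, `cvat_worker_annotation`), which is how CVAT's compose file usually names them but may differ on a given host. It reads the environment the container was created with, not a live process dump.

```python
import json
import subprocess

# Value and services named on this page; container names are an assumption.
REQUIRED_SETTINGS = "settings"
SERVICES = ("cvat_server", "cvat_worker_annotation")


def settings_module(env_list):
    """Return DJANGO_SETTINGS_MODULE from a list of 'KEY=VALUE' strings, or None."""
    for entry in env_list:
        key, _, value = entry.partition("=")
        if key == "DJANGO_SETTINGS_MODULE":
            return value
    return None


def container_env(name):
    """Read a container's configured environment via `docker inspect`."""
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{json .Config.Env}}", name],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def check_settings(services=SERVICES):
    """Map each service to True if it is configured with the expected settings module."""
    return {svc: settings_module(container_env(svc)) == REQUIRED_SETTINGS
            for svc in services}
```

On the CVAT host, `print(check_settings())` should report `True` for both services; a `False` is the early warning for the CSRF and trusted-origin failures described above.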

The safe set of compose files for changes that touch serverless functions or model linking is, in this order:

  1. docker-compose.yml
  2. docker-compose.override.yml
  3. components/serverless/docker-compose.serverless.yml

The practical rule is simple: model workflows may redeploy functions, but they should avoid recreating the shared CVAT stack with a different compose-file combination.
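A small guard can make the "same compose-file combination" rule hard to break by accident. This is a minimal sketch, not one of the repo's own scripts: a hypothetical helper that always layers the three files in the order listed and refuses any other set. The working directory (the CVAT checkout root) is an assumption.

```python
import subprocess

# The three compose files listed on this page, in layering order.
COMPOSE_FILES = (
    "docker-compose.yml",
    "docker-compose.override.yml",
    "components/serverless/docker-compose.serverless.yml",
)


def compose_command(*args, files=COMPOSE_FILES):
    """Build a `docker compose` call that always uses the full, ordered file set."""
    if tuple(files) != COMPOSE_FILES:
        raise ValueError(f"refusing non-standard compose file set: {files}")
    cmd = ["docker", "compose"]
    for f in files:
        cmd += ["-f", f]
    return cmd + list(args)


def run_compose(*args, cwd="."):
    """Run the command from the CVAT checkout root (cwd is an assumption)."""
    return subprocess.run(compose_command(*args), cwd=cwd, check=True)
```

For example, `run_compose("ps")` lists the stack as seen through the full file set, which is a cheap way to confirm the layering before any `up`.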

Current models already linked into CVAT

The currently known CVAT model names are:

  1. YOLO26s Drawing Seg v1 (CPU)
  2. Roboflow Technical Drawing Public v1

These names should be treated as stable user-facing identities. New models should be added with clear, explicit names instead of silently replacing one of the existing entries.
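A minimal sketch of the "add, don't silently replace" rule: a hypothetical check that rejects a new display name if it matches an existing entry, ignoring case and surrounding whitespace. The existing names could instead be fetched from the `/api/lambda/functions` endpoint listed under the health checks; hard-coding them here is only for illustration.

```python
# Display names already exposed in CVAT AI Tools, per this page.
KNOWN_CVAT_MODELS = (
    "YOLO26s Drawing Seg v1 (CPU)",
    "Roboflow Technical Drawing Public v1",
)


def check_new_model_name(new_name, existing=KNOWN_CVAT_MODELS):
    """Return the cleaned name, or raise if it is empty or would reuse an existing entry."""
    cleaned = new_name.strip()
    if not cleaned:
        raise ValueError("model name must not be empty")
    taken = {name.strip().casefold() for name in existing}
    if cleaned.casefold() in taken:
        raise ValueError(f"{cleaned!r} is already linked in CVAT; pick a new explicit name")
    return cleaned
```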

Current YOLO path

The current YOLO work is split into two related tracks.

The segmentation path uses YOLO26s-seg and currently has a combined labeled set of 59 frames. That path has already been trained, published to W&B, and linked back into CVAT as the YOLO26s Drawing Seg v1 (CPU) model.

The detection path is a newer line of work that reuses the same drawing-label vocabulary but builds a box-based dataset and training flow instead of a segmentation-only one. The intended user-facing CVAT name for that model is:

YOLO26s Drawing Detect v1 (CPU)
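One plausible way to derive box labels from the existing segmentation set is to take each polygon's axis-aligned bounding box. This sketch assumes the segmentation labels are in the standard YOLO polygon text format (`class x1 y1 x2 y2 ...`, normalized) and emits YOLO detection lines (`class cx cy w h`). The repo may build its detection dataset differently, for example by exporting boxes from CVAT directly; the function names are hypothetical.

```python
from pathlib import Path


def polygon_line_to_box_line(line):
    """Convert one YOLO segmentation label line to a YOLO detection label line."""
    parts = line.split()
    cls, coords = parts[0], [float(v) for v in parts[1:]]
    if len(coords) < 6 or len(coords) % 2:
        raise ValueError(f"expected at least 3 x,y pairs, got: {line!r}")
    xs, ys = coords[0::2], coords[1::2]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Detection format stores the box center plus width and height, all normalized.
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    w, h = x_max - x_min, y_max - y_min
    return f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def convert_label_dir(src_dir, dst_dir):
    """Write a box-label .txt for every polygon-label .txt, keeping file names."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for txt in src.glob("*.txt"):
        lines = [ln for ln in txt.read_text().splitlines() if ln.strip()]
        boxes = [polygon_line_to_box_line(ln) for ln in lines]
        (dst / txt.name).write_text("\n".join(boxes) + ("\n" if boxes else ""))
```

Keeping the class index unchanged is what lets both tracks share the same label vocabulary.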

Current Roboflow path

Roboflow remains relevant as a parallel experiment path rather than a replacement for the local YOLO workflow.

The main operating rules are:

  • keep the Roboflow work in its own branch or worktree
  • do not assume Roboflow changes can recreate the shared CVAT stack safely by themselves
  • validate the shared CVAT invariants after any Roboflow-side deployment change
  • keep the public Roboflow-backed CVAT model clearly named so it is obvious when the output comes from Roboflow and not from a local YOLO model

Vast.ai operating role

Vast.ai is the preferred remote GPU training surface for YOLO work.

The practical reasons are:

  • it avoids depending on the Fedora mini PC for training
  • it avoids local AMD and Mac support complexity for the main run path
  • it supports repeatable GPU-backed runs for both segmentation and detection experiments

The current YOLO repo already contains the remote launch and sync scripts used to run those jobs. The safe mental model is:

  1. prepare or normalize the dataset locally
  2. sync the dataset and scripts to Vast.ai
  3. train remotely
  4. publish or inspect the run in W&B
  5. copy the best artifacts back locally
  6. only then consider whether the resulting model should be linked into CVAT
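Steps 2 and 5 are the ones most often done by hand. The sketch below is not the repo's own launch or sync scripts; it only builds hypothetical `rsync`-over-SSH commands, assuming `rsync` and `ssh` are available locally (on Windows, for example via WSL), that the instance is reachable as `root` on the SSH port Vast.ai assigns, and that `/workspace` is the remote working directory. The `weights/best.pt` location follows Ultralytics' default run layout.

```python
import subprocess


def rsync_cmd(src, dst, port):
    """rsync over SSH on the instance's assigned SSH port."""
    return ["rsync", "-az", "--partial", "-e", f"ssh -p {port}", src, dst]


def push_dataset(local_dir, host, port, remote_dir="/workspace/data/"):
    # Trailing slash on the source copies the directory's contents, not the directory.
    src = local_dir.rstrip("/") + "/"
    return rsync_cmd(src, f"root@{host}:{remote_dir}", port)


def pull_best(host, port, run_dir, local_dir):
    # Ultralytics saves checkpoints under <run_dir>/weights/ (best.pt, last.pt).
    remote = f"root@{host}:{run_dir.rstrip('/')}/weights/best.pt"
    return rsync_cmd(remote, local_dir, port)


def run(cmd):
    """Execute a built command, raising if rsync fails."""
    subprocess.run(cmd, check=True)
```

Building the command separately from running it makes it easy to print and review the exact transfer before step 3 starts.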

W&B operating role

W&B is the preferred dashboard and run-history surface for the remote YOLO path.

Use it to answer:

  • which run used which dataset
  • which model variant was trained
  • whether the smoke test and full baseline both succeeded
  • which checkpoint is best
  • whether a new run actually improved over the previous one
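Most of these questions become answerable if each run records its dataset identity, model variant, and stage at start time. A minimal sketch, assuming a dataset-manifest hash is available and that "smoke" and "baseline" are the two stage names; the helper names and project name are hypothetical, while `project`, `name`, `job_type`, `tags`, and `config` are standard `wandb.init()` arguments.

```python
STAGES = ("smoke", "baseline")


def wandb_run_kwargs(project, model, task, manifest_sha256, stage):
    """Build wandb.init() keyword arguments that record dataset, variant, and stage."""
    if stage not in STAGES:
        raise ValueError(f"stage must be one of {STAGES}, got {stage!r}")
    return {
        "project": project,
        "name": f"{model}-{task}-{stage}-{manifest_sha256[:8]}",
        "job_type": stage,  # lets the W&B UI group smoke tests apart from baselines
        "tags": [model, task, stage],
        "config": {
            "model": model,
            "task": task,
            "dataset_manifest_sha256": manifest_sha256,
        },
    }


def start_run(**kwargs):
    import wandb  # imported lazily so the builder above works without wandb installed
    return wandb.init(**kwargs)
```

A call such as `start_run(**wandb_run_kwargs("drawing-models", "yolo26s", "seg", sha, "smoke"))` ties the run name back to the exact dataset it trained on.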

The local repo should still keep the critical artifacts and manifests, but W&B is the fastest way to understand what happened during a remote training run.
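A dataset manifest is most useful when it yields one stable fingerprint. This is a minimal sketch of one approach, not the repo's actual manifest format: hash every file, then hash the sorted file-to-hash mapping, so any added, removed, or edited file changes the top-level `sha256`.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(dataset_dir):
    """Return per-file SHA-256 hashes plus one digest over the whole dataset."""
    root = Path(dataset_dir)
    files = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            # POSIX-style relative paths keep the manifest identical on Windows and Linux.
            files[p.relative_to(root).as_posix()] = hashlib.sha256(p.read_bytes()).hexdigest()
    digest = hashlib.sha256(json.dumps(files, sort_keys=True).encode()).hexdigest()
    return {"files": files, "file_count": len(files), "sha256": digest}
```

Because paths are stored relative to the dataset root, the same dataset hashed locally and after syncing to a remote instance should produce the same digest.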

Practical health checks

Before treating the training-and-serving stack as healthy, confirm:

  1. https://cvat.adeelyj.com/api/server/about returns successfully
  2. https://cvat.adeelyj.com/api/lambda/functions returns successfully
  3. the relevant model appears in CVAT AI Tools
  4. the model can run on at least one frame without CSRF or host.docker.internal errors
  5. the current training repo, dataset manifest, and W&B run identity all agree on which model was actually trained
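Checks 1 to 3 can be scripted; checks 4 and 5 still need a person. The sketch below uses only the standard library and assumes `/api/lambda/functions` returns a JSON list of objects with a `name` field and may need an API token sent as `Authorization: Token <key>`; both are assumptions to confirm against the running CVAT version. Finding the name in that list is a proxy for the model appearing in AI Tools, not a UI check.

```python
import json
import urllib.request

CVAT_URL = "https://cvat.adeelyj.com"


def get_json(path, token=None, base=CVAT_URL):
    """GET a CVAT API path; urlopen raises HTTPError on any non-2xx response."""
    req = urllib.request.Request(base + path, headers={"Accept": "application/json"})
    if token:
        # Assumption: token auth via the "Token <key>" header scheme.
        req.add_header("Authorization", f"Token {token}")
    with urllib.request.urlopen(req, timeout=15) as resp:
        return json.load(resp)


def function_names(functions):
    """Collect display names from a lambda-functions response (assumed: list of dicts)."""
    return {f["name"] for f in functions if isinstance(f, dict) and "name" in f}


def health_check(model_name, token=None):
    """Cover checks 1 to 3 from the list above; checks 4 and 5 stay manual."""
    get_json("/api/server/about")  # check 1: raises if the server does not answer cleanly
    names = function_names(get_json("/api/lambda/functions", token))  # check 2
    return {"server_about": True, "model_listed": model_name in names}  # check 3 (proxy)
```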

Naming rule for future models

Every new model should make three things obvious:

  1. family: YOLO or Roboflow
  2. task: Seg or Detect
  3. runtime identity: version and CPU/GPU serving assumption

Examples:

  • YOLO26s Drawing Seg v1 (CPU)
  • YOLO26s Drawing Detect v1 (CPU)
  • Roboflow Technical Drawing Public v1
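The rule can be applied mechanically with a hypothetical name builder plus a check that reports which of the three parts a name does not state. Run against the examples, the check passes both YOLO names and reports that the Roboflow public name states no task and no CPU/GPU assumption; since that entry is an existing stable identity and the rule targets new models, the check is best treated as advisory.

```python
import re

TASKS = ("Seg", "Detect")
RUNTIMES = ("CPU", "GPU")


def model_name(family, subject, task, version, runtime):
    """Compose a display name that states family, task, version, and runtime."""
    if task not in TASKS:
        raise ValueError(f"task must be one of {TASKS}")
    if runtime not in RUNTIMES:
        raise ValueError(f"runtime must be one of {RUNTIMES}")
    return f"{family} {subject} {task} v{version} ({runtime})"


def missing_fields(name):
    """List which of the three naming-rule parts a name does not state explicitly."""
    missing = []
    if not re.match(r"(YOLO|Roboflow)", name):
        missing.append("family")
    if not any(re.search(rf"\b{t}\b", name) for t in TASKS):
        missing.append("task")
    if not re.search(r"\bv\d+\b", name) or not re.search(r"\((CPU|GPU)\)", name):
        missing.append("runtime identity")
    return missing
```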

Current interpretation

The important shift is that drawing-model work is no longer just "some local experiments."

It is now a real operational capability with:

  • a shared labeled-data surface in CVAT
  • a repeatable GPU training path on Vast.ai
  • a run-history layer in W&B
  • a shared CVAT-serving path through Nuclio
  • parallel YOLO and Roboflow model experiments that need to be coordinated rather than rediscovered

Combined-pass note

The current stack now also includes a multi-model combined pass for CVAT, documented on its own page. That page describes how the best-performing component models were combined into one merged detector pass and which model currently owns which target label.
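If each target label is owned by exactly one component model, the merge step reduces to a filter: keep a detection only when its producer owns its label. This is a sketch under that assumption, not the combined pass's actual implementation; the model ids and label names in any usage are hypothetical, and detections are assumed to be dicts with a `label` key, as CVAT detector functions typically return alongside `confidence`, `points`, and `type`.

```python
def merge_by_owner(results_by_model, owner_of_label):
    """Keep a detection only when its label is owned by the model that produced it.

    results_by_model: {model_id: [detection, ...]}, each detection a dict with "label".
    owner_of_label: {label: model_id}; labels missing from this map are dropped.
    """
    merged = []
    for model_id, detections in results_by_model.items():
        for det in detections:
            if owner_of_label.get(det["label"]) == model_id:
                merged.append(det)
    return merged
```

Dropping unmapped labels is a deliberate choice here: a label with no declared owner surfaces as missing output rather than as duplicate boxes from several models.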

Sources

  • C:\Users\adeel\OneDrive\100_Knowledge\203_TextCAD\01_Product_Project_Management\00_Project_Management_n_skills\04_playbooks\DRAWING_MODEL_TRAINING_AND_CVAT_MODEL_SERVING.md
  • D:\02_Code\49_yolotraining_firstdataset
  • D:\02_Code\50_CVAT_RoboFlow