DINO Detection — PPE Auto-Labeling

Auto-Labeling PPE + Train YOLO

Dùng Grounding DINO để tự động gán nhãn object detection, xuất YOLO format, sau đó retrain YOLO cho inference siêu nhanh. Tất cả chạy local trên RTX 3090.

🎯

Grounding DINO

Zero-shot object detection. Đưa text prompt ("hard hat", "person") — nhận về bounding boxes. Không cần train, hiểu hàng trăm object class.

~700MB 1.4GB VRAM ~200ms/ảnh

🎭

SAM / SAM2

Segment Anything — vẽ mask chính xác pixel từ bounding box. SAM2 nhẹ hơn, hỗ trợ video tracking (nghiên cứu).

SAM: 358MB SAM2: 149MB ~1.5s/ảnh

🧠

Ollama VLM

Vision-Language Model: llama3.2-vision dùng để verify kết quả DINO, giảm false positives.

llama3.2-vision: 7.8GB

Pipeline Auto-Labeling → Train YOLO

Ảnh từ
body cam

→

DINO
batch label

→

Lọc ảnh
đạt chuẩn

→

Manual
review

→

Train
YOLOv8

→

Deploy
inference

🚀 Quick Test

Mở /demo → kéo ảnh vào → Detect

Chọn tab Segmentation để thử SAM/SAM2

📡 Remote API

# Health check
curl https://trained.besen.vn/api/health

# Detect
curl -X POST .../api/detect \
  -d '{"image_base64":"...","classes":["person","hard hat"]}'