Deployment Guide — DINO Detection

Contents
1. System architecture
2. Connecting from a company machine
3. API Reference
4. PPE Auto-Labeling Pipeline
5. Step-by-step scripts
6. Tuning parameters
7. Troubleshooting
8. Extension: Face Detection + Recognition
Quick Reference Card

1. System architecture

┌──────────────────────────────────────────────────────────┐
│  Server (RTX 3090 24GB), 100.73.4.34 (Tailscale)         │
│                                                          │
│  Port 9002: DINO Detection Service (Grounding DINO-base) │
│  Port 11434: llama-server (Qwen 27B, text model)         │
│  Port 11435: Ollama (llama3.2-vision VLM)                │
│                                                          │
└──────────────────────┬───────────────────────────────────┘
                       │ HTTPS proxy (svr12)
                       ▼
              trained.besen.vn
                       │
          ┌────────────┴────────────┐
          │  Internet / Tailscale   │
          └────────────┬────────────┘
                       │
              Your company machine

Port	Service	Description
9002	DINO Detection	API + Web Demo + Guide
11434	llama-server	Qwen 27B text model (can be used for LLM reasoning)
11435	Ollama	llama3.2-vision VLM
22	SSH	Remote terminal

URL	Content
`https://trained.besen.vn`	Web demo (upload an image → detect)
`https://trained.besen.vn/docs`	Swagger API docs
`https://trained.besen.vn/guide`	This page
`https://trained.besen.vn/api/health`	Health check

2. Connecting from a company machine

Option 1: Web Demo (simplest)

Open https://trained.besen.vn → drag and drop an image → Detect.

Good for quick tests, inspecting results and tuning parameters.

Option 2: Python API

import requests, base64

BASE = "https://trained.besen.vn"

# Health check
print(requests.get(f"{BASE}/api/health", timeout=10).json())

# Detect a single image
with open("photo.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(f"{BASE}/api/detect", json={
    "image_base64": img_b64,
    "classes": ["person", "hard hat", "reflective vest", "safety shoes"],
    "box_threshold": 0.35, "text_threshold": 0.25
}, timeout=120)  # generous timeout: uploads over the internet can be slow
resp.raise_for_status()  # surface HTTP errors (e.g. 413) instead of a KeyError below
print(resp.json()["yolo_format"])

Option 3: SSH into the server (for large batch jobs)

# Install Tailscale on the company laptop and log in with the same account
# Once the VPN is connected:
ssh leco@100.73.4.34

3. API Reference

`GET /api/health`

curl https://trained.besen.vn/api/health

{
  "status": "ok",
  "model": "IDEA-Research/grounding-dino-base",
  "device": "cuda",
  "model_loaded": true,
  "gpu_name": "NVIDIA GeForce RTX 3090",
  "vram_total_gb": 23.57
}
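The health payload includes `model_loaded` and `device`, so a script can wait until the service is really ready before it starts a long batch job, for example right after a restart. The following is a minimal standard-library sketch. `is_ready` and `wait_until_ready` are hypothetical helpers, and the timeout and polling interval are arbitrary choices. Requiring `device == "cuda"` reflects this GPU deployment. Drop that condition if you run the service on a CPU.

```python
import json
import time
import urllib.request

BASE = "https://trained.besen.vn"


def is_ready(health: dict) -> bool:
    """True when /api/health reports a loaded model on the GPU (hypothetical helper)."""
    return (
        health.get("status") == "ok"
        and health.get("model_loaded") is True
        and health.get("device") == "cuda"
    )


def wait_until_ready(base: str = BASE, timeout_s: float = 120, interval_s: float = 5) -> dict:
    """Poll /api/health until is_ready() holds; values here are arbitrary defaults."""
    deadline = time.monotonic() + timeout_s
    last_error = None
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base}/api/health", timeout=10) as resp:
                health = json.load(resp)
            if is_ready(health):
                return health
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            last_error = exc
        time.sleep(interval_s)
    raise TimeoutError(f"DINO service not ready after {timeout_s}s (last error: {last_error})")
```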

`POST /api/detect`: detect a single image (base64)

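# Build the JSON body on stdin: passing a large image's base64 string inline with -d
# can exceed the OS limit on the length of a single command-line argument.
{ printf '{"image_base64":"'; base64 -w0 photo.jpg; printf '","classes":["person","hard hat"]}'; } \
  | curl -X POST https://trained.besen.vn/api/detect \
      -H "Content-Type: application/json" -d @-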

{
  "success": true,
  "detections": [
    {"bbox": [120, 45, 380, 560], "label": "person", "confidence": 0.92}
  ],
  "yolo_format": "0 0.130 0.280 0.135 0.477",
  "image_size": {"width": 1920, "height": 1080},
  "num_detections": 1,
  "inference_time_ms": 245.3
}
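In the sample response, `bbox` is given in pixel coordinates [x1, y1, x2, y2], and `yolo_format` holds one YOLO label line per detection. The sketch below shows the standard conversion between the two: the class id, followed by the box center, width and height, each normalized by the image size. It assumes the class id is the label's index in the request's `classes` list. That is consistent with the `0` for `person` in the example, but the API docs do not confirm it. `bbox_to_yolo` is a hypothetical helper.

```python
def bbox_to_yolo(class_id: int, bbox, width: int, height: int) -> str:
    """Convert a pixel box [x1, y1, x2, y2] into a YOLO label line.

    YOLO format: class_id x_center y_center box_width box_height,
    with every coordinate normalized to [0, 1] by the image size.
    """
    x1, y1, x2, y2 = bbox
    xc = (x1 + x2) / 2 / width
    yc = (y1 + y2) / 2 / height
    w = (x2 - x1) / width
    h = (y2 - y1) / height
    return f"{class_id} {xc:.3f} {yc:.3f} {w:.3f} {h:.3f}"


# Assumption: class id = index of the label in the request's "classes" list.
classes = ["person", "hard hat"]
det = {"bbox": [120, 45, 380, 560], "label": "person", "confidence": 0.92}
print(bbox_to_yolo(classes.index(det["label"]), det["bbox"], 1920, 1080))
# → 0 0.130 0.280 0.135 0.477
```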

`POST /api/detect/file`: upload an image file (multipart form)

curl -X POST https://trained.besen.vn/api/detect/file \
  -F "file=@photo.jpg" \
  -F "classes=person, hard hat, reflective vest, safety shoes"

`POST /api/batch`: multiple images → ZIP of labels

curl -X POST https://trained.besen.vn/api/batch \
  -F "images=@img1.jpg" -F "images=@img2.jpg" \
  -F "classes=person, hard hat, reflective vest, safety shoes" \
  -o labels.zip
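The curl call saves the labels to labels.zip. Below is a minimal sketch for unpacking them into the pipeline's labels/ directory. It assumes, without confirmation from the API docs, that the archive holds one YOLO `.txt` file per image. `extract_labels` is a hypothetical helper. Because it takes raw bytes, it works equally well with `Path("labels.zip").read_bytes()` or with the body of a `/api/batch` response.

```python
import io
import zipfile
from pathlib import Path


def extract_labels(zip_data: bytes, out_dir: str = "labels") -> list:
    """Write every .txt member of a labels ZIP into out_dir (flat layout).

    Hypothetical helper; assumes one YOLO label file per image in the archive.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(io.BytesIO(zip_data)) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.endswith(".txt"):
                continue
            # Keep only the base name: flattens sub-folders and ignores "../" parts.
            name = Path(info.filename).name
            (out / name).write_bytes(zf.read(info))
            written.append(name)
    return sorted(written)


# Usage with the file saved by curl:
# extract_labels(Path("labels.zip").read_bytes(), "labels")
```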

`POST /api/attach_vlm`: attach a VLM verifier (the Ollama instance on port 11435)

curl -X POST https://trained.besen.vn/api/attach_vlm \
  -F "base_url=http://127.0.0.1:11435" \
  -F "model=llama3.2-vision:11b"

4. PPE Auto-Labeling Pipeline → Train YOLO

Raw body-cam images
      │
      ▼ 1 Extract frames (if the source is video)
      │
      ▼ 2 DINO batch labeling → YOLO labels
      │
      ▼ 3 Filter qualifying images (confidence filter)
      │
      ▼ 4 Manual review
      │
      ▼ 5 Finalize dataset → train/val split
      │
      ▼ 6 Train YOLOv8 → yolov8m_ppe.pt
      │
      ▼ 7 Deploy model

Directory structure

/home/leco/ppe-pipeline/
├── raw_images/          ← Original images
├── labels/              ← YOLO labels from DINO
├── filtered/            ← Images that pass the filter
├── finalized/           ← Reviewed dataset
│   ├── train/{images,labels}/
│   ├── val/{images,labels}/
│   └── data.yaml
├── batch_label.py       ← DINO batch-labeling script
├── filter_dataset.py    ← Filter script
└── train.py             ← YOLO training script

5. Step-by-step scripts

Step 1: Prepare images

mkdir -p /home/leco/ppe-pipeline/raw_images
cp /path/to/images/*.jpg /home/leco/ppe-pipeline/raw_images/
ls /home/leco/ppe-pipeline/raw_images/ | wc -l   # count images
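This guide has no script for pipeline stage 1, which extracts frames when the body-cam source is video. The following is a minimal OpenCV sketch. It assumes one frame per second is enough and that frames go straight into raw_images/. `frame_step`, `extract_frames`, the file-naming scheme and the 30 fps fallback are all hypothetical.

```python
from pathlib import Path


def frame_step(fps: float, every_sec: float) -> int:
    """Number of frames between saved frames; never less than 1."""
    return max(1, round(fps * every_sec))


def extract_frames(video_path, out_dir="raw_images", every_sec=1.0) -> int:
    """Save one JPEG every `every_sec` seconds of video and return how many were saved."""
    import cv2  # requires opencv-python (pip install opencv-python)

    cap = cv2.VideoCapture(str(video_path))
    if not cap.isOpened():
        raise IOError(f"Cannot open video: {video_path}")
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # assumed fallback when FPS metadata is missing
    step = frame_step(fps, every_sec)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = Path(video_path).stem
    index = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            cv2.imwrite(str(out / f"{stem}_{index:06d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved
```

For example, `extract_frames("shift1.mp4", "/home/leco/ppe-pipeline/raw_images")` would save roughly one frame per second of footage (the video name is illustrative).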

Step 2: Batch labeling with DINO

batch_label.py sends images to the DINO API in batches of 30 and receives YOLO labels back.

# On the server (fastest):
cd /home/leco/ppe-pipeline
python3 batch_label.py raw_images labels

# From the company laptop (over the internet):
DINO_API=https://trained.besen.vn BATCH_SIZE=10 python3 batch_label.py raw_images labels

Over localhost the batch size is 30 images per request. Over the internet, lower it to 10 to avoid timeouts.
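The contents of batch_label.py are not reproduced here. The sketch below illustrates only the batching logic described above, and it reads the same `DINO_API` and `BATCH_SIZE` environment variables. The helper names and the `http://127.0.0.1:9002` default are assumptions. The port comes from the port table in section 1.

```python
import os
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def batch_size(env=os.environ, default: int = 30) -> int:
    """BATCH_SIZE from the environment (e.g. 10 over the internet), else 30."""
    return int(env.get("BATCH_SIZE", default))


def api_base(env=os.environ) -> str:
    """DINO_API from the environment; the localhost default is an assumption."""
    return env.get("DINO_API", "http://127.0.0.1:9002")


def list_images(folder) -> list:
    """Sorted image paths in `folder`, filtered by extension."""
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_EXTS)


def chunked(items: list, size: int):
    """Yield consecutive slices of at most `size` items (one API request each)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

With 65 images and the default size, this yields requests of 30, 30 and 5 images.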

Step 3: Filter qualifying images

Keep only images that contain a person and at least 2 PPE objects.

cd /home/leco/ppe-pipeline
python3 filter_dataset.py raw_images labels filtered
# Output: filtered/images/ + filtered/labels/
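One reading of the rule "a person plus at least 2 PPE objects", applied to the YOLO label files, is sketched below. Class ids follow data.yaml (0 = person, 1-3 = PPE). The real filter_dataset.py may apply different rules. In particular, YOLO label lines carry no confidence score, so a confidence filter would have to work from the API's `detections` instead. All function names here are hypothetical.

```python
import shutil
from pathlib import Path

PERSON_ID = 0
PPE_IDS = {1, 2, 3}  # hard_hat, reflective_vest, safety_shoes, as in data.yaml


def class_ids(label_text: str) -> list:
    """Class id (first column) of every non-empty YOLO label line."""
    return [int(line.split()[0]) for line in label_text.splitlines() if line.strip()]


def passes_filter(label_text: str, min_ppe: int = 2) -> bool:
    """At least one person and at least `min_ppe` PPE boxes in the image."""
    ids = class_ids(label_text)
    ppe_count = sum(1 for c in ids if c in PPE_IDS)
    return PERSON_ID in ids and ppe_count >= min_ppe


def filter_dataset(images_dir, labels_dir, out_dir) -> int:
    """Copy passing image/label pairs into out_dir/images and out_dir/labels."""
    out_images = Path(out_dir, "images")
    out_labels = Path(out_dir, "labels")
    out_images.mkdir(parents=True, exist_ok=True)
    out_labels.mkdir(parents=True, exist_ok=True)
    kept = 0
    for label in Path(labels_dir).glob("*.txt"):
        if not passes_filter(label.read_text()):
            continue
        images = list(Path(images_dir).glob(f"{label.stem}.*"))
        if not images:
            continue  # label without a matching image
        shutil.copy2(images[0], out_images / images[0].name)
        shutil.copy2(label, out_labels / label.name)
        kept += 1
    return kept
```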

Step 4: Manual review

Use Label Studio, or an OpenCV script with keyboard shortcuts:

pip install label-studio
label-studio init ppe-review
label-studio start ppe-review
# Open http://localhost:8080

Step 5: Finalize the dataset

mkdir -p finalized/{train,val}/{images,labels}

# Copy the reviewed images in (80% train, 20% val)

# finalized/data.yaml
path: /home/leco/ppe-pipeline/finalized
train: train/images
val: val/images
nc: 4
names:
  0: person
  1: hard_hat
  2: reflective_vest
  3: safety_shoes
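The 80/20 split can be scripted instead of copied by hand. The following is a minimal sketch. It assumes the reviewed pairs sit in `<src>/images` and `<src>/labels` with matching file stems, which is the layout filter_dataset.py writes. `split_names` and `build_split` are hypothetical helpers. Sorting before a seeded shuffle keeps the split reproducible between runs.

```python
import random
import shutil
from pathlib import Path


def split_names(names, val_ratio: float = 0.2, seed: int = 42):
    """Deterministically shuffle and split names into (train, val) lists."""
    names = sorted(names)
    random.Random(seed).shuffle(names)
    n_val = round(len(names) * val_ratio)
    return names[n_val:], names[:n_val]


def build_split(src_dir, dst_dir="finalized", val_ratio: float = 0.2, seed: int = 42):
    """Copy image/label pairs from src_dir into dst_dir/{train,val}/{images,labels}."""
    src, dst = Path(src_dir), Path(dst_dir)
    images = {p.stem: p for p in (src / "images").iterdir() if p.is_file()}
    stems = [s for s in images if (src / "labels" / f"{s}.txt").exists()]
    train, val = split_names(stems, val_ratio, seed)
    for subset, members in (("train", train), ("val", val)):
        (dst / subset / "images").mkdir(parents=True, exist_ok=True)
        (dst / subset / "labels").mkdir(parents=True, exist_ok=True)
        for stem in members:
            shutil.copy2(images[stem], dst / subset / "images" / images[stem].name)
            shutil.copy2(src / "labels" / f"{stem}.txt", dst / subset / "labels" / f"{stem}.txt")
    return len(train), len(val)
```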

Step 6: Train YOLOv8

cd /home/leco/ppe-pipeline
pip install ultralytics
python3 train.py

Expected duration: roughly 2-4 hours for 10k images on an RTX 3090.
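train.py itself is not shown in this guide. Below is a minimal sketch of what it could contain, using the Ultralytics `YOLO(...).train(...)` API with the defaults from the YOLO training table in section 6. The starting weights `yolov8m.pt` are inferred from the output name. `project="runs/ppe"` and `name="yolov8m_ppe"` are chosen so the weights land at `runs/ppe/yolov8m_ppe/weights/best.pt`, the path used in Step 7. Treat all of this as an assumption about the real script.

```python
# Defaults mirror the YOLO training table; values and names are assumptions about train.py.
DEFAULTS = {
    "data": "/home/leco/ppe-pipeline/finalized/data.yaml",
    "epochs": 100,    # enough for ~10k images
    "batch": 16,      # an RTX 3090 can go up to 32; use 8 after a CUDA OOM
    "imgsz": 640,     # raise to 1280 for small objects
    "patience": 15,   # early stopping
    "project": "runs/ppe",
    "name": "yolov8m_ppe",
}


def train_args(**overrides) -> dict:
    """Merge overrides (e.g. batch=8) into the default training settings."""
    return {**DEFAULTS, **overrides}


def train(weights: str = "yolov8m.pt", **overrides):
    """Fine-tune pretrained YOLOv8 weights on the finalized PPE dataset."""
    from ultralytics import YOLO  # pip install ultralytics

    model = YOLO(weights)
    return model.train(**train_args(**overrides))
```

In train.py, the entry point would then be `train()`, or `train(batch=8)` after a CUDA OOM.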

Step 7: Inference

from ultralytics import YOLO

model = YOLO("runs/ppe/yolov8m_ppe/weights/best.pt")
results = model("test.jpg")
results[0].show()

# Batch
for r in model(["img1.jpg","img2.jpg"], stream=True):
    for box, cls in zip(r.boxes.xyxy, r.boxes.cls):
        print(f"Class {int(cls)}: {box.tolist()}")

6. Tuning parameters

DINO Detection

Parameter	Default	When to adjust
`box_threshold`	0.35	Lower (0.2) → more detections. Higher (0.5) → fewer, more precise detections
`text_threshold`	0.25	Lower (0.15) → class phrases match more loosely. Higher (0.4) → stricter matching

Condition	box_threshold	text_threshold
Body cam, poor lighting	0.20	0.15
Clear images, good lighting	0.35	0.25
Fewer false positives needed	0.45	0.30
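The threshold combinations in this table can be kept as named presets, so every `/api/detect` request uses a consistent pair. The following is a minimal sketch. The preset keys and `detect_payload` are hypothetical names, and the payload fields match the request shown in section 2.

```python
# Threshold presets from the table; the key names are hypothetical.
PRESETS = {
    "bodycam_low_light": {"box_threshold": 0.20, "text_threshold": 0.15},
    "clear_good_light": {"box_threshold": 0.35, "text_threshold": 0.25},
    "few_false_positives": {"box_threshold": 0.45, "text_threshold": 0.30},
}


def detect_payload(image_b64: str, classes, preset: str = "clear_good_light") -> dict:
    """Build a /api/detect request body with one of the threshold presets."""
    return {"image_base64": image_b64, "classes": list(classes), **PRESETS[preset]}
```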

YOLO Training

Parameter	Default	Description
`epochs`	100	Enough for 10k images
`batch`	16	An RTX 3090 can go up to 32
`imgsz`	640	Raise to 1280 for small objects
`patience`	15	Early stopping: epochs without improvement before training stops

7. Troubleshooting

Server not responding

curl https://trained.besen.vn/api/health
# If it fails, SSH in and restart:
ssh leco@100.73.4.34
cd /home/leco/dino && bash run.sh

Batch labeling timeout

# Reduce the batch size
DINO_API=https://trained.besen.vn BATCH_SIZE=10 python3 batch_label.py raw_images labels

# Or SSH in and run it locally (faster)
ssh leco@100.73.4.34
cd /home/leco/ppe-pipeline
python3 batch_label.py raw_images labels

413 Content Too Large

The request body exceeds the server's size limit. Downscale the image before sending it:

from PIL import Image
img = Image.open("large.jpg")
img.thumbnail((1920, 1080))
img.save("resized.jpg", quality=85)
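When calling `/api/detect` from Python, the same downscaling can be done in memory without writing resized.jpg. The following is a minimal Pillow sketch that reuses the 1920x1080 bound and JPEG quality 85 from the snippet above. `encode_for_upload` is a hypothetical helper.

```python
import base64
import io

from PIL import Image


def encode_for_upload(path_or_image, max_size=(1920, 1080), quality: int = 85) -> str:
    """Downscale (keeping aspect ratio) and return a base64 JPEG for /api/detect."""
    if isinstance(path_or_image, Image.Image):
        img = path_or_image
    else:
        img = Image.open(path_or_image)
    img = img.convert("RGB")  # new image; JPEG has no alpha channel
    img.thumbnail(max_size)   # resizes in place, never upscales
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    return base64.b64encode(buf.getvalue()).decode()
```

The returned string goes directly into the `image_base64` field of the request.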

DINO detects too few objects

Lower the thresholds in the request JSON:

{"box_threshold": 0.2, "text_threshold": 0.15}

CUDA OOM during training

# In train.py, lower batch to 8 or imgsz to 416

8. Extension: Face Detection + Recognition

Once YOLO-PPE is stable:

Face Detection (YOLOv8-face, pretrained)

from ultralytics import YOLO
face_model = YOLO("yolov8n-face.pt")  # third-party pretrained weights, no training needed; not shipped by Ultralytics, so download the .pt file first
results = face_model("worker.jpg")

Face Recognition (ArcFace, pretrained)

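pip install insightface onnxruntime-gpu opencv-python

import cv2
from insightface.app import FaceAnalysis

recognizer = FaceAnalysis()   # downloads the default model pack on first use
recognizer.prepare(ctx_id=0)  # ctx_id=0: first GPU (needs onnxruntime-gpu)

img = cv2.imread("worker.jpg")  # BGR array, the input format insightface expects
faces = recognizer.get(img)
for face in faces:
    embedding = face.normed_embedding  # 512-dim, L2-normalized
    # Compare against the employee database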
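The database comparison step is left open here. A common approach is cosine similarity between the query embedding and a reference embedding for each enrolled employee. The following is a minimal NumPy sketch. The in-memory dict used as the "database" and the 0.4 threshold are assumptions, and the threshold should be calibrated on your own data.

```python
import numpy as np


def best_match(embedding, db: dict, threshold: float = 0.4):
    """Return (name, cosine similarity) of the closest enrolled face, or (None, score).

    db maps employee name -> reference embedding (e.g. 512-dim from ArcFace).
    The 0.4 threshold is a placeholder, not a recommended value.
    """
    names = list(db)
    refs = np.stack([np.asarray(db[n], dtype=np.float32) for n in names])
    refs /= np.linalg.norm(refs, axis=1, keepdims=True)  # refs is a fresh array
    query = np.asarray(embedding, dtype=np.float32)
    query = query / np.linalg.norm(query)  # not in place: don't mutate the caller's array
    sims = refs @ query
    best = int(np.argmax(sims))
    score = float(sims[best])
    return (names[best], score) if score >= threshold else (None, score)
```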

Complete architecture

Body-cam frame
    │
    ├──► YOLO-PPE → "hard hat: OK, vest: MISSING"
    └──► YOLO-face → ArcFace → "Nguyễn Văn A"
                │
                ▼
         Audit: "Nguyễn Văn A — VIOLATION: reflective vest"
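The diagram does not say how PPE detections are tied to a specific worker. One simple assumption is that a PPE item belongs to a person when the item's box center lies inside that person's box. The following is a minimal sketch of the audit step on that basis. It uses the DINO class names and the detection-dict shape from the API response, whereas with YOLO-PPE the labels would come from `r.names`. All names are hypothetical.

```python
REQUIRED_PPE = ("hard hat", "reflective vest", "safety shoes")


def center_inside(box, container) -> bool:
    """True if the center of box [x1, y1, x2, y2] lies inside container."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    return container[0] <= cx <= container[2] and container[1] <= cy <= container[3]


def missing_ppe(person_box, detections, required=REQUIRED_PPE) -> list:
    """Required PPE items with no matching detection inside this person's box."""
    present = {
        d["label"]
        for d in detections
        if d["label"] in required and center_inside(d["bbox"], person_box)
    }
    return [item for item in required if item not in present]


def audit_line(name: str, missing: list) -> str:
    """Format one audit entry for an identified worker."""
    if not missing:
        return f"{name} - OK"
    return f"{name} - VIOLATION: {', '.join(missing)}"
```

The center-in-box rule is deliberately crude. Crowded frames with overlapping workers would need a stricter assignment, for example by overlap area.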

Quick Reference Card

# === From the company laptop ===

# Test API
curl https://trained.besen.vn/api/health

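# Detect a single image (JSON body built on stdin to stay under the argument-length limit)
{ printf '{"image_base64":"'; base64 -w0 photo.jpg; printf '","classes":["person","hard hat"]}'; } \
  | curl -X POST https://trained.besen.vn/api/detect \
      -H "Content-Type: application/json" -d @-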

# SSH into the server (requires Tailscale)
ssh leco@100.73.4.34

# === On the server (via SSH) ===

# Restart DINO if it fails
cd /home/leco/dino && bash run.sh

# Restart Ollama (for the VLM)
nohup env OLLAMA_HOST=0.0.0.0:11435 OLLAMA_MODELS=/home/leco/.ollama-models \
  /home/leco/ollama-install/bin/ollama serve > /tmp/ollama.log 2>&1 &

# View the log
tail -f /tmp/dino-server3.log

# === Pipeline ===

cd /home/leco/ppe-pipeline
python3 batch_label.py raw_images labels    # 2. Label
python3 filter_dataset.py raw_images labels filtered  # 3. Filter
python3 train.py                             # 6. Train

Full guide: /home/leco/dino/GUIDE.md on the server. Push it to GitHub or copy it locally for offline use.
