
feat(api): accept caller-supplied per-frame detections on /predict #47

Open
Chouffe wants to merge 13 commits into main from arthur/feat-api-thread-bboxes

Chouffe commented Jun 11, 2026

Summary

  • The RPi edge devices already run a YOLO detector and hold per-frame smoke bboxes by the time the alert-api calls /predict, yet the API re-runs its bundled YOLO over every frame. This PR adds an optional detections field to POST /predict so callers can supply those boxes and skip the in-API detector pass (~600 ms/request on CPU).
  • When detections is present, the bundled detector and its cache are bypassed entirely (no read, no write). The supplied boxes are converted to internal xywhn Detections and fed through the existing predict(frame_detections=...) injection seam. Tube building, ROI filtering, cropping, classification, and calibration run unchanged — the calibrator sees genuine per-tube mean_conf, log_len, and n_tubes from the real per-frame boxes. Core is untouched.
  • Spec: docs/specs/2026-06-11-api-supplied-detections-design.md. Documented but unvalidated risk: the calibrator was fit on bundled-detector boxes, so edge-detector boxes may shift the calibrated probabilities; this is to be validated at alert-api integration time.
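A minimal sketch of the xyxyn → xywhn conversion described in the summary, assuming a hypothetical `Detection` record (the PR does not show the internal type or its field names). It keeps index alignment with frames and leaves an explicit `[]` entry as an empty frame rather than dropping it, and stamps every supplied box with `class_id=0`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical stand-in for the internal xywhn detection record."""
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float
    confidence: float


def xyxyn_to_xywhn(box: list[float]) -> tuple[float, float, float, float]:
    """Normalized corners (x1, y1, x2, y2) -> normalized center and size."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1


def to_frame_detections(payload: list[list[dict]]) -> list[list[Detection]]:
    """Convert the request's per-frame entries, one output list per frame."""
    frames = []
    for entry in payload:
        # An empty entry stays empty, so that frame is a gap for tube building.
        frames.append([
            Detection(0, *xyxyn_to_xywhn(box["xyxyn"]), box["confidence"])  # single smoke class
            for box in entry
        ])
    return frames
```

Because the result is a list of per-frame lists, it can be handed to an injection point shaped like `predict(frame_detections=...)` without any cache interaction.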

The two paths

```mermaid
flowchart LR
    A["POST /predict"] --> B{"detections<br/>in request?"}
    B -- "no (today's behavior)" --> C["detection cache<br/>(read + write)"]
    C --> D["bundled YOLO<br/>on cache misses"]
    D --> E["tube building"]
    B -- "yes" --> F["convert xyxyn → xywhn<br/>(class_id=0, cache bypassed)"]
    F --> E
    E --> G["ROI filter<br/>(roi_xyxyn, optional)"]
    G --> H["crop + stabilize"]
    H --> I["ViT classifier"]
    I --> J["calibrator"]
    J --> K["{ is_smoke, probability }"]
```
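The branch in the flowchart can be sketched as below. `resolve_frame_detections`, `detect`, and the dict-based `cache` are hypothetical stand-ins for the bundled YOLO and the detection cache, not names from this PR; supplied entries are assumed to be validated and converted already. The point is the asymmetry: the supplied path neither reads nor writes the cache.

```python
from __future__ import annotations

from typing import Callable, Optional


def resolve_frame_detections(
    frame_keys: list[str],
    supplied: Optional[list[list]],
    detect: Callable[[str], list],
    cache: dict[str, list],
) -> tuple[list[list], str]:
    """Return per-frame detections and their provenance ("request" or "detector")."""
    if supplied is not None:
        # Caller-supplied path: the detector never runs and the cache is untouched.
        return supplied, "request"

    results = []
    for key in frame_keys:
        if key not in cache:  # run the detector only on cache misses
            cache[key] = detect(key)
        results.append(cache[key])
    return results, "detector"
```

Keeping the supplied boxes out of the cache is what lets a later detector-path request for the same frames still hit cached bundled-detector output rather than the caller's boxes.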

Intended deployment flow:

```mermaid
sequenceDiagram
    participant RPi as RPi (pyro-engine)
    participant P as alert-api
    participant API as temporal-model API
    RPi->>P: alert + per-frame bboxes (xyxyn + conf)
    P->>API: POST /predict { frames, detections }
    Note over API: YOLO skipped — tubes built<br/>from the supplied boxes
    API-->>P: { is_smoke, probability }
```
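On the alert-api side, building the request body from edge-detector output could look like the sketch below. The per-frame tuple shape `(x1, y1, x2, y2, conf)` and the name `build_predict_body` are assumptions for illustration; the actual pyro-engine output format is not described in this PR.

```python
from __future__ import annotations


def build_predict_body(frame_keys, edge_boxes, bucket=None, roi_xyxyn=None) -> dict:
    """Assemble a POST /predict body from per-frame edge-detector boxes.

    Assumed input: edge_boxes[i] is a list of (x1, y1, x2, y2, conf) tuples in
    normalized coordinates for frame i; [] means the detector saw nothing there.
    """
    if len(frame_keys) != len(edge_boxes):
        # The API rejects partial coverage, so fail early on the client too.
        raise ValueError("need exactly one detections entry per frame")
    body = {
        "frames": list(frame_keys),
        "detections": [
            [{"xyxyn": [x1, y1, x2, y2], "confidence": conf} for x1, y1, x2, y2, conf in boxes]
            for boxes in edge_boxes
        ],
    }
    # Optional fields are only sent when set, matching the request examples.
    if bucket is not None:
        body["bucket"] = bucket
    if roi_xyxyn is not None:
        body["roi_xyxyn"] = list(roi_xyxyn)
    return body
```

Empty frames are sent as `[]`, never `null`, since null entries are rejected by the API.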

Request format

Today's call — detector path (unchanged)

Omit detections (or send null) and the API behaves exactly as before: frames are fetched from S3 and the bundled YOLO runs on every frame that misses the detection cache:

```
POST /predict
{
  "frames": [
    "seq9711/000_det134188_2026-06-01T14-13-19.164516Z.jpg",
    "seq9711/001_det134190_2026-06-01T14-17-21.300641Z.jpg",
    "seq9711/002_det134186_2026-06-01T14-17-22.351349Z.jpg"
  ],
  "bucket": "frames",
  "roi_xyxyn": [0.30, 0.35, 0.50, 0.55]
}
```

(bucket and roi_xyxyn are optional, as before.) With ?verbose=true, the response reports "detections_source": "detector", and the detector stage shows up in profiling.

New call — caller-supplied detections (detector bypassed)

One entry per frame, index-aligned with frames. An explicit [] means "the detector ran on this frame and saw nothing" (becomes a gap for tube building); null entries and partial coverage are rejected.

```
POST /predict
{
  "frames": [
    "seq9711/000_det134188_2026-06-01T14-13-19.164516Z.jpg",
    "seq9711/001_det134190_2026-06-01T14-17-21.300641Z.jpg",
    "seq9711/002_det134186_2026-06-01T14-17-22.351349Z.jpg"
  ],
  "detections": [
    [
      { "xyxyn": [0.137, 0.437, 0.147, 0.454], "confidence": 0.41 },
      { "xyxyn": [0.369, 0.391, 0.431, 0.462], "confidence": 0.31 }
    ],
    [],
    [
      { "xyxyn": [0.371, 0.394, 0.434, 0.465], "confidence": 0.35 }
    ]
  ]
}
```

detections composes with bucket and roi_xyxyn; omitting it (or sending null) gives exactly today's behavior. The response shape is unchanged:

```
{
  "is_smoke": true,
  "probability": 0.952,
  "model": { "name": "vit_dinov2_finetune", "version": "0.1.0" }
}
```

With ?verbose=true, the details block now carries provenance — details.preprocessing.detections_source is "request" when the boxes came from the caller, "detector" when the bundled YOLO produced them.

Validation (400 invalid_request): length mismatch with frames, null or non-list entries, coords outside [0, 1], inverted or zero-area boxes (which also fails closed on most accidental xywhn payloads, since a width or height smaller than the center coordinate reads as an inverted box), confidence outside [0, 1], and missing fields.
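The validation rules listed above can be sketched as a plain function that collects problems. `validate_detections` is a hypothetical name; the PR's actual validation lives in the API schemas and its exact messages are not shown here. A non-empty result is what would map to a 400 invalid_request.

```python
from __future__ import annotations


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_detections(frames: list, detections) -> list[str]:
    """Collect problems in a supplied detections payload; [] means acceptable.

    Only called when detections is present (null selects the detector path).
    """
    if not isinstance(detections, list):
        return ["detections must be a list"]
    errors = []
    if len(detections) != len(frames):
        errors.append(f"expected {len(frames)} entries, got {len(detections)}")
    for i, entry in enumerate(detections):
        if not isinstance(entry, list):  # rejects null and non-list entries
            errors.append(f"entry {i}: must be a list; use [] for 'saw nothing'")
            continue
        for j, box in enumerate(entry):
            where = f"entry {i}, box {j}"
            if not isinstance(box, dict) or "xyxyn" not in box or "confidence" not in box:
                errors.append(f"{where}: missing field")
                continue
            coords, conf = box["xyxyn"], box["confidence"]
            if not (isinstance(coords, list) and len(coords) == 4 and all(map(_is_number, coords))):
                errors.append(f"{where}: xyxyn must be 4 numbers")
                continue
            if not all(0.0 <= c <= 1.0 for c in coords):
                errors.append(f"{where}: coords outside [0, 1]")
            x1, y1, x2, y2 = coords
            if x2 <= x1 or y2 <= y1:
                # Inverted or zero-area. An xywhn box whose width or height is
                # smaller than its center coordinate also lands here.
                errors.append(f"{where}: inverted or zero-area box")
            if not (_is_number(conf) and 0.0 <= conf <= 1.0):
                errors.append(f"{where}: confidence outside [0, 1]")
    return errors
```

Collecting every problem instead of stopping at the first one gives the caller a complete list to fix in a single round trip.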

Relationship to #46

#46 explores the same goal (skip the in-API detector) with a simpler contract: one static bbox_xyxyn + one bbox_confidence stamped on every frame. That shape loses exactly the information the downstream stages need:

  • Tube building becomes a no-op. The same box on every frame trivially yields one full-length, gap-free tube. Real sequences have boxes that move/grow per frame, multiple simultaneous boxes, and frames with none — with a static box the crops don't track the smoke, which is off-distribution for the ViT (trained on crops following per-frame detector boxes).
  • The calibrator's features are all distorted. Its feature row is [logit, log_len, mean_conf, n_tubes] (core/logistic_calibrator.py:106). A forced single box pins mean_conf to one constant (defaulting to 1.0, far above real YOLO confidence distributions), log_len to the full sequence length, and n_tubes to 1 — so the returned probability comes from feature values the regressor never saw during fitting.
  • No "saw nothing" signal. Frames where the edge detector found nothing still get a fabricated detection.
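To make the feature distortion concrete, here is a sketch of the feature row for one tube under assumed definitions: log_len as the natural log of the tube's length in frames, and mean_conf as the mean detector confidence along the tube. The real definitions are in core/logistic_calibrator.py and may differ; `feature_row` and `static_box_row` are hypothetical helpers.

```python
from __future__ import annotations

import math
from statistics import mean


def feature_row(logit: float, tube_confidences: list[float], n_tubes: int) -> list[float]:
    """Build [logit, log_len, mean_conf, n_tubes] for one tube (assumed definitions)."""
    return [logit, math.log(len(tube_confidences)), mean(tube_confidences), float(n_tubes)]


def static_box_row(logit: float, n_frames: int, confidence: float = 1.0) -> list[float]:
    """Feature row when one fixed box is stamped on every frame (the #46 shape)."""
    # A single fixed box yields one gap-free tube spanning every frame, so
    # log_len, mean_conf and n_tubes stop depending on what the smoke did.
    return feature_row(logit, [confidence] * n_frames, n_tubes=1)
```

For a 7-frame sequence, `static_box_row` always returns log_len = log(7), mean_conf = 1.0 and n_tubes = 1; only the logit varies, which is the pinning described above.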

This PR instead keeps the per-frame boxes and confidences, and the exact-equivalence check below shows that this shape preserves the model's behavior bit-for-bit. The plumbing (injection seam, cache bypass, validation reuse) follows the same approach as #46.

Test Plan

  • make -C api lint && make -C api test — 150 passed, 1 skipped
  • make -C core test — 245 passed, core has zero diffs
  • Exact-equivalence check (MinIO + native uvicorn, CPU): ran the bundled detector outside the API via BboxTubeTemporalModel.detect(), converted its boxes to xyxyn, and sent them as detections — the response is identical to the in-API detector path (probability equal to all 16 digits, full verbose payload deep-equal; only detections_source and timings differ)
  • Local e2e on scratch/annot_seq_9711 (7 real frames):
    • detector path: is_smoke=true, p=0.870, 3 tubes, detector stage 612 ms, detections_source="detector"
    • supplied detections from the sequence's label files (per-frame box counts 3/0/2/4/1/3/0): is_smoke=true, p=0.952, 2 tubes, no detector stage in profiling, detections_source="request"
    • all-empty detections: is_smoke=false, p=0.0, 0 tubes
    • 19 malformed-payload variants all return 400 invalid_request; detections: null falls back to the detector path
    • detector-path request after a supplied-detections request still gets 7/7 cache hits — supplied boxes never entered the cache
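The exact-equivalence comparison in the test plan (full verbose payload deep-equal, ignoring only provenance and timings) can be sketched as below. The key name `timings` is an assumption, since the PR does not give the verbose payload's timing key; `strip_keys` and `same_prediction` are hypothetical helpers.

```python
from __future__ import annotations

from typing import Any


def strip_keys(obj: Any, ignored: frozenset) -> Any:
    """Recursively drop the given dict keys, at any depth, before comparing."""
    if isinstance(obj, dict):
        return {k: strip_keys(v, ignored) for k, v in obj.items() if k not in ignored}
    if isinstance(obj, list):
        return [strip_keys(v, ignored) for v in obj]
    return obj


def same_prediction(a: dict, b: dict,
                    ignored: frozenset = frozenset({"detections_source", "timings"})) -> bool:
    """True when two verbose /predict payloads match except for ignored keys."""
    # Plain == on floats on purpose: the check is for bit-for-bit equality,
    # not approximate agreement.
    return strip_keys(a, ignored) == strip_keys(b, ignored)
```

Using exact equality rather than a tolerance is what lets the check assert that the supplied-detections path reproduces the detector path's probability to every digit.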

Chouffe added 2 commits June 12, 2026 08:51
…d-bboxes

# Conflicts:
#	api/README.md
#	api/src/temporal_model/api/app.py
#	api/src/temporal_model/api/model_runner.py
#	api/src/temporal_model/api/schemas.py
#	api/tests/test_app.py
#	api/tests/test_model_runner.py
#	api/tests/test_schemas.py

The merge of #47 with the compute_trigger flag (#51) left the
supplied-detections fast path dropping the flag: ?compute_trigger=true
with caller-supplied boxes would silently skip the first-crossing
search. Thread it through and pin the composition with a test.