Format streaming #21

Merged: caseymcc merged 4 commits into main from format_streaming on Jun 11, 2026

@caseymcc (Owner)

No description provided.

caseymcc and others added 4 commits May 23, 2026 13:12
Co-authored-by: Copilot <copilot@github.com>
- InferenceScheduler: tokenizer thread + per-accelerator worker threads,
  job tracking with cancellation on client disconnect, TokenChannel for
  streaming tokens back to HTTP threads
- server: scheduler-backed /v1/chat/completions for local models with SSE
  status keepalives (queue position), heartbeats for non-streaming, proper
  HTTP status for fast failures, real token usage in include_usage chunk
- /api/scheduler/jobs endpoint; dashboard shows active jobs and cancelled
  status; scheduler shutdown on server exit
- llama provider: abort checks during prompt processing and generation
- telemetry: job id and cancelled flag on inference stats
- config: harmony api_format for gpt-oss-120b (submodule)
- tests: TokenChannel and InferenceScheduler coverage; fix flaky
  configDownloader test (libgit2 init, daemon readiness poll, cleanup)
- docs: scheduler endpoint and local model architecture

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
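
The commit message describes a TokenChannel that streams tokens from worker threads back to HTTP threads, with cancellation when a client disconnects. The PR page does not show the implementation, so the following is only a minimal sketch of how such a channel could look. The class layout and the method names `push`, `pop`, `close`, `cancel`, and `finished` are assumptions for illustration and are not taken from the repository.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical sketch only: NOT the TokenChannel from this PR.
// A worker thread pushes tokens; an HTTP thread pops them with a timeout.
class TokenChannel {
public:
    // Producer: queue a token. Returns false once the channel is closed or
    // cancelled, which the worker can treat as a signal to stop generating.
    bool push(std::string token) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (closed_ || cancelled_) return false;
            tokens_.push_back(std::move(token));
        }
        ready_.notify_one();
        return true;
    }

    // Producer: generation finished normally; queued tokens stay readable.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Consumer: e.g. the client disconnected. Pending tokens are dropped.
    void cancel() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            cancelled_ = true;
            tokens_.clear();
        }
        ready_.notify_all();
    }

    bool cancelled() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return cancelled_;
    }

    // Consumer: wait up to `timeout` for the next token. Returns nullopt on
    // timeout or end of stream; call finished() to tell the two apart.
    std::optional<std::string> pop(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait_for(lock, timeout, [this] {
            return !tokens_.empty() || closed_ || cancelled_;
        });
        if (tokens_.empty()) return std::nullopt;
        std::string token = std::move(tokens_.front());
        tokens_.pop_front();
        return token;
    }

    // True when no further tokens will ever be returned by pop().
    bool finished() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return (closed_ || cancelled_) && tokens_.empty();
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::string> tokens_;
    bool closed_ = false;
    bool cancelled_ = false;
};
```

In a design like this, a timed-out `pop` with `finished()` still false gives the HTTP thread a natural point to emit an SSE keepalive. Having `push` return false after `cancel` gives the worker a cheap abort check. Whether the actual scheduler works this way is not confirmed by the PR text.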
@caseymcc caseymcc merged commit 7a364b6 into main Jun 11, 2026
1 check passed
@caseymcc caseymcc deleted the format_streaming branch June 11, 2026 02:29