完善 LLM 流式运行的 actor 录制与验收覆盖#2289
Conversation
🤖 架构复核通过:typed actor-owned recorder 路径未引入新架构债TL;DR
架构证据
📎 完整 codex 原始输出(存档备查)pr: 2289
|
Codecov Report❌ Patch coverage is
@@ Coverage Diff @@
## feature/integrate #2289 +/- ##
=====================================================
- Coverage 84.08% 84.00% -0.08%
=====================================================
Files 1139 1197 +58
Lines 82320 91202 +8882
Branches 10705 11935 +1230
=====================================================
+ Hits 69219 76617 +7398
- Misses 8414 9296 +882
- Partials 4687 5289 +602
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 180 files with indirect coverage changes 🚀 New features to boost your workflow:
|
🤖 架构复核:PR 2289 暂不建议合并TL;DR
架构阻塞点
建议修复方向:actor turn 可以记录 accepted/run-start fact 并发布最小内部信号;如果确实需要独立 executor,应该建模为明确的 run/session-scoped actor 或 durable runtime component,让 sequence/watermark、retry、cancel、terminal 由 actor state 或另一份明确分布式权威状态承载。 📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: architect
head_sha: 4a9d4e0bbb5715859499c3c2ad13168a53a80c90
verdict: reject
---
## Verdict
reject - PR moves the live LLM/tool run progression into a `Task.Run` callback outside the actor event turn, which regresses CLAUDE's actor execution model.
## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:326`: after recording `LlmRunStartedEvent`, the actor starts `_ = Task.Run(() => ConsumeRunOffActorTurnAsync(...))`; `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:961` then uses `SelfDispatchingLlmRunSink` with process-local `_lastSequence` to turn streamed chunks/tool/terminal results back into actor messages. This is more than a callback signal: it consumes the provider stream and advances the run/tool loop outside the actor event handler, then relies on a process-local sequence counter for run fact ordering. CLAUDE clauses violated: "回调只发信号:`Task.Run`/`Timer`/线程池回调不直接读写运行态或推进业务;只发布内部触发事件。" and "业务推进内聚:工作流推进(成功/失败/分支/重试)在 Actor 事件处理流程内完成,保证顺序性与可重放性。" It also conflicts with FI-004: "跨进程、跨 turn 或跨节点的事实必须有权威记录;进程内记忆、cache、临时变量不能冒充事实源。"
## What would change your verdict
Move the long-lived LLM run execution back behind an actor-owned/runtime-neutral continuation contract: the actor turn may persist accepted/run-start facts and publish minimal internal signals, but the live provider/tool progression must not be owned by an untracked `Task.Run` callback with process-local sequencing. If a separate executor is required, make it an explicit run/session-scoped actor or durable runtime component whose facts, sequence/watermark, retry, cancellation, and terminal behavior are owned by actor state or another documented distributed authority.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:architect:reject⟦AI:AUTO-LOOP⟧ |
🤖 Quality review 通过TL;DR
质量结论本轮只看 quality 维度,不评估架构条款、测试充分性或性能。 我按 PR head 打开了触达文件,而不是只看 diff hunk。主要观察点如下:
📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: quality
head_sha: 4a9d4e0bbb5715859499c3c2ad13168a53a80c90
verdict: approve
---
## Verdict
Approve: the change is readable and focused on typed LLM run recording/acceptance semantics, with no objective naming, dead-code, over-abstraction, complexity, or refactor self-doc policy issues found.
## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:277` keeps the run request handler under the preferred size threshold and separates off-turn stream consumption from typed recorder event handling through small, named helpers.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:337` adds reachable event handlers for each typed recorder fact; the handlers share one acceptance predicate instead of duplicating the run identity/sequence checks.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:809` expresses the key business rule directly as `CanPersistRunFact`, and terminal sequence handling is isolated in `CanAcceptRunTerminalSequence` at line 780.
- `test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:12` centralizes acceptance-test fixtures and fake providers used by the new acceptance/executor tests; no unused new public test support surface was found in the touched diff.
- `${HOST_REFACTOR_COMMENT_POLICY}` is empty and therefore normalizes to `none`; the PR diff adds no new `Refactor`/`Old pattern`/`New principle` self-doc comments.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:approve⟦AI:AUTO-LOOP⟧ |
🤖 Tests review: reject,off-turn recorder path 缺少生产链路覆盖TL;DR
测试覆盖问题
必须修的缺口有 2 个:
我没有发现这组测试新增 建议补一个 actor 级测试:保留 recording 📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: tests
head_sha: 4a9d4e0bbb5715859499c3c2ad13168a53a80c90
verdict: reject
---
## Verdict
Reject: the new off-turn actor recording path has a real coverage gap around the production self-dispatch sink and forwarded-tool recorder handler.
## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:327` now starts the LLM run on `Task.Run` and routes facts through the private production `SelfDispatchingLlmRunSink` at `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:961`, but `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:768` only proves the actor returns after admission and the provider stream starts. It never observes the registered `IActorDispatchPort` after `releaseStream.SetResult()`, so a regression where the production sink fails to dispatch `LlmStreamChunkObserved` / `LlmRunCompleted` back to the actor would still pass.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:387` adds the public `HandleLlmSessionForwardedToolCallEmittedAsync` recorder handler for self-dispatched forwarded tool calls. Existing forwarded-tool assertions at `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:915` cover the old in-actor sink path, while `test/Aevatar.GAgentService.Tests/Application/LlmRunExecutorTests.cs:16` covers only the test-support `DispatchingLlmRunSink`; neither exercises the new production handler path used when `IActorDispatchPort` is present.
- I found no added `[Skip]`, manual trait, `Task.Delay`, `WaitUntilAsync`, or polling allowlist churn in the touched tests.
## What would change your verdict
Add an actor-level test with a retained recording `IActorDispatchPort` that releases the gated provider, waits deterministically for the background run to dispatch, feeds the dispatched envelopes through the actor, and asserts the final actor state/completion. Include the forwarded-tool case so `HandleLlmSessionForwardedToolCallEmittedAsync` is exercised through the off-turn path, not only through a test-support sink.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:tests:reject⟦AI:AUTO-LOOP⟧ |
🤖 tests reviewer:拒绝,off-actor 录制路径缺少真实覆盖TL;DR
测试覆盖结论
我没有发现本次测试改动新增 需要把现有 gated provider 测试改成保留 📎 完整 codex 原始输出(存档备查)pr: 2289
|
🤖 质量复审通过TL;DR
质量复审结论本轮只看代码质量角度,不评 CLAUDE 架构条款和测试覆盖充分性。 我检查了三点:
📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: quality
head_sha: 4a9d4e0bbb5715859499c3c2ad13168a53a80c90
verdict: approve
---
## Verdict
approve - the PR stays readable and focused for the LLM stream recording acceptance work, with no quality-axis blocker found.
## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:337` adds typed recorder handlers (`HandleLlmStreamChunkObservedAsync`, `HandleLlmToolCallObservedAsync`, terminal handlers) whose names express the business facts being recorded rather than generic helper/manager wording.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:809` keeps the run-fact acceptance logic in a small, named predicate (`CanPersistRunFact`) instead of spreading sequence/terminal checks across each handler.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:961` introduces `SelfDispatchingLlmRunSink` as a private nested implementation with a single production caller at `LlmSessionGAgent.cs:331`; the abstraction surface is not expanded.
- `test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:12` centralizes the new acceptance fixtures and helper sinks used by the new tests, and the new helper types have direct callers in `LlmRunCoreTests`, `LlmRunExecutorTests`, and `LlmRunEndToEndAcceptanceTests`.
- `docs/canon/llm-streaming.md:38` and `docs/canon/nyxid-responses-direct.md:120` document the stream execution boundary in the existing canon docs, with no unrelated drive-by cleanup in the inspected diff.
- Refactor comment policy: `${HOST_REFACTOR_COMMENT_POLICY}` was empty and therefore normalized to `none`; the diff adds no new `Refactor` / `Old pattern` / `New principle` source-code comments or hardcoded host placeholder text.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:approve⟦AI:AUTO-LOOP⟧ |
🤖 PR 2289 round 1 fix 已完成TL;DR
修复摘要
文档也同步收口: 验证结果: 📎 完整 codex 原始输出(存档备查)# Fix report for PR 2289 round 1
## Applied
- (A) `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:316`: removed the untracked `Task.Run` off-turn execution path and deleted the process-local `SelfDispatchingLlmRunSink` sequence counter; `HandleLlmRunRequestedAsync` now runs `ILlmRunCore` through the actor-owned `InActorLlmRunSink` in the command turn. This addresses reviewer:architect's evidence #1 and the cited CLAUDE clauses: `Task.Run` callbacks only publish signals, business progression stays in Actor event handling, and cross-turn facts cannot rely on process-local memory.
- (A) `docs/canon/llm-streaming.md:42` and `docs/canon/nyxid-responses-direct.md:126`: updated the canon docs so they no longer authorize off-turn provider stream ownership; the docs now state that future extraction requires a run/session-scoped actor or durable runtime component with owned sequence/watermark, retry, cancellation, and terminal facts. This addresses reviewer:architect's "What would change your verdict" request for a runtime-neutral actor-owned/durable contract.
- (A) `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:921`: added direct actor-level coverage for `HandleLlmSessionForwardedToolCallEmittedAsync`, including duplicate idempotency. This addresses reviewer:tests's evidence #2 for the forwarded-tool recorder handler that remains in the fixed design.
- (A) `test/Aevatar.GAgentService.Tests/Application/LlmRunExecutorTests.cs`: deleted the test file that covered only the now-removed off-turn dispatch sink; `test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs` was trimmed accordingly. This is part of the architect fix and avoids preserving coverage for an implementation that CLAUDE rejects.
## Rejected as false positive
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:327` cited by reviewer:tests: the demand to add a retained recording `IActorDispatchPort` test for the production self-dispatch sink is superseded by the applied architecture fix. Evidence: `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:316` now calls `runCore.RunAsync(..., new InActorLlmRunSink(this), ...)`; `rg "Task\.Run|SelfDispatchingLlmRunSink|Interlocked\.Increment|_lastSequence|ConsumeRunOffActorTurnAsync|Off-turn LLM" src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs test/Aevatar.GAgentService.Tests/Application test/Aevatar.GAgentService.Tests/Core test/Aevatar.GAgentService.Tests/TestSupport` returns no matches. The former test-support dispatch sink was removed with `test/Aevatar.GAgentService.Tests/Application/LlmRunExecutorTests.cs`.
## Blocked (cannot fix this round)
- None.
## Build status
- build: pass (`dotnet build aevatar.slnx --nologo`; warnings only)
- tests: pass (`dotnet test test/Aevatar.GAgentService.Tests/Aevatar.GAgentService.Tests.csproj --nologo`; 907 passed, 0 failed, 0 skipped)
- guards: pass (`bash tools/ci/test_stability_guards.sh`)
- review-thread completion: not applicable; `/Users/chronoai/Code/aevatar/.refactor-loop/state/review-thread-completion/pr2289.json` does not exist.
- refactor self-doc: not applicable (`HOST_REFACTOR_COMMENT_POLICY=none`); no new source refactor-history comments were added.
## Recommendation for next round
- Expect merge after rerun: reject evidence has been resolved, one tests demand was made false-positive by removing the rejected production path, and there are no blocked demands.
⟦AI:AUTO-LOOP⟧
FIX_DONE:2289:round-1:applied-4:rejected-1:blocked-0⟦AI:AUTO-LOOP⟧ |
🤖 质量审查:comment,建议删掉两个未使用测试 helperTL;DR
质量审查意见我没有看到需要阻塞合并的生产代码复杂度、命名或新增抽象问题; 发现两个小的可读性问题,都是测试 fixture(测试里复用的构造数据/helper)留下了未使用入口:
建议处理方式:删除 📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: quality
head_sha: b4a133d60c545e048152bdccd1743b9251bbf377
verdict: comment
---
## Verdict
Comment: the production change is readable and focused, but the new test support leaves a couple of unused helper surfaces that should be trimmed or used.
## Evidence
- `test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:18`: `BuildRunRequest()` is a new public helper on the test harness, but there is no caller in the PR head. Unused fixture builders make later tests guess which canonical setup path is intended.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:97`: `CreateActor(..., string? responseId = null)` has a `responseId` branch, but every caller passes only the event store. That optional parameter is an unused extension point and adds unnecessary branching to a test helper.
## What would change your verdict (only if comment or reject)
Delete `LlmRunAcceptanceHarness.BuildRunRequest()` if it is not needed, and remove the unused `responseId` parameter/branch from `CreateActor`, or update the tests to use both intentionally.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:comment⟦AI:AUTO-LOOP⟧ |
🤖 tests 视角通过:流式运行录制覆盖足够TL;DR
测试覆盖结论本轮只检查测试质量,不评审生产架构。 我确认新增测试不是 bump-line-count:
同时扫描 PR diff:没有新增 📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: tests
head_sha: b4a133d60c545e048152bdccd1743b9251bbf377
verdict: approve
---
## Verdict
Approve: test coverage and quality are adequate for the LLM stream-run actor recording and acceptance changes.
## Evidence
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:159` (`RunAsync_WhenCancellationTokenIsCancelled_ShouldFlushCancelledFactThroughSink`) asserts the cancelled fact is still recorded with a non-cancelled sink token, directly covering the `LlmRunCore.RunAsync` cancellation-token behavior changed in `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:35`.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:179` (`RunAsync_WhenProviderNeverEmitsTerminalChunk_ShouldRecordCancelledFactWhenCancelled`) uses a `TaskCompletionSource` and cancellable wait, not sleep/poll pacing, and asserts partial stream preservation plus cancelled terminal output.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:14` and `:37` assert concrete actor state outcomes for duplicate chunks, late chunks, duplicate terminal events, failed-after-completed, and completed-after-cancelled paths, covering the new recorder command handlers and terminal-authority branches in `LlmSessionGAgent`.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:57` verifies replay accepts a terminal event despite a missing intermediate recorder sequence and rejects later chunks, covering the relaxed terminal sequence acceptance rule.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:768` and `:921` cover duplicate running run admission and forwarded tool-call recorder idempotency with state-version assertions, not mock-only verification.
- Diff scan found no added `[Skip]`, manual traits, `Task.Delay`, `WaitUntilAsync`, `Assert.True(true)`, or assertion weakening. The polling allowlist was unchanged, so no new stability exception needs justification.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:tests:approve⟦AI:AUTO-LOOP⟧ |
🤖 架构评审 r3:approveTL;DR
架构检查摘要我按实际三点 diff 关键 evidence:
未发现 📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: architect
head_sha: b4a133d60c545e048152bdccd1743b9251bbf377
verdict: approve
---
## Verdict
approve: no architectural concerns from the CLAUDE.md/AGENTS.md compliance angle; the PR keeps LLM run recording actor-owned and typed.
## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:322`-`370` adds typed recorder event handlers for stream chunks, tool calls, completion, failure, and cancellation; this maps to `CLAUDE.md` "核心语义强类型:影响业务语义、控制流、稳定读取且仓库内可控的数据,必须建模为 `proto field / typed option / typed sub-message`,禁止塞入通用 bag。" and does not introduce a generic bag or JSON state path.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:775`-`797` gates recorder facts by `responseId + runId + sequence` inside the session actor, and `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:819`-`870` persists the accepted facts through actor domain events. This aligns with `CLAUDE.md` "单一权威拥有者:每个稳定业务事实有唯一 actor 拥有;`committed event store + actor state` 是唯一真相,readmodel 只是查询副本。"
- `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:31`-`43` preserves cancellation as an explicit `LlmRunCancelled` fact and only changes the sink token to `CancellationToken.None`, which supports `CLAUDE.md` "AI 对话主链必须流式化:实时会话入口必须使用 `ChatStreamAsync`" without adding `ChatAsync`, host-side response synthesis, or query-time replay.
- `docs/canon/llm-streaming.md:38`-`50` and `docs/canon/nyxid-responses-direct.md:120`-`133` document why the live provider stream is not represented as resumable self-continuation state and why any future split must be a run/session-scoped actor or durable runtime component. This is consistent with `AGENTS.md` "self continuation 必须事件化" and "本地可用不等于分布式正确" rather than adding a `Task.Run` callback or process-local registry.
- Host-specific inputs: `HOST_REFACTOR_COMMENT_POLICY` was empty, so it normalizes to `none`; the diff adds no new Old/New refactor-history source comments. `HOST_ARCHITECTURE_GREP_CHECKS`, `HOST_PROTO_POLICY`, `AUDIT_PATH`, `IMPLEMENT_SUMMARY_PATH`, and `EXTERNAL_REPOS` were empty, so no extra host checks applied. The diff contains no `.proto` changes, no new `*WriteActor` / `*ReadActor` / `*Store` split, no `.refactor-loop` production configuration movement, and no external repository changes.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:architect:approve⟦AI:AUTO-LOOP⟧ |
🤖 架构评审结论:reject,需拆开 recorder command 与 committed eventTL;DR
架构阻断点
问题是这些类型本身也是 committed domain event,即 actor 已提交后用于状态转换和投影观察的事实。它们在 建议改法:为入站 recorder 建独立 typed command/internal-signal,例如 📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: architect
head_sha: b4a133d60c545e048152bdccd1743b9251bbf377
verdict: reject
---
## Verdict
reject: the PR adds recorder `EventHandler` entry points that reuse committed run event payloads as inbound actor commands/internal signals, which regresses the command/event semantic boundary.
## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:322` adds `HandleLlmStreamChunkObservedAsync(LlmStreamChunkObserved observed)`, and lines `332`, `342`, `352`, `362`, and `372` add equivalent handlers for `LlmToolCallObserved`, `LlmRunCompleted`, `LlmRunFailed`, `LlmRunCancelled`, and `LlmSessionForwardedToolCallEmittedEvent`. Those handlers then call `PersistStreamChunkObservedAsync` / `PersistRunCompletedAsync` / related persist methods, while the same payload types are registered as state-transition domain events at `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:408` through `413`. This makes one contract serve both as inbound command/internal signal and committed event. CLAUDE.md clause: "读写分离:`Command -> Event`,`Query -> ReadModel`;异步完成通过事件通知,不在会话内拼装流程。"
- The same change also weakens the envelope semantic boundary: the actor can now receive an `EventEnvelope` carrying a payload such as `LlmRunCompleted` and persist that same payload as the committed event, without a distinct message contract defining whether the input is a command, internal signal, or already-committed event. CLAUDE.md clause: "`Envelope` 是统一消息包络(`command/reply/signal/event/query`),但是否可持久化、可投影、可观察必须由消息契约显式定义。"
## What would change your verdict
Introduce distinct typed recorder command/internal-signal messages for the inbound side, for example `RecordLlmStreamChunkObservedRequested`, `RecordLlmRunCompletedRequested`, etc., and have `LlmSessionGAgent` translate those into committed `LlmStreamChunkObserved` / `LlmRunCompleted` domain events. Alternatively, remove the inbound recorder handlers if only the actor-owned `ILlmRunSink` path is intended. Update the acceptance tests and docs to preserve that command/event split.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:architect:reject⟦AI:AUTO-LOOP⟧ |
🤖 质量复审:comment,建议清理 3 处小问题TL;DR
质量发现
📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: quality
head_sha: b4a133d60c545e048152bdccd1743b9251bbf377
verdict: comment
---
## Verdict
Comment: the production changes are readable and focused, but the new test support leaves a couple of small unused helper surfaces that should be trimmed.
## Evidence
- `test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:18` adds `BuildRunRequest()`, but `git grep 'LlmRunAcceptanceHarness.BuildRunRequest' origin/refactor/iter2276-issue-2276` finds no callers. This is dead test helper surface; remove it unless a test should use it.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:97` gives `CreateActor` an optional `responseId` parameter, but all callers use the default path. The non-null branch at line 102 is dead readability overhead; drop the parameter/branch or add the scenario that needs a different response id.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:842` names the `conflict` lambda parameter even though the lambda body does not read it; the same pattern appears at lines 855 and 868. Use `_ => ...` so the required delegate parameter is visibly ignored.
## What would change your verdict (only if comment or reject)
Remove `LlmRunAcceptanceHarness.BuildRunRequest()`, simplify `CreateActor` to the actually used actor id path, and rename the unused conflict lambda parameters to `_`.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:comment⟦AI:AUTO-LOOP⟧ |
🤖 PR 2289 测试审查通过TL;DR
测试审查结论本轮只看测试角度。 关键证据:
我也扫了测试 diff: 没有新增 📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: tests
head_sha: b4a133d60c545e048152bdccd1743b9251bbf377
verdict: approve
---
## Verdict
Approve: the PR adds meaningful behavior coverage for the LLM stream-run actor recorder paths and avoids the test-quality regressions on the checklist.
## Evidence
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:159` (`RunAsync_WhenCancellationTokenIsCancelled_ShouldFlushCancelledFactThroughSink`) asserts the cancelled fact is recorded through a non-cancelled sink token, directly covering the `LlmRunCore.RunAsync` cancellation flush change in `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs`.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:179` (`RunAsync_WhenProviderNeverEmitsTerminalChunk_ShouldRecordCancelledFactWhenCancelled`) uses a `TaskCompletionSource` plus cancellable wait rather than fixed sleep/poll pacing, and asserts partial stream recording plus cancelled terminal output.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:14` and `:37` assert concrete actor state outcomes for duplicate chunks, late chunks, duplicate terminal facts, failed-after-completed, and completed-after-cancelled paths, covering the new recorder command handlers and terminal authority behavior.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:57` verifies replay accepts a terminal event despite a missing intermediate recorder sequence and rejects a later chunk, covering the relaxed terminal sequence acceptance rule.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:719` verifies `LlmRunStartedEvent` is persisted before chunk/completion facts with exact sequence assertions; `:768` verifies duplicate running-run admission is an idempotent no-op; `:921` verifies forwarded tool-call recorder idempotency and persisted call fields.
- Diff scan found no added `[Skip]`, manual traits, `Task.Delay`, `WaitUntilAsync`, `Assert.True(true)`, bump-only tests, or assertion weakening. `tools/ci/test_polling_allowlist.txt` was unchanged, so no new polling exception needs justification.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:tests:approve⟦AI:AUTO-LOOP⟧ |
📊 状态卡片 — fix 派出
🤖 controller status banner ⟦AI:AUTO-LOOP⟧ |
📊 状态卡片 — fix 派出
🤖 controller status banner ⟦AI:AUTO-LOOP⟧ |
…/iter2276-issue-2276 # Conflicts: # src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs # test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs
🤖 quality review: reject,需删除新增死代码TL;DR
质量发现
需要的改动很小:删除 📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: quality
head_sha: 934ed6ccad6de290b1751d017c33e822c49e9437
verdict: reject
---
## Verdict
Reject: the PR is mostly focused and readable, but it introduces unused private production helpers that should not merge as dead code.
## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1236` defines `CompletionAlreadyRecorded(LlmRunCompleted completed)`, but `git grep` over the PR head finds no caller; the helper duplicates terminal/idempotency intent already expressed by `CanPersistRunFact` and the run transition path.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1253` defines `RunTerminalAlreadyRecorded(string runId)`, but `git grep` over the PR head finds no caller; it is new dead production code in a class that is already large enough that unused helpers make the run-recording flow harder to audit.
## What would change your verdict (only if comment or reject)
Delete `CompletionAlreadyRecorded` and `RunTerminalAlreadyRecorded`, or wire them into the actual terminal/idempotency decision path with tests that exercise those branches.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:reject⟦AI:AUTO-LOOP⟧ |
🤖 Tests review: approveTL;DR
测试覆盖结论本轮只看测试质量,不评价生产架构。结论是测试足够支撑这次 diff:新增测试不是单纯 bump 行数,而是在断言具体业务结果。 关键证据:
我也扫描了 touched tests:没有新增 📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: tests
head_sha: 934ed6ccad6de290b1751d017c33e822c49e9437
verdict: approve
---
## Verdict
Test coverage and quality are adequate for the LLM streaming actor recording and acceptance diff.
## Evidence
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:424` covers the cancellation-regression behavior from `LlmRunCore.RunAsync`: the cancelled fact is flushed through the sink with a non-cancelled sink token, and completed/failed facts remain empty.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:444` uses a deterministic `TaskCompletionSource` and `WaitAsync(ct)` harness to verify a never-terminal stream records partial output and then a cancelled fact; no sleep/delay pacing is introduced.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:14` and `:37` assert terminal completion/cancellation authority across duplicate, late, and conflicting recorder facts by checking state version, session status, active run status/output/sequence, and completion fields.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:57` covers replay behavior for a terminal run event with missing intermediate sequence and a later stale chunk, asserting the terminal state and applied version remain authoritative.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:802` covers duplicate active-run admission by asserting no new state version is persisted.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:1222` covers the new forwarded tool-call recorder path and duplicate suppression with concrete call fields and status assertions.
- Branch scan over the touched tests found no new `Task.Delay`, `Thread.Sleep`, `WaitUntilAsync`, `[Skip]`, manual trait, `Assert.True(true)`, or new not-throw-only tests; `tools/ci/test_polling_allowlist.txt` is unchanged.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:tests:approve⟦AI:AUTO-LOOP⟧ |
🤖 架构评审通过TL;DR
架构结论本轮只审架构合规,不评价测试覆盖质量或性能。核心判断是:新增的 typed recorder fact 指
检查到的关键证据:
我也做了针对性 grep:未发现新增 Old/New refactor-history 源码注释、 📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: architect
head_sha: 934ed6ccad6de290b1751d017c33e822c49e9437
verdict: approve
---
## Verdict
approve - no architecture compliance blocker found for the typed LLM run recorder/finalizer changes.
## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583` adds direct handlers for existing typed run facts (`LlmStreamChunkObserved`, `LlmToolCallObserved`, `LlmRunCompleted`, `LlmRunFailed`, `LlmRunCancelled`) on the same `LlmSessionGAgent` authority. This is consistent with CLAUDE.md "Actor 即业务实体:一个 actor = 一个业务实体(数据与方法同住);禁止按技术功能(读/写/投影)拆分同一业务实体为多个 actor。"
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1056` gates persistence by `responseId + runId + sequence` and terminal status, so the PR does not introduce process-local registries or middle-layer ID maps. This is consistent with CLAUDE.md "事实源唯一:跨请求/跨节点一致性事实必须有唯一权威来源(Actor 持久态或分布式状态),不依赖进程内偶然状态。"
- `src/platform/Aevatar.GAgentService.Abstractions/Protos/llm_sessions.proto:278` already models run facts as protobuf messages with typed `sequence` and `record_id`; the diff does not add schema bags or JSON/XML state storage. This is consistent with CLAUDE.md "核心语义强类型:影响业务语义、控制流、稳定读取且仓库内可控的数据,必须建模为 `proto field / typed option / typed sub-message`,禁止塞入通用 bag。"
- `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:37` records cancellation with `CancellationToken.None`, preserving a committed terminal fact instead of letting caller cancellation suppress the actor-owned final state. This is consistent with CLAUDE.md "committed domain event 必须可观察:write-side 完成 committed event 后必须送入 projection 主链;禁止只落 event store 而不进入可观察流。"
- `docs/canon/llm-streaming.md:27` and `docs/canon/nyxid-responses-direct.md:77` update canonical documentation for the actor-owned run recording boundary and explicitly reject self-continuation or `Task.Run` ownership of the live provider stream. This satisfies CLAUDE.md "架构调整 PR 必须同步相关 `docs/`,并在验证结果中列出 build/test/guard。"
- Scope and host-boundary checks passed from the diff: changed files are limited to LLM run source/tests/canon docs; no `.refactor-loop/host.env` or production host facts were introduced; no new external repo references or `*WriteActor` / `*ReadActor` / `*StoreActor` split appeared; no new Old/New refactor-history source comments appeared while `HOST_REFACTOR_COMMENT_POLICY` is empty/default `none`.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:architect:approve⟦AI:AUTO-LOOP⟧ |
🤖 测试视角通过:LLM 流式 actor 录制覆盖充分TL;DR
测试覆盖证据本轮只看测试质量和覆盖,不评价生产架构。
📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: tests
head_sha: 934ed6ccad6de290b1751d017c33e822c49e9437
verdict: approve
---
## Verdict
Approve: the diff adds behavior-level tests for the new LLM run recorder paths, terminal idempotency, cancellation flushing, and duplicate/late event handling, with no added skip/manual/polling-test shortcuts.
## Evidence
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:15` asserts a completed run remains authoritative across duplicate chunks, late chunks, duplicate completion, and late failure; this covers the mailbox recorder path for `LlmStreamChunkObserved`, `LlmToolCallObserved`, and `LlmRunCompleted`.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:37` asserts cancellation remains authoritative and ignores later chunk/completion/cancel flushes; this covers terminal cancellation behavior instead of just checking that handlers return.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:57` asserts replay accepts a terminal event despite a missing intermediate recorder command while rejecting the later stale chunk, covering the terminal sequence exception.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:424` asserts cancelled runs are flushed through the sink with `CancellationToken.None`, directly covering the `LlmRunCore.RunAsync` cancellation-token change.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:445` asserts a never-terminal provider records a partial chunk and then a cancelled fact after cancellation, using `TaskCompletionSource` plus cancellation rather than sleep-based pacing.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:803` asserts duplicate admission of an already-running run is ignored without changing `LastAppliedEventVersion`, covering the new same-active-run guard.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:921` and `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:1223` assert actor-owned sequence/idempotency and forwarded tool-call duplicate handling with real actor state, not mock-only verification.
- Hygiene scan over the touched files found no added `[Skip]`, `[Trait("Category", "Manual")]`, `Task.Delay`, `Thread.Sleep`, `WaitUntilAsync`, or `Assert.True(true)` patterns; the only `Should().NotThrowAsync` match is pre-existing at `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:99` and is outside this PR diff.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:tests:approve⟦AI:AUTO-LOOP⟧ |
🤖 Quality review: reject,需删除未使用的 actor helperTL;DR
具体问题
建议处理方式: 如果这两个 helper 是前一版方案遗留,直接删除;如果确实是目标设计的一部分,请把它们接入 run fact acceptance path,并补上覆盖该分支的测试。 📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: quality
head_sha: 934ed6ccad6de290b1751d017c33e822c49e9437
verdict: reject
---
## Verdict
Reject: the PR introduces production dead code in the central LLM session actor.
## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1236` adds private helper `CompletionAlreadyRecorded(LlmRunCompleted completed)`, but `git grep` over the PR branch finds no caller outside the declaration. It duplicates terminal/completion acceptance intent without participating in the actual flow, making the run-finalization logic harder to read.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1253` adds private helper `RunTerminalAlreadyRecorded(string runId)`, but `git grep` over the PR branch finds no caller outside the declaration. This is another unused terminal-state helper next to the live persistence path.
## What would change your verdict (only if comment or reject)
Delete `CompletionAlreadyRecorded` and `RunTerminalAlreadyRecorded`, or wire them into the run fact acceptance path with tests showing the intended branch. If deletion is correct, no behavior change is needed.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:reject⟦AI:AUTO-LOOP⟧ |
🤖 架构审查通过:未发现 CLAUDE/AGENTS 回归TL;DR
架构证据
本轮 host policy 变量为空:Old/New source comment policy 归一化为 📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: architect
head_sha: 934ed6ccad6de290b1751d017c33e822c49e9437
verdict: approve
---
## Verdict
approve - no architecture compliance regression found in the changed C# / proto review surface or the related canon documentation updates.
## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583` keeps recorder facts inside the existing `LlmSessionGAgent` owner and persists them through actor-owned methods, matching `CLAUDE.md` clauses "Actor 即业务实体:一个 actor = 一个业务实体" and "单一权威拥有者:每个稳定业务事实有唯一 actor 拥有".
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1056` gates incoming run facts by registered session, terminal state, run id, and sequence before persistence; no new service-level `Dictionary<>` / `ConcurrentDictionary<>` fact map or read/write actor split was introduced.
- `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:101` continues to use `ChatStreamAsync` for the live LLM path, and `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:37` records cancellation as typed `LlmRunCancelled` even after the provider token is cancelled, aligning with the AGENTS rule that AI conversation main paths must be stream-first and terminal facts must be eventized.
- `docs/canon/llm-streaming.md:40` and `docs/canon/nyxid-responses-direct.md:106` document the same actor-owned run recording boundary: Host observes terminal projection, `LlmSessionGAgent` owns `responseId + runId + sequence`, and future execution split must be run/session-scoped or durable. This is consistent with `CLAUDE.md` "Host 只做协议适配与依赖组合" and "查询始终走 readmodel".
- Old/New pattern policy: `HOST_REFACTOR_COMMENT_POLICY` was empty, so it normalizes to `none`; grep of the net C# / proto diff found no new Old/New or refactor-history source comments. No proto files changed, `HOST_PROTO_POLICY` was empty, and no host production SSOT facts were moved into `.refactor-loop`.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:architect:approve⟦AI:AUTO-LOOP⟧ |
🤖 PR 2289 round 1 reject fix 已完成TL;DR
修复摘要本轮只有
验证结果
📎 完整 codex 原始输出(存档备查)# Fix report for PR 2289 round 1
## Non-blocking reviewer context
- reviewer:architect approved. Evidence cited actor-owned typed run facts, `responseId + runId + sequence` gating, protobuf typed contracts, cancellation terminal fact flushing, docs updates, and no host/config scope violation.
- reviewer:tests approved. Evidence cited cancellation/no-terminal coverage, deterministic `TaskCompletionSource` usage, terminal duplicate/late/conflicting recorder tests, replay tests, duplicate run admission, forwarded tool-call recorder coverage, and no new polling/skip/no-op tests.
## Applied
- (A) src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1245: removed unused private helpers `CompletionAlreadyRecorded` and `RunTerminalAlreadyRecorded`; `git grep` over `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs` and `test/Aevatar.GAgentService.Tests` now returns no matches (addresses reviewer:quality's evidence #1 and #2).
- (A) src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583: typed recorder event handlers now persist the accepted typed fact directly instead of routing through live sink helpers that recompute sequence; this keeps `responseId + runId + sequence` facts authoritative and makes the PR's new acceptance tests pass.
- (A) src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:850: terminal transitions now allow a monotonic sequence gap for completed/failed/cancelled facts while non-terminal chunk/tool records still require the exact next sequence; this aligns transition replay with `CanPersistRunFact(... terminal: true)` and the documented terminal-finalizer behavior.
## Rejected as false positive
- None.
## Blocked (cannot fix this round)
- None.
## Build status
- build: pass (`dotnet build aevatar.slnx --nologo`)
- tests: pass (`dotnet test test/Aevatar.GAgentService.Tests/Aevatar.GAgentService.Tests.csproj --nologo`, 953 passed, 0 failed, 0 skipped)
- guard: pass (`bash tools/ci/test_stability_guards.sh`)
- review-thread-completion: not applicable; `/Users/chronoai/Code/aevatar/.refactor-loop/state/review-thread-completion/pr2289.json` is absent.
## Recommendation for next round
- expect unanimous; the only reject evidence was addressed, and validation passes.
⟦AI:AUTO-LOOP⟧
FIX_DONE:2289:round-1:applied-3:rejected-0:blocked-0⟦AI:AUTO-LOOP⟧ |
🤖 Quality review: comment on small readability cleanupTL;DR
质量发现
📎 完整 codex 原始输出(存档备查)pr: 2289
|
🤖 tests reviewer:PR 2289 测试角度 approveTL;DR
测试覆盖结论
测试卫生没有发现新增 我本地执行了聚焦测试: 结果:47 passed, 0 skipped。 📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: tests
head_sha: 941790febd654684357c03802b35d4ee6ff81c24
verdict: approve
---
## Verdict
Test coverage and quality are adequate for the LLM streaming actor recording and acceptance changes.
## Evidence
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:425` covers the cancellation-token regression in `LlmRunCore.RunAsync`: the sink records `LlmRunCancelled` with a non-cancelled token and no completed/failed terminal fact.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:445` covers a live provider stream that never emits a terminal chunk, using `TaskCompletionSource` plus cancellation instead of sleep-based pacing.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:15` exercises duplicate chunk suppression, terminal completion authority, late chunks, repeated terminal dispatch, and later failed facts against the actor state.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:38` and `:58` cover cancelled and failed terminal authority, including ignored flushes or later terminal facts.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:80` covers replay accepting a terminal event despite a missing intermediate recorder command, matching the new terminal sequence-gap behavior.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:803` covers duplicate running run admission as a no-op, and `:1223` covers forwarded tool-call recorder persistence plus duplicate suppression.
- Hygiene checks found no added `[Skip]`, manual category trait, `Task.Delay`, `Thread.Sleep`, `WaitUntilAsync`, `Assert.True(true)`, or weakened assertions in the touched test diff. The focused command passed: `dotnet test test/Aevatar.GAgentService.Tests/Aevatar.GAgentService.Tests.csproj --filter "FullyQualifiedName~LlmRunCoreTests|FullyQualifiedName~LlmRunEndToEndAcceptanceTests|FullyQualifiedName~LlmSessionGAgentTests" --nologo` returned 47 passed, 0 skipped.
## What would change your verdict (only if comment or reject)
N/A.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:tests:approve⟦AI:AUTO-LOOP⟧ |
🤖 architect review approve:未发现架构阻塞项TL;DR
架构复核结论本次只按 architect 角色看架构边界,不评价测试质量或性能。
已检查:无 📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: architect
head_sha: 941790febd654684357c03802b35d4ee6ff81c24
verdict: approve
---
## Verdict
approve - no architecture compliance concerns found in the reviewed diff.
## Evidence
- No blocking architecture findings. `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:37` only changes cancellation-terminal recording to use `CancellationToken.None`, which preserves the actor-owned cancelled fact instead of letting the caller's cancelled token suppress it.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:582` adds typed observed-fact handlers that gate through `CanPersistRunFact` and then commit via `PersistDomainEventAsync`, keeping the session actor as the authoritative owner rather than adding a middle-layer registry or read/write actor split.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:632` records forwarded tool-call facts inside the existing session actor, with duplicate fact validation instead of a parallel store/shim.
- Policy checks: no `.proto` diff, no new `*WriteActor` / `*ReadActor` / `*Store` entity split, no new external repo reference, no `.refactor-loop` host SSOT movement, no new Old/New refactor-history source comments, and no host-specific grep checks were configured.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:architect:approve⟦AI:AUTO-LOOP⟧ |
🤖 PR 2289 quality review: commentTL;DR
质量发现
生产代码里新增的 typed recorder event handlers 可读性基本稳定: 建议改动很小:删除 📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: quality
head_sha: 941790febd654684357c03802b35d4ee6ff81c24
verdict: comment
---
## Verdict
Comment: the production change is readable and focused, but the new shared test harness exposes a helper that is not used anywhere in the PR tree.
## Evidence
- `test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:18` defines `BuildRunRequest()`, but `git grep` on the PR head only finds this definition and no callers. This adds dead test-support surface; the acceptance tests use `BuildRecord`, `Chunk`, `Completed`, `Failed`, `Cancelled`, `LocalToolCall`, `StateEvent`, `RecordingLlmRunSink`, and `NeverTerminalLlmProviderFactory`, but not this request builder.
## What would change your verdict (only if comment or reject)
Delete `LlmRunAcceptanceHarness.BuildRunRequest()` or add a concrete acceptance test that uses it.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:comment⟦AI:AUTO-LOOP⟧ |
🤖 架构审查:reject,文档边界与代码不一致TL;DR
审查结论我给
对应项目规则是: 建议修法:把 📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: architect
head_sha: 941790febd654684357c03802b35d4ee6ff81c24
verdict: reject
---
## Verdict
Reject: the new canon documentation describes the LLM provider stream as consumed in the session actor command turn, but the PR head code schedules an off-session execution actor/service path and dispatches recorder commands back to the session actor.
## Evidence
- `docs/canon/llm-streaming.md:43` says live provider `ChatStreamAsync` / `IAsyncEnumerable<LLMStreamChunk>` is consumed by `ILlmRunCore` "在同一 actor command turn 内连续消费", and `docs/canon/nyxid-responses-direct.md:122` / `:127` say the provider stream is owned by the session actor command turn. In the actual PR head, `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:411`-`:417` records `LlmRunExecutionReadyEvent` and schedules transient execution, `src/platform/Aevatar.GAgentService.Core/GAgents/LlmRunExecutionGAgent.cs:22`-`:35` delegates `ExecuteLlmRunRequested` to `ILlmRunExecutionService`, and `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunExecutor.cs:82`-`:99` calls `runCore.RunAsync(...)` with a `DispatchingLlmRunSink` that dispatches `Record*` commands back through `IActorDispatchPort` at `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunExecutor.cs:145`-`:163`. This violates `CLAUDE.md` "变更必须可验证:架构调整需同步文档,且 `build/test` 通过。" and "架构调整 PR 必须同步相关 `docs/`,并在验证结果中列出 build/test/guard。"
## What would change your verdict
Update `docs/canon/llm-streaming.md` and `docs/canon/nyxid-responses-direct.md` to match the actual PR architecture: `LlmSessionGAgent` owns run state, sequence acceptance, and terminal facts; `LlmRunExecutionGAgent` / `ILlmRunExecutor` consumes the live provider stream off the session actor command turn; the sink dispatches typed `Record*` recorder commands back to the session actor via `IActorDispatchPort`. Keep the self-continuation rationale, but word it against that actual execution boundary.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:architect:reject⟦AI:AUTO-LOOP⟧ |
🤖 tests reviewer 结论:approveTL;DR
测试覆盖证据
📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: tests
head_sha: 941790febd654684357c03802b35d4ee6ff81c24
verdict: approve
---
## Verdict
Approve: the PR adds behavior-focused coverage for the new LLM run cancellation flush, actor recorder handlers, duplicate admission, terminal precedence, and replay gap behavior without weakening test quality.
## Evidence
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:424` covers the changed cancellation path in `LlmRunCore.RunAsync` and asserts the cancelled fact is flushed through the sink with a non-cancelled sink token, plus no completed/failed facts.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:444` covers mid-stream cancellation using `TaskCompletionSource` and cooperative cancellation instead of sleep/polling; it asserts the partial chunk and cancelled terminal fact.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:14` covers duplicate chunks, terminal completion authority, late chunks, and later failed facts through real `HandleEventAsync` actor handling.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:37` and `:57` cover cancelled/failed terminal authority and later flush or terminal facts being ignored.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:79` covers replay applying a terminal event despite missing intermediate recorder sequence, matching the new terminal gap acceptance branch.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:802` covers duplicate running run admission being ignored without re-dispatching.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:1222` covers `HandleLlmSessionForwardedToolCallEmittedAsync` persisting a forwarded call and ignoring the duplicate.
- Diff scan found no added `[Skip]`, manual traits, `Task.Delay`, `Thread.Sleep`, `WaitUntilAsync`, or assertion loosening patterns; `bash tools/ci/test_stability_guards.sh` passed.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:tests:approve⟦AI:AUTO-LOOP⟧ |
🤖 quality review:PR 2289 建议小修后更干净TL;DR
质量发现
📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: quality
head_sha: 941790febd654684357c03802b35d4ee6ff81c24
verdict: comment
---
## Verdict
Comment: the PR is focused and readable overall, but a small production duplication and unused test-support surface should be cleaned up for simplicity.
## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583`, `:593`, `:603`, `:613`, `:623` each repeat the same recorder-handler shape: null-check, `CanPersistRunFact(...)`, then `PersistDomainEventAsync(payload.Clone(), CancellationToken.None)` or `Task.CompletedTask`. Five near-identical copies cross the checklist's extraction threshold and make future recorder fact handling easier to drift.
- `test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:18` adds `BuildRunRequest()`, but the PR-head search shows no caller. Keeping unused factory helpers in shared test support makes later tests look like they should use a canonical request shape that is not actually exercised here.
- `test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:119` records `ForwardedToolCalls`, but the only new caller of `RecordingLlmRunSink` checks stream chunks and cancellation; no test reads this property. This is minor dead test-support surface.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:119` accepts optional `responseId`, but all calls pass only `eventStore`. The optional branch at lines `122-124` is currently unreachable and can be removed until a test needs it.
## What would change your verdict
Extract a tiny private helper for the five direct recorder-event handlers, and delete the unused `BuildRunRequest()`, unused `ForwardedToolCalls` collection/recording, and unused optional `responseId` parameter/branch in the test helper.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:comment⟦AI:AUTO-LOOP⟧ |
📊 状态卡片 — fix 派出
🤖 controller status banner ⟦AI:AUTO-LOOP⟧ |
🤖 PR 2289 round 3 fix 已处理 architect rejectTL;DR
修复内容本轮唯一阻塞来自 architect:文档写成
验证结果: 📎 完整 codex 原始输出(存档备查)# Fix report for PR 2289 round 3
## Applied
- (A) docs/canon/llm-streaming.md:40: corrected the Responses `LlmSession` streaming boundary so the doc matches the PR-head implementation: `LlmSessionGAgent` owns run state, sequence acceptance, and terminal facts; transient `LlmRunExecutionGAgent` / `ILlmRunExecutor` consumes the live provider stream off the session actor turn; `DispatchingLlmRunSink` sends typed recorder commands back through `IActorDispatchPort`. This addresses reviewer:architect evidence #1 and its demand: "Update `docs/canon/llm-streaming.md` and `docs/canon/nyxid-responses-direct.md` to match the actual PR architecture...". Cited clauses: `CLAUDE.md` "变更必须可验证:架构调整需同步文档,且 `build/test` 通过。" and "架构调整 PR 必须同步相关 `docs/`,并在验证结果中列出 build/test/guard。"
- (A) docs/canon/nyxid-responses-direct.md:122: corrected the direct Responses / Messages / Chat Completions run-recording section with the same actual execution boundary, including the actor-owned finalizer, typed `Record*` commands, `IActorDispatchPort`, and the self-continuation rationale scoped to why live HTTP/provider stream enumerators cannot be resumed by session actor self-message small steps. This addresses reviewer:architect evidence #1.
- (A) refactor self-doc: not applicable (HOST_REFACTOR_COMMENT_POLICY=none). No `Refactor (...)`, `Old pattern`, or `New principle` source comments were added.
## Rejected as false positive
- None.
## Advisory comments read
- reviewer:tests approved and listed behavior coverage for cancellation flush, duplicate admission, terminal precedence, replay gap behavior, and test-stability guard.
- reviewer:quality commented on possible cleanup in `LlmSessionGAgent` recorder handlers and unused test-support helpers. Because this was `comment`, not `reject`, it was recorded as advisory context and not treated as a mandatory blocking demand for this fix round.
## Blocked (cannot fix this round)
- None.
## Build status
- build: pass (`bash -lc "dotnet build aevatar.slnx --nologo"`; 148 existing warnings, 0 errors)
- tests: pass (`bash -lc "dotnet test test/Aevatar.GAgentService.Tests/Aevatar.GAgentService.Tests.csproj --filter 'LlmRunCore|LlmRunExecutor|LlmSessionGAgent|LlmRunEndToEndAcceptance' --nologo"`; 81 passed, 0 failed, 0 skipped)
- docs lint: pass (`bash tools/docs/lint.sh`; 60 files checked, 0 errors)
- test stability guard: pass (`bash tools/ci/test_stability_guards.sh`)
- diff check: pass (`git diff --check`)
- review-thread completion: not seeded (`.refactor-loop/state/review-thread-completion/pr2289.json` was absent)
- full solution tests: not run; prompt Step 3 instructed to pick changed test projects and not run the full solution test suite because it is too slow.
## Recommendation for next round
- expect unanimous: the only `reject` demand was a documentation/implementation architecture mismatch, and both cited canon sections now describe the actual off-turn execution actor/service plus actor-owned recorder fact path.
⟦AI:AUTO-LOOP⟧
FIX_DONE:2289:round-3:applied-2:rejected-0:blocked-0⟦AI:AUTO-LOOP⟧ |
🤖 测试视角 review:approveTL;DR
测试覆盖结论
没有发现新增 📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: tests
head_sha: 978dc11b915414e5b63c4b33ca2a16a2778c6f61
verdict: approve
---
## Verdict
Test coverage and test quality are adequate for the changed LLM streaming run recording and cancellation behavior.
## Evidence
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:425` covers the `LlmRunCore.RunAsync` cancellation change by asserting the cancelled fact reaches the sink with a non-cancelled sink token, plus no completed/failed record.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:445` covers mid-stream cancellation with a `TaskCompletionSource` sync point and `WaitAsync(ct)`, not fixed sleep/poll pacing; it asserts the partial chunk and cancelled run fact are recorded.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:15`, `:38`, and `:58` exercise the new actor recorder event handlers through `HandleEventAsync` and assert terminal completion/cancel/failure remain authoritative across duplicate and late records.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:80` covers replay of a terminal event with a missing intermediate sequence, matching the new `allowTerminalGap` branch.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:803` covers duplicate active run admission, and `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:1223` covers forwarded tool-call emission idempotency.
- Diff scan of the touched tests found no added `[Skip]`, manual traits, `Assert.True(true)`, `Task.Delay(...)`, or `WaitUntilAsync(...)`; the polling allowlist was not changed.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:tests:approve⟦AI:AUTO-LOOP⟧ |
🤖 Quality review: commentTL;DR
质量审查结果我只发现一个可读性问题: 建议保留这 5 个 public event handler 的类型入口,只抽出一个小的共享 helper 来承载公共的 recorder-fact persistence shape。这个问题不构成拒绝合并的 blocker,所以 verdict 是 📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: quality
head_sha: 978dc11b915414e5b63c4b33ca2a16a2778c6f61
verdict: comment
---
## Verdict
Comment: the change is focused and readable overall, but the new typed recorder handlers add avoidable near-identical duplication.
## Evidence
- src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583 - `HandleLlmStreamChunkObservedAsync`, `HandleLlmToolCallObservedAsync`, `HandleLlmRunCompletedAsync`, `HandleLlmRunFailedAsync`, and `HandleLlmRunCancelledAsync` repeat the same null-check, `CanPersistRunFact(...)`, clone-and-persist, no-op fallback shape across five handlers. This crosses the review checklist's `>=3 near-identical inline copies` threshold and makes future run-fact acceptance behavior easy to update in one handler but miss in another.
## What would change your verdict (only if comment or reject)
Extract one small shared helper for the common recorder-fact persistence path, with the five public event handlers keeping only the type-specific identity/terminal arguments.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:comment⟦AI:AUTO-LOOP⟧ |
🤖 架构复审结论:approveTL;DR
架构证据本轮重点复核了 LLM 流式运行录制路径: 关键证据如下:
本轮也检查了 patch additions:没有新增 📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: architect
head_sha: 978dc11b915414e5b63c4b33ca2a16a2778c6f61
verdict: approve
---
## Verdict
approve - no architecture compliance blocker found for the LLM stream recorder/finalizer changes.
## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:401` records `LlmRunStartedEvent`, then `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:411` records `LlmRunExecutionReadyEvent` before dispatching transient execution at `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:417`. This keeps the session actor as the run authority while avoiding live provider stream consumption inside the actor turn, matching `CLAUDE.md` "Actor 即业务实体:一个 actor = 一个业务实体(数据与方法同住);禁止按技术功能(读/写/投影)拆分同一业务实体为多个 actor。" and "业务推进内聚:工作流推进(成功/失败/分支/重试)在 Actor 事件处理流程内完成,保证顺序性与可重放性。"
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583`, `:593`, `:603`, `:613`, and `:623` add typed event handlers for `LlmStreamChunkObserved`, `LlmToolCallObserved`, `LlmRunCompleted`, `LlmRunFailed`, and `LlmRunCancelled`, and persist accepted facts through the actor. This matches `CLAUDE.md` "committed domain event 必须可观察:write-side 完成 committed event 后必须送入 projection 主链;禁止只落 event store 而不进入可观察流。"
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1017` and `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1065` gate run facts by terminal status, `responseId`, `runId`, monotonic sequence, and record id rather than a process-local registry. This matches `CLAUDE.md` "事实源唯一:跨请求/跨节点一致性事实必须有唯一权威来源(Actor 持久态或分布式状态),不依赖进程内偶然状态。" and AGENTS.md "禁止在中间层维护 `entity/actor/workflow-run/session` 等 ID 到上下文或事实状态的进程内映射(`Dictionary<>`、`ConcurrentDictionary<>`、`HashSet<>`、`Queue<>`)。"
- `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:37` records cancellation with `CancellationToken.None`, so caller cancellation cannot suppress the terminal `LlmRunCancelled` fact. This is consistent with AGENTS.md "AI 对话主链必须流式化:仓库内新增或修改 AI 对话执行路径时,禁止使用 `ChatAsync` 作为正式主链;必须使用 `ChatStreamAsync` 作为唯一权威执行入口,使文本、reasoning、tool call、tool result、完成态都沿同一条流式链路对外发布。"
- `docs/canon/llm-streaming.md:40` through `docs/canon/llm-streaming.md:49` now documents the same boundary implemented by the code: transient execution consumes the live provider stream outside the session actor turn, typed recorder facts return through `IActorDispatchPort`, and `Task.Run` / self-continuation stream ownership are rejected. This satisfies `CLAUDE.md` "架构调整 PR 必须同步相关 `docs/`,并在验证结果中列出 build/test/guard。"
- Scope and policy checks from the actual diff: no `.proto` files changed; no new `Old pattern` / `New principle` source comments were added while `HOST_REFACTOR_COMMENT_POLICY` is empty/default `none`; no `.refactor-loop`/host production SSOT references, external repo references, new `*WriteActor` / `*ReadActor` / `*StoreActor` split, process-local middle-layer run registry, `ChatAsync`, `Task.Run`, `TypeUrl.Contains`, or `5000`/`5050` additions appeared in the patch.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:architect:approve⟦AI:AUTO-LOOP⟧ |
🤖 架构审查通过TL;DR
架构审查要点本轮只看架构合规,不评测试覆盖质量或性能。 关键判断:
涉及的核心位置:
📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: architect
head_sha: 978dc11b915414e5b63c4b33ca2a16a2778c6f61
verdict: approve
---
## Verdict
approve - no architecture compliance regression found in the changed LLM streaming recorder/finalizer path.
## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:401` persists `LlmRunStartedEvent`, `LlmRunExecutionReadyEvent`, and later typed run facts through the session actor instead of query-time reconstruction or host-side response assembly; this is aligned with CLAUDE.md: "单一权威拥有者:每个稳定业务事实有唯一 actor 拥有;`committed event store + actor state` 是唯一真相,readmodel 只是查询副本。"
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:582` adds handlers that accept `LlmStreamChunkObserved`, `LlmToolCallObserved`, `LlmRunCompleted`, `LlmRunFailed`, and `LlmRunCancelled` as typed actor facts, guarded by `CanPersistRunFact`; this keeps the model in the `Command -> Event` lane and matches CLAUDE.md: "核心语义强类型:影响业务语义、控制流、稳定读取且仓库内可控的数据,必须建模为 `proto field / typed option / typed sub-message`,禁止塞入通用 bag。"
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:731` and `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:833` keep duplicate/late recorder facts idempotent using sequence/record identity inside actor state, with no new middle-layer `actorId -> context` dictionary or read/write actor split; this preserves AGENTS.md: "投影编排 Actor 化:Projection 的会话、订阅、关联关系等运行态必须由 Actor 或分布式状态承载;禁止在中间层通过进程内注册表/字典持有事实状态。"
- `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:37` records cancellation with `CancellationToken.None`, so the final cancellation fact is not suppressed by the caller cancellation token; this supports CLAUDE.md: "ACK 诚实:同步返回只承诺已达到阶段(默认 `accepted + stable command id`);`committed`/`read-model observed` 等强保证须通过独立契约或异步观察获取。"
- `docs/canon/llm-streaming.md:38` and `docs/canon/nyxid-responses-direct.md:120` document the transient execution actor/service boundary and explicitly state the session actor remains the durable sequence/watermark/finalizer authority; this satisfies CLAUDE.md: "变更必须可验证:架构调整需同步文档,且 `build/test` 通过。"
- Comment-policy check: `HOST_REFACTOR_COMMENT_POLICY` is empty, so it normalizes to `none`; the actual added source/doc diff contains no new Old/New or iteration refactor-history comments.
- Scope/boundary check: the diff stays inside issue #2276's LLM run acceptance and stream-not-resumable rationale scope; no `.proto` files, no external repository references requiring changes, no `.refactor-loop/host.env` or production SSOT movement, and no new `*WriteActor` / `*ReadActor` / `*Store` split were found.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:architect:approve⟦AI:AUTO-LOOP⟧ |
🤖 tests review approve:LLM 流式 run 录制测试覆盖足够TL;DR
测试覆盖结论我检查了 关键证据:
稳定性检查结果:没有新增 本地验证通过: 📎 完整 codex 原始输出(存档备查)pr: 2289
|
🤖 Quality review: commentTL;DR
质量发现
📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: quality
head_sha: 978dc11b915414e5b63c4b33ca2a16a2778c6f61
verdict: comment
---
## Verdict
Comment: the PR is focused and readable overall, but the new recorder-event path adds avoidable duplication and a few raw numeric status assertions that obscure intent.
## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583`: `HandleLlmStreamChunkObservedAsync`, `HandleLlmToolCallObservedAsync`, `HandleLlmRunCompletedAsync`, `HandleLlmRunFailedAsync`, and `HandleLlmRunCancelledAsync` repeat the same `CanPersistRunFact(...) ? PersistDomainEventAsync(Clone, None) : CompletedTask` shape five times. This crosses the checklist's ">=3 near-identical inline copies" threshold and makes future changes to recorder acceptance/finalizer handling easy to miss in one handler.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:30`: the new acceptance tests assert `ActiveRun.Status` with raw literals (`2`, `3`, `4`) at lines 30, 51, and 71. The production code has named meanings for those values, so the tests should express `Completed/Failed/Cancelled` intent instead of requiring readers to remember the numeric encoding.
## What would change your verdict
Extract the repeated recorder accept/persist body into a small private helper such as `PersistAcceptedRunFactAsync(IMessage fact, string responseId, string runId, long sequence, bool terminal)`, and replace the raw status literals in the new acceptance tests with named local constants or an exposed test helper that names the run status values.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:comment⟦AI:AUTO-LOOP⟧ |
🤖 tests review: approveTL;DR
测试覆盖结论
重复 run admission 和 forwarded tool call 路径也有行为断言。 稳定性方面,PR diff 没有新增 📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: tests
head_sha: 978dc11b915414e5b63c4b33ca2a16a2778c6f61
verdict: approve
---
## Verdict
Approve: the PR adds behavior-level coverage for the new LLM streaming recorder paths, cancellation flush behavior, duplicate suppression, and terminal replay semantics without adding skips, sleep-based pacing, or weakened assertions.
## Evidence
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:425` verifies the cancellation path records `LlmRunCancelled` through the sink with a non-cancelled sink token, covering the `CancellationToken.None` change in `LlmRunCore.RunAsync`.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:445` covers a never-terminal provider stream by asserting the partial chunk is preserved, cancellation is recorded, and no completion/failure is emitted; it uses `TaskCompletionSource` and cancellation rather than `Task.Delay` polling.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:15` asserts completion remains authoritative across duplicate chunks, late chunks, duplicate completion, and later failure facts.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:38` and `:58` cover cancelled and failed terminal facts staying authoritative and ignoring later chunks/terminal facts.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:80` covers replay applying a terminal event despite a missing intermediate sequence while rejecting the late lower-sequence chunk.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:803` covers duplicate running run admission as an idempotent no-op.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:1173` and `:1205` cover forwarded tool call recording through the streaming run and direct forwarded-tool-call event duplicate suppression.
- Stability checks found no added `[Skip]`, manual traits, `Assert.True(true)`, `Task.Delay`, `Thread.Sleep`, `WaitUntilAsync`, or polling allowlist changes in the PR diff.
- Targeted verification passed: `dotnet test test/Aevatar.GAgentService.Tests/Aevatar.GAgentService.Tests.csproj --no-restore --filter "FullyQualifiedName~LlmRunCoreTests|FullyQualifiedName~LlmRunEndToEndAcceptanceTests|FullyQualifiedName~LlmSessionGAgentTests" --nologo` reported 47 passed, 0 failed, 0 skipped.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:tests:approve⟦AI:AUTO-LOOP⟧ |
🤖 质量审查结论:commentTL;DR
质量证据
📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: quality
head_sha: 978dc11b915414e5b63c4b33ca2a16a2778c6f61
verdict: comment
---
## Verdict
Comment: the PR is focused and readable overall, with no new refactor self-doc defect under `HOST_REFACTOR_COMMENT_POLICY=none`, but it adds a little avoidable duplication and unused test support surface.
## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583` adds five near-identical recorder handlers (`HandleLlmStreamChunkObservedAsync`, `HandleLlmToolCallObservedAsync`, `HandleLlmRunCompletedAsync`, `HandleLlmRunFailedAsync`, `HandleLlmRunCancelledAsync`) that all compute `accepted = CanPersistRunFact(...)` and conditionally persist a cloned event. That is five inline copies of the same admission/persist shape; a small helper would make the terminal/non-terminal distinction easier to review.
- `test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:18` introduces `BuildRunRequest()`, but no touched test calls it. Keeping unused test factory surface makes the acceptance harness look broader than it is.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:119` adds an optional `responseId` parameter to `CreateActor`, but every caller uses the default. The parameter can be deleted until a test actually needs alternate actor identity.
## What would change your verdict (only if comment or reject)
Extract the repeated recorder-event admission/persist wrapper in `LlmSessionGAgent`, and delete or wire the unused test helper/optional parameter.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:comment⟦AI:AUTO-LOOP⟧ |
🤖 架构审查 r13:approveTL;DR
架构合规结论
关键证据:
📎 完整 codex 原始输出(存档备查)---
pr: 2289
role: architect
head_sha: 978dc11b915414e5b63c4b33ca2a16a2778c6f61
verdict: approve
---
## Verdict
approve — no architecture compliance regression found in the scoped LLM streaming actor-recording diff.
## Evidence
- `docs/canon/llm-streaming.md:40` and `docs/canon/nyxid-responses-direct.md:122` document the run boundary as transient execution actor/service consuming the live provider stream while `LlmSessionGAgent` remains the actor-owned fact source; this aligns with CLAUDE.md clauses "AI 对话主链必须流式化:实时会话入口必须使用 `ChatStreamAsync`" and "单一权威拥有者:每个稳定业务事实有唯一 actor 拥有".
- `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:37` flushes cancellation through `RecordRunCancelledAsync` with `CancellationToken.None`, preserving a typed terminal fact instead of letting caller cancellation suppress actor recording; this supports the CLAUDE.md clause "已提交领域事件必须可观察".
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:401`, `:411`, and `:421` keep run start, execution-ready, and dispatch scheduling inside the existing `LlmSessionGAgent` state machine; no new `*WriteActor` / `*ReadActor` / `*Store` split is introduced.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:586` through `:626` accepts typed run facts only through actor handlers, and `:1065` gates sequence/terminal acceptance against actor state; this follows the CLAUDE.md clauses "Actor/模块运行态只能在事件处理主线程修改" and "核心语义强类型".
- No `.proto` files changed in the PR diff, no new source `Old pattern` / `New principle` refactor-history comments were added under the default `HOST_REFACTOR_COMMENT_POLICY=none`, no external repository references were added, and the diff stays within the issue-2276 authorized source/docs/test scope.
⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:architect:approve⟦AI:AUTO-LOOP⟧ |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 978dc11b91
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (State.ActiveRun == null || | ||
| !string.Equals(State.ActiveRun.RunId, runId, StringComparison.Ordinal)) | ||
| { | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Reject recorder facts for non-active runs
When a typed recorder fact arrives with a run_id different from the current ActiveRun, this branch accepts it; the applier then calls EnsureRun, which replaces ActiveRun and can append chunks or terminalize the response for the wrong run. In a stale/misrouted execution dispatch scenario, this bypasses the stricter Record* path that rejects mismatched run ids and lets a foreign run overwrite an in-flight session instead of being ignored.
Useful? React with 👍 / 👎.
| await PersistDomainEventAsync(new LlmSessionForwardedToolCallEmittedEvent | ||
| { | ||
| ResponseId = existing.ResponseId, | ||
| Call = call, | ||
| }); |
There was a problem hiding this comment.
Preserve run sequencing on forwarded tool events
If an execution recorder dispatches LlmSessionForwardedToolCallEmittedEvent with run_id, sequence, and record_id, this handler drops those fields before persisting. The applier therefore treats it as a session-level tool call, never advances the run sequence or records the tool on ActiveRun, so the next non-terminal fact with the following sequence is rejected as out of order.
Useful? React with 👍 / 👎.
Changed files
LlmRunCore的取消路径,确保取消终态录制不会被调用方已取消的 token 抑制。LlmSessionGAgent,让 live provider stream 在 actor turn 外持续消费,并通过 typed recorder facts 回写 actor 权威状态。llm-streaming与nyxid-responses-direct文档,说明 actor-owned recorder/finalizer 语义以及 self-continuation 不能恢复 live HTTP stream 的原因。Test results
dotnet test test/Aevatar.GAgentService.Tests/Aevatar.GAgentService.Tests.csproj --filter "LlmRunCore|LlmRunExecutor|LlmSessionGAgent|LlmRunEndToEndAcceptance" --nologo通过,59 passed。dotnet build aevatar.slnx --nologo通过,0 errors。bash tools/ci/test_stability_guards.sh通过。dotnet test aevatar.slnx --nologo通过。guards skipped: CI_GUARDS unset。Deviations
refactor self-doc: not applicable (HOST_REFACTOR_COMMENT_POLICY=none)⟦AI:AUTO-LOOP⟧