Skip to content

完善 LLM 流式运行的 actor 录制与验收覆盖#2289

Merged
eanzhao merged 6 commits into
feature/integratefrom
refactor/iter2276-issue-2276
Jun 20, 2026
Merged

完善 LLM 流式运行的 actor 录制与验收覆盖#2289
eanzhao merged 6 commits into
feature/integratefrom
refactor/iter2276-issue-2276

Conversation

@eanzhao

@eanzhao eanzhao commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Changed files

  • 调整 LlmRunCore 的取消路径,确保取消终态录制不会被调用方已取消的 token 抑制。
  • 调整 LlmSessionGAgent,让 live provider stream 在 actor turn 外持续消费,并通过 typed recorder facts 回写 actor 权威状态。
  • 增加 LLM run executor、端到端验收、core cancel/no-terminal 与 session actor 行为测试,覆盖重复 chunk、late recorder command、cancel flush、terminal skip/retry 与不等待 provider stream 等路径。
  • 更新 llm-streamingnyxid-responses-direct 文档,说明 actor-owned recorder/finalizer 语义以及 self-continuation 不能恢复 live HTTP stream 的原因。

Test results

  • dotnet test test/Aevatar.GAgentService.Tests/Aevatar.GAgentService.Tests.csproj --filter "LlmRunCore|LlmRunExecutor|LlmSessionGAgent|LlmRunEndToEndAcceptance" --nologo 通过,59 passed。
  • dotnet build aevatar.slnx --nologo 通过,0 errors。
  • bash tools/ci/test_stability_guards.sh 通过。
  • dotnet test aevatar.slnx --nologo 通过。
  • Host 架构守卫未配置,记录为 guards skipped: CI_GUARDS unset

Deviations

⟦AI:AUTO-LOOP⟧

@eanzhao eanzhao added crnd:lifecycle:managed Item is managed by codex-refactor-loop. crnd:phase:reviewing Review workers are active. crnd:human:auto Controller may continue without maintainer intervention. labels Jun 19, 2026
@eanzhao

eanzhao commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

🤖 架构复核通过:typed actor-owned recorder 路径未引入新架构债

TL;DR

  • 这是什么: PR 2289 的 architecture compliance 复核,范围是 LLM streaming run recorder / acceptance coverage。
  • 现在到哪一步 / 结论是什么: architect verdict 是 approve,未发现新增 CLAUDE.md / AGENTS.md 架构违规。
  • 需要 maintainer 做什么 OR controller 下一步: controller 可按共识真值表继续汇总;maintainer 无需因 architect 角度阻塞合并。

架构证据

LlmSessionGAgent 是 session actor,也就是 response session 的权威状态拥有者;本 PR 让 live provider stream 在 actor turn 外持续消费,但每个 chunk / tool / terminal fact 都通过 typed recorder event 回投 actor。IActorDispatchPort 是 actor 之间发命令的标准通道,这里用于把 EventEnvelope 送回同一个 session actor,而不是在后台线程直接写 actor State

检查项 结论 依据
actor 权威状态 通过 LlmRunStartedEventLlmStreamChunkObservedLlmRunCompleted 等事实仍由 LlmSessionGAgent 持久化
Task.Run 边界 通过 后台任务只消费 live ChatStreamAsync 并 dispatch typed facts,不直接读写 actor 运行态
协议 / schema 通过 diff 中无 .proto 变更
host SSOT 通过 未把 host 配置、分支、ledger 或生产事实移动到 .refactor-loop/ / host.env
删除优先 / 兼容壳 通过 未新增空转发接口、dead wrapper 或 parallel pathway

docs/canon/llm-streaming.mddocs/canon/nyxid-responses-direct.md 已补充说明为什么 live HTTP stream 不能用 actor self-continuation 小步恢复:枚举器包含不可持久化的连接和 provider SDK 状态。这个文档更新和实现边界一致。

📎 完整 codex 原始输出(存档备查)

pr: 2289
role: architect
head_sha: 4a9d4e0
verdict: approve

Verdict

approve: no architecture compliance concerns; the PR keeps LLM stream recording as typed actor-owned facts and does not add a parallel state source, protocol/schema change, host-private SSOT, or compatibility shell.

Evidence

  • No blocking issue flagged. src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:276 records LlmRunStartedEvent in the session actor, while src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:336 through src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:384 accept typed recorder facts back into the same actor. This maps to CLAUDE.md: "单一权威拥有者:每个稳定业务事实有唯一 actor 拥有;committed event store + actor state 是唯一真相,readmodel 只是查询副本。" and "面向对象内聚:同一业务实体的状态、命令处理、事件发布在同一个 actor 内完成;禁止将数据和方法拆分到不同 actor 再拼装。"
  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:326 through src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:333 use Task.Run only to keep consuming the live provider stream; the sink at src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:961 through src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1028 wraps each observed fact in EventEnvelope and dispatches through IActorDispatchPort. It does not mutate actor State off-turn, so it remains aligned with CLAUDE.md: "Task.Run/Timer/线程池回调不直接读写运行态或推进业务;只发布内部触发事件。" and "传输载体可替换:上层依赖投递契约(IActorDispatchPort),不依赖具体载体。"
  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:809 through src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:830 gate recorder commands by responseId + runId + sequence, and src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:867 through src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:903 make terminal persistence idempotent. This supports CLAUDE.md: "显式对账:内部触发事件携带最小充分相关键(如 run_id + step_id),Actor 内做活跃态校验,拒绝陈旧事件。"
  • src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:35 through src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:42 flush LlmRunCancelled with CancellationToken.None, preserving cancellation as a typed fact rather than a suppressed sink write. This maps to CLAUDE.md: "AI 对话主链必须流式化:实时会话入口必须使用 ChatStreamAsync" and "业务推进内聚:工作流推进(成功/失败/分支/重试)在 Actor 事件处理流程内完成,保证顺序性与可重放性。"
  • docs/canon/llm-streaming.md:38 through docs/canon/llm-streaming.md:48 and docs/canon/nyxid-responses-direct.md:120 through docs/canon/nyxid-responses-direct.md:134 document the off-turn live stream boundary and why actor self-continuation cannot replay a live HTTP stream. This satisfies CLAUDE.md: "架构调整 PR 必须同步相关 docs/,并在验证结果中列出 build/test/guard。"
  • The inspected diff contains no .proto changes, no .refactor-loop/host.env production fact movement, no external repository path changes, no new *WriteActor / *ReadActor / *Store split, no new Old/New refactor-history source comments, and no added forbidden 5000/5050 literals. With empty host variables, HOST_REFACTOR_COMMENT_POLICY normalizes to none, and no host-specific grep/proto policy applies.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:architect:approve

⟦AI:AUTO-LOOP⟧

@codecov

codecov Bot commented Jun 19, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 85.10638% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.00%. Comparing base (90ea572) to head (978dc11).
⚠️ Report is 232 commits behind head on feature/integrate.

Files with missing lines Patch % Lines
...tar.GAgentService.Core/GAgents/LlmSessionGAgent.cs 84.94% 5 Missing and 9 partials ⚠️
@@                  Coverage Diff                  @@
##           feature/integrate    #2289      +/-   ##
=====================================================
- Coverage              84.08%   84.00%   -0.08%     
=====================================================
  Files                   1139     1197      +58     
  Lines                  82320    91202    +8882     
  Branches               10705    11935    +1230     
=====================================================
+ Hits                   69219    76617    +7398     
- Misses                  8414     9296     +882     
- Partials                4687     5289     +602     
Flag Coverage Δ
ci 84.00% <85.10%> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
....GAgentService.Application/Responses/LlmRunCore.cs 81.34% <100.00%> (ø)
...tar.GAgentService.Core/GAgents/LlmSessionGAgent.cs 68.04% <84.94%> (-5.40%) ⬇️

... and 180 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@eanzhao

eanzhao commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

🤖 架构复核:PR 2289 暂不建议合并

TL;DR

  • 这是什么: 我从 CLAUDE/AGENTS 架构合规角度复核了 PR 2289 的实际 diff。
  • 结论是什么: reject,原因是生产 actor 内新增 Task.Run 承载 live LLM/tool run 推进,违反 actor 执行模型。
  • 需要下一步: implement codex 需要把这段长生命周期执行改回 actor-owned 或 runtime-neutral 的 continuation/执行者契约后再进下一轮。

架构阻塞点

Task.Run 是 .NET 线程池后台任务入口;这里它不只是发一个最小内部信号,而是在 actor turn 结束后继续消费 provider stream 和工具循环。IActorDispatchPort 是 actor 之间投递消息的标准通道;PR 里用它把后台任务观察到的 chunk/tool/terminal fact 再回投给 session actor,但 run 推进本身已经离开 actor event handler。

位置 为什么重要 触发条款
src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:326 LlmRunStartedEvent 后直接 _ = Task.Run(...),后台任务开始执行 ConsumeRunOffActorTurnAsync 回调只发信号:Task.Run/Timer/线程池回调不直接读写运行态或推进业务;只发布内部触发事件。
src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:961 SelfDispatchingLlmRunSink 用进程内 _lastSequence 给 streamed facts 排序再回投 actor,sequence authority 不再完全来自 actor/durable state。 跨进程、跨 turn 或跨节点的事实必须有权威记录;进程内记忆、cache、临时变量不能冒充事实源。

建议修复方向:actor turn 可以记录 accepted/run-start fact 并发布最小内部信号;如果确实需要独立 executor,应该建模为明确的 run/session-scoped actor 或 durable runtime component,让 sequence/watermark、retry、cancel、terminal 由 actor state 或另一份明确分布式权威状态承载。

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: architect
head_sha: 4a9d4e0bbb5715859499c3c2ad13168a53a80c90
verdict: reject
---

## Verdict
reject - PR moves the live LLM/tool run progression into a `Task.Run` callback outside the actor event turn, which regresses CLAUDE's actor execution model.

## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:326`: after recording `LlmRunStartedEvent`, the actor starts `_ = Task.Run(() => ConsumeRunOffActorTurnAsync(...))`; `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:961` then uses `SelfDispatchingLlmRunSink` with process-local `_lastSequence` to turn streamed chunks/tool/terminal results back into actor messages. This is more than a callback signal: it consumes the provider stream and advances the run/tool loop outside the actor event handler, then relies on a process-local sequence counter for run fact ordering. CLAUDE clauses violated: "回调只发信号:`Task.Run`/`Timer`/线程池回调不直接读写运行态或推进业务;只发布内部触发事件。" and "业务推进内聚:工作流推进(成功/失败/分支/重试)在 Actor 事件处理流程内完成,保证顺序性与可重放性。" It also conflicts with FI-004: "跨进程、跨 turn 或跨节点的事实必须有权威记录;进程内记忆、cache、临时变量不能冒充事实源。"

## What would change your verdict
Move the long-lived LLM run execution back behind an actor-owned/runtime-neutral continuation contract: the actor turn may persist accepted/run-start facts and publish minimal internal signals, but the live provider/tool progression must not be owned by an untracked `Task.Run` callback with process-local sequencing. If a separate executor is required, make it an explicit run/session-scoped actor or durable runtime component whose facts, sequence/watermark, retry, cancellation, and terminal behavior are owned by actor state or another documented distributed authority.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:architect:reject

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

🤖 Quality review 通过

TL;DR

  • 这是什么: PR 2289 的代码质量视角复审,关注可读性、命名、简单性、死代码和无关改动。
  • 现在到哪一步 / 结论是什么: 结论为 approve,未发现需要阻塞合并的质量问题。
  • 需要 maintainer 做什么 OR controller 下一步: controller 可继续按其它 reviewer 结果推进合并判断。

质量结论

本轮只看 quality 维度,不评估架构条款、测试充分性或性能。LlmSessionGAgent 是 Responses 直连场景里的 session actor,负责持有一次 LLM run 的权威状态;新增的 recorder event handler 把 provider stream 观察到的 chunk/tool/terminal fact 回投给 actor,并用 responseId + runId + sequence 做幂等接受。

我按 PR head 打开了触达文件,而不是只看 diff hunk。主要观察点如下:

  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:277: HandleLlmRunRequestedAsync 仍在可读范围内,off-turn stream consumption 与 actor 内 typed recorder fact 处理拆分清楚。
  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:337: 新增的 recorder handlers 都是可达事件入口,并共用 CanPersistRunFact,没有把 run identity / sequence 判定复制到多个大块里。
  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:780:809: terminal sequence 与 run fact acceptance 的业务意图通过方法名表达清楚,没有发现泛化 Manager/Helper 式命名。
  • test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:12: 新测试 harness 复用了 acceptance/executor 测试夹具;未发现新增未使用 public surface。
  • HOST_REFACTOR_COMMENT_POLICY 为空,按规则归一为 none;本 PR diff 没有新增 Refactor/Old pattern/New principle 自文档注释。
📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: quality
head_sha: 4a9d4e0bbb5715859499c3c2ad13168a53a80c90
verdict: approve
---

## Verdict
Approve: the change is readable and focused on typed LLM run recording/acceptance semantics, with no objective naming, dead-code, over-abstraction, complexity, or refactor self-doc policy issues found.

## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:277` keeps the run request handler under the preferred size threshold and separates off-turn stream consumption from typed recorder event handling through small, named helpers.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:337` adds reachable event handlers for each typed recorder fact; the handlers share one acceptance predicate instead of duplicating the run identity/sequence checks.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:809` expresses the key business rule directly as `CanPersistRunFact`, and terminal sequence handling is isolated in `CanAcceptRunTerminalSequence` at line 780.
- `test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:12` centralizes acceptance-test fixtures and fake providers used by the new acceptance/executor tests; no unused new public test support surface was found in the touched diff.
- `${HOST_REFACTOR_COMMENT_POLICY}` is empty and therefore normalizes to `none`; the PR diff adds no new `Refactor`/`Old pattern`/`New principle` self-doc comments.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:approve

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

🤖 Tests review: reject,off-turn recorder path 缺少生产链路覆盖

TL;DR

  • 这是什么: PR 2289 的测试质量审查,只看覆盖与测试稳定性。
  • 结论是什么: reject,新增 off-turn actor 录制路径有真实覆盖缺口。
  • 需要 maintainer 做什么: 补一个 actor 级测试,证明后台流式运行经生产 IActorDispatchPort 回灌后能完成状态录制。

测试覆盖问题

IActorDispatchPort 是 actor 之间投递命令/事件的标准通道;这次新路径在 actor turn 接受请求后,把 LLM 流式消费放到后台,再通过这个通道把 LlmStreamChunkObservedLlmRunCompleted 等事实发回 actor。

必须修的缺口有 2 个:

位置 为什么影响测试结论
src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:327 / :961 生产代码现在通过 Task.Run + 私有 SelfDispatchingLlmRunSink 发回 recorder facts;但 test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:768 只断言 actor 已返回、provider 已开始流式消费,没有在 releaseStream.SetResult() 后观察真正注册的 dispatch port 是否收到 chunk/completed,也没有把这些 envelope 回灌 actor 验证最终完成态。这个生产 sink 坏掉时测试仍可能通过。
src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:387 新增 HandleLlmSessionForwardedToolCallEmittedAsync 处理 self-dispatched forwarded tool call;现有 test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:915 覆盖的是旧 in-actor sink,test/Aevatar.GAgentService.Tests/Application/LlmRunExecutorTests.cs:16 覆盖的是 test-support sink,不是生产 off-turn handler 路径。

我没有发现这组测试新增 [Skip]、manual trait、Task.DelayWaitUntilAsync,也没有 polling allowlist 变更。

建议补一个 actor 级测试:保留 recording IActorDispatchPort 实例,释放 gated provider,用确定性同步点等待后台 run dispatch,随后把 dispatch 出来的 envelope 交给 actor 处理,并断言最终 Record.StatusActiveRun.OutputTextCompletion。同一个测试或相邻测试应覆盖 forwarded-tool case,确保 HandleLlmSessionForwardedToolCallEmittedAsync 通过 off-turn 路径被执行。

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: tests
head_sha: 4a9d4e0bbb5715859499c3c2ad13168a53a80c90
verdict: reject
---

## Verdict
Reject: the new off-turn actor recording path has a real coverage gap around the production self-dispatch sink and forwarded-tool recorder handler.

## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:327` now starts the LLM run on `Task.Run` and routes facts through the private production `SelfDispatchingLlmRunSink` at `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:961`, but `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:768` only proves the actor returns after admission and the provider stream starts. It never observes the registered `IActorDispatchPort` after `releaseStream.SetResult()`, so a regression where the production sink fails to dispatch `LlmStreamChunkObserved` / `LlmRunCompleted` back to the actor would still pass.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:387` adds the public `HandleLlmSessionForwardedToolCallEmittedAsync` recorder handler for self-dispatched forwarded tool calls. Existing forwarded-tool assertions at `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:915` cover the old in-actor sink path, while `test/Aevatar.GAgentService.Tests/Application/LlmRunExecutorTests.cs:16` covers only the test-support `DispatchingLlmRunSink`; neither exercises the new production handler path used when `IActorDispatchPort` is present.
- I found no added `[Skip]`, manual trait, `Task.Delay`, `WaitUntilAsync`, or polling allowlist churn in the touched tests.

## What would change your verdict
Add an actor-level test with a retained recording `IActorDispatchPort` that releases the gated provider, waits deterministically for the background run to dispatch, feeds the dispatched envelopes through the actor, and asserts the final actor state/completion. Include the forwarded-tool case so `HandleLlmSessionForwardedToolCallEmittedAsync` is exercised through the off-turn path, not only through a test-support sink.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:tests:reject

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

🤖 tests reviewer:拒绝,off-actor 录制路径缺少真实覆盖

TL;DR

  • 这是什么: PR 2289 的测试质量视角独立 review。
  • 现在到哪一步 / 结论是什么: reject,新增的生产 off-actor LLM recorder 路径没有被真实测试覆盖。
  • 需要 maintainer 做什么: 补上直接走 actor/off-turn 生产路径的断言,再补 forwarded tool call recorder event handler 的行为测试。

测试覆盖结论

IActorDispatchPort 是 actor 之间投递命令/事件的标准通道;本 PR 新增的生产逻辑会在 actor turn 外把 LLM 流式事实重新投递回 actor。当前测试主要覆盖了测试 harness 里的相似 sink,或者旧的 in-actor/direct command 路径,没有证明新增生产路径本身能正确发出 recorder facts。

位置 问题 为什么影响合并
src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:961 新增 SelfDispatchingLlmRunSink,负责序列号分配和通过 IActorDispatchPort dispatch envelope。 test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:768 / :806 注册了 RecordingActorDispatchPort,但匿名创建后没有断言释放 provider 后实际 dispatch 出的 envelope;test/Aevatar.GAgentService.Tests/Application/LlmRunExecutorTests.cs:16 / :39 测的是 test-only LlmRunAcceptanceHarness.DispatchingLlmRunSink,不是生产 private sink。
src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:386 新增 HandleLlmSessionForwardedToolCallEmittedAsync recorder event handler。 现有测试覆盖的是旧 HandleRecordForwardedToolCallAsync 或旧 in-actor run 路径;没有测试直接调用新 event handler,因此 first persist、duplicate idempotency、mismatched duplicate rejection、terminal ignore 这些分支都缺覆盖。

我没有发现本次测试改动新增 Task.DelayThread.SleepWaitUntilAsync[Skip] 或 manual trait,也没有发现 polling allowlist 变更。

需要把现有 gated provider 测试改成保留 RecordingActorDispatchPort 引用,释放 provider 后用确定性信号等待 dispatch 完成,并断言 LlmStreamChunkObserved / LlmRunCompleted payload、actor id、sequence。另加几个 focused tests 直接覆盖 HandleLlmSessionForwardedToolCallEmittedAsync 的新增分支即可翻转 verdict。

📎 完整 codex 原始输出(存档备查)

pr: 2289
role: tests
head_sha: 4a9d4e0
verdict: reject

Verdict

Reject: the new off-actor LLM recorder path has real production logic that is not covered by the added tests.

Evidence

  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:961 adds the production SelfDispatchingLlmRunSink, including sequence assignment and IActorDispatchPort envelope dispatch at lines 968-1028. The closest actor tests, test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:768 and :806, register a RecordingActorDispatchPort but create it anonymously and never assert dispatched envelopes after releaseStream.SetResult(). test/Aevatar.GAgentService.Tests/Application/LlmRunExecutorTests.cs:16 and :39 assert a separate test-only LlmRunAcceptanceHarness.DispatchingLlmRunSink from test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:163, so they do not exercise the production private sink or prove the actor path dispatches chunk/completed/cancelled facts with the expected sequence numbers.
  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:386 adds HandleLlmSessionForwardedToolCallEmittedAsync, the recorder event path for forwarded tool calls. The existing matching assertions still cover HandleRecordForwardedToolCallAsync (test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:117) or the old in-actor run path (test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:1192), and git grep shows no test invokes HandleLlmSessionForwardedToolCallEmittedAsync. That leaves the new event handler branches for first persist, duplicate idempotency, mismatched duplicate rejection, and terminal-session ignore untested.
  • I found no new Task.Delay, Thread.Sleep, WaitUntilAsync, [Skip], or manual-test trait additions in the changed test files, and no polling allowlist change.

What would change your verdict

Add tests that exercise the production actor/off-turn path directly: keep a reference to the registered RecordingActorDispatchPort, release the gated provider, deterministically wait for dispatched envelopes without sleep, and assert LlmStreamChunkObserved/LlmRunCompleted payloads, actor id, and sequences from SelfDispatchingLlmRunSink. Also add focused tests for HandleLlmSessionForwardedToolCallEmittedAsync covering new call persistence and duplicate/terminal behavior.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:tests:reject

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

🤖 质量复审通过

TL;DR

  • 这是什么: PR 完善 LLM 流式运行的 actor 录制与验收覆盖 #2289 的 readability / simplicity 质量复审。
  • 现在到哪一步 / 结论是什么: 质量轴 approve,未发现命名、死代码、过度抽象、复杂度或无关改动 blocker。
  • 需要 maintainer 做什么 OR controller 下一步: controller 可继续汇总其他 reviewer 结论。

质量复审结论

本轮只看代码质量角度,不评 CLAUDE 架构条款和测试覆盖充分性。

我检查了三点:

  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:337 新增的 recorder handler 都用 typed fact 命名,例如 HandleLlmStreamChunkObservedAsync,能直接表达业务事件语义。
  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:809 把 run fact 接受条件收敛到 CanPersistRunFact,没有把 sequence / terminal 判断散落到每个 handler。
  • test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:12 新增测试 harness 有多个直接调用方,用于集中 acceptance fixture;没有发现新增不可达 helper 或单独扩大生产抽象。

HOST_REFACTOR_COMMENT_POLICY 为空,按规则归一为 none;本 diff 没有新增 Refactor / Old pattern / New principle 这类源码自述注释,也没有新增硬编码 host placeholder 文案。

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: quality
head_sha: 4a9d4e0bbb5715859499c3c2ad13168a53a80c90
verdict: approve
---

## Verdict
approve - the PR stays readable and focused for the LLM stream recording acceptance work, with no quality-axis blocker found.

## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:337` adds typed recorder handlers (`HandleLlmStreamChunkObservedAsync`, `HandleLlmToolCallObservedAsync`, terminal handlers) whose names express the business facts being recorded rather than generic helper/manager wording.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:809` keeps the run-fact acceptance logic in a small, named predicate (`CanPersistRunFact`) instead of spreading sequence/terminal checks across each handler.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:961` introduces `SelfDispatchingLlmRunSink` as a private nested implementation with a single production caller at `LlmSessionGAgent.cs:331`; the abstraction surface is not expanded.
- `test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:12` centralizes the new acceptance fixtures and helper sinks used by the new tests, and the new helper types have direct callers in `LlmRunCoreTests`, `LlmRunExecutorTests`, and `LlmRunEndToEndAcceptanceTests`.
- `docs/canon/llm-streaming.md:38` and `docs/canon/nyxid-responses-direct.md:120` document the stream execution boundary in the existing canon docs, with no unrelated drive-by cleanup in the inspected diff.
- Refactor comment policy: `${HOST_REFACTOR_COMMENT_POLICY}` was empty and therefore normalized to `none`; the diff adds no new `Refactor` / `Old pattern` / `New principle` source-code comments or hardcoded host placeholder text.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:approve

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

🤖 PR 2289 round 1 fix 已完成

TL;DR

  • 这是什么: 修复本轮 review-gate 的 reject 证据,只处理阻塞项。
  • 现在到哪一步 / 结论是什么: 已移除被拒的后台 Task.Run 执行路径,构建和触达测试通过。
  • 需要 maintainer 做什么 OR controller 下一步: controller 可提交当前改动并重新派发下一轮 reviewer。

修复摘要

LlmSessionGAgent 是拥有 LLM session 事实的 actor;本轮把 live provider stream 的执行边界收回到该 actor 的 command turn 内,删除了无权威状态的后台 Task.Run 回调和进程内 sequence counter。IActorDispatchPort 是 actor 之间发命令的标准通道;之前 reviewer 要求覆盖的自 dispatch sink 已被删除,所以该测试要求在新 diff 中不再成立。

reviewer 分类 处理
architect reject A / 已修 src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:316 现在直接通过 actor-owned sink 执行 ILlmRunCore,该位置决定 run 事实由谁拥有。
tests reject / production self-dispatch sink C / false positive after fix 原需求引用的 sink 已不存在,Task.Run / SelfDispatchingLlmRunSink / _lastSequence 在 PR 相关源码和测试中无匹配。
tests reject / forwarded tool recorder handler A / 已修 test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:921 新增 actor-level 测试,覆盖 forwarded tool recorder handler 的持久化和重复幂等。

文档也同步收口:docs/canon/llm-streaming.md:42docs/canon/nyxid-responses-direct.md:126 现在明确 provider stream 当前由 session actor command turn 拥有;如果未来拆出执行器,必须先建模为 run/session-scoped actor 或 durable runtime component。这些行重要,因为它们是后续实现和 review 判断执行边界的 canon 依据。

验证结果:

dotnet build aevatar.slnx --nologo
# pass, warnings only

dotnet test test/Aevatar.GAgentService.Tests/Aevatar.GAgentService.Tests.csproj --nologo
# pass: 907 passed, 0 failed, 0 skipped

bash tools/ci/test_stability_guards.sh
# pass
📎 完整 codex 原始输出(存档备查)
# Fix report for PR 2289 round 1

## Applied
- (A) `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:316`: removed the untracked `Task.Run` off-turn execution path and deleted the process-local `SelfDispatchingLlmRunSink` sequence counter; `HandleLlmRunRequestedAsync` now runs `ILlmRunCore` through the actor-owned `InActorLlmRunSink` in the command turn. This addresses reviewer:architect's evidence #1 and the cited CLAUDE clauses: `Task.Run` callbacks only publish signals, business progression stays in Actor event handling, and cross-turn facts cannot rely on process-local memory.
- (A) `docs/canon/llm-streaming.md:42` and `docs/canon/nyxid-responses-direct.md:126`: updated the canon docs so they no longer authorize off-turn provider stream ownership; the docs now state that future extraction requires a run/session-scoped actor or durable runtime component with owned sequence/watermark, retry, cancellation, and terminal facts. This addresses reviewer:architect's "What would change your verdict" request for a runtime-neutral actor-owned/durable contract.
- (A) `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:921`: added direct actor-level coverage for `HandleLlmSessionForwardedToolCallEmittedAsync`, including duplicate idempotency. This addresses reviewer:tests's evidence #2 for the forwarded-tool recorder handler that remains in the fixed design.
- (A) `test/Aevatar.GAgentService.Tests/Application/LlmRunExecutorTests.cs`: deleted the test file that covered only the now-removed off-turn dispatch sink; `test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs` was trimmed accordingly. This is part of the architect fix and avoids preserving coverage for an implementation that CLAUDE rejects.

## Rejected as false positive
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:327` cited by reviewer:tests: the demand to add a retained recording `IActorDispatchPort` test for the production self-dispatch sink is superseded by the applied architecture fix. Evidence: `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:316` now calls `runCore.RunAsync(..., new InActorLlmRunSink(this), ...)`; `rg "Task\.Run|SelfDispatchingLlmRunSink|Interlocked\.Increment|_lastSequence|ConsumeRunOffActorTurnAsync|Off-turn LLM" src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs test/Aevatar.GAgentService.Tests/Application test/Aevatar.GAgentService.Tests/Core test/Aevatar.GAgentService.Tests/TestSupport` returns no matches. The former test-support dispatch sink was removed with `test/Aevatar.GAgentService.Tests/Application/LlmRunExecutorTests.cs`.

## Blocked (cannot fix this round)
- None.

## Build status
- build: pass (`dotnet build aevatar.slnx --nologo`; warnings only)
- tests: pass (`dotnet test test/Aevatar.GAgentService.Tests/Aevatar.GAgentService.Tests.csproj --nologo`; 907 passed, 0 failed, 0 skipped)
- guards: pass (`bash tools/ci/test_stability_guards.sh`)
- review-thread completion: not applicable; `/Users/chronoai/Code/aevatar/.refactor-loop/state/review-thread-completion/pr2289.json` does not exist.
- refactor self-doc: not applicable (`HOST_REFACTOR_COMMENT_POLICY=none`); no new source refactor-history comments were added.

## Recommendation for next round
- Expect merge after rerun: reject evidence has been resolved, one tests demand was made false-positive by removing the rejected production path, and there are no blocked demands.

⟦AI:AUTO-LOOP⟧
FIX_DONE:2289:round-1:applied-4:rejected-1:blocked-0

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

🤖 质量审查:comment,建议删掉两个未使用测试 helper

TL;DR

  • 这是什么: PR 2289 的 readability / simplicity 角度独立审查。
  • 现在到哪一步 / 结论是什么: production 改动聚焦且可读,但新测试支撑代码有两个未使用 helper surface,给 comment
  • 需要 maintainer 做什么 OR controller 下一步: controller 可继续汇总;作者可选择删除或实际使用这两个 helper。

质量审查意见

我没有看到需要阻塞合并的生产代码复杂度、命名或新增抽象问题;HOST_REFACTOR_COMMENT_POLICY 为空,按规则归一为 none,本轮也没有新增 Refactor / Old / New 源码注释。

发现两个小的可读性问题,都是测试 fixture(测试里复用的构造数据/helper)留下了未使用入口:

  • test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:18BuildRunRequest() 是新增 public test helper,但 PR head 中没有调用方。未使用构造器会让后续测试不清楚哪条 setup 路径才是 canonical。
  • test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:97CreateActor(..., string? responseId = null) 保留了 responseId 分支,但所有调用方都只传 eventStore。这个 optional 参数目前只是未使用扩展点,让测试 helper 多了一条分支。

建议处理方式:删除 LlmRunAcceptanceHarness.BuildRunRequest(),并移除 CreateActor 的 unused responseId 参数/分支;或者补上真实用例让这两个入口有明确用途。

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: quality
head_sha: b4a133d60c545e048152bdccd1743b9251bbf377
verdict: comment
---

## Verdict
Comment: the production change is readable and focused, but the new test support leaves a couple of unused helper surfaces that should be trimmed or used.

## Evidence
- `test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:18`: `BuildRunRequest()` is a new public helper on the test harness, but there is no caller in the PR head. Unused fixture builders make later tests guess which canonical setup path is intended.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:97`: `CreateActor(..., string? responseId = null)` has a `responseId` branch, but every caller passes only the event store. That optional parameter is an unused extension point and adds unnecessary branching to a test helper.

## What would change your verdict (only if comment or reject)
Delete `LlmRunAcceptanceHarness.BuildRunRequest()` if it is not needed, and remove the unused `responseId` parameter/branch from `CreateActor`, or update the tests to use both intentionally.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:comment

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

🤖 tests 视角通过:流式运行录制覆盖足够

TL;DR

  • 这是什么: PR 完善 LLM 流式运行的 actor 录制与验收覆盖 #2289 的 tests reviewer 独立结论,只看测试覆盖和测试质量。
  • 现在到哪一步 / 结论是什么: approve,新增测试覆盖取消落盘、recorder command、终态权威性和重放序列行为。
  • 需要 maintainer 做什么 OR controller 下一步: controller 可把本 tests 结论计入 review gate,无需 maintainer 额外动作。

测试覆盖结论

本轮只检查测试质量,不评审生产架构。actor 指仓库内负责串行持有业务事实的执行单元;本 PR 新增的 LlmSessionGAgent recorder command 入口,是把流式 LLM 运行过程中的 chunk/tool/terminal fact 录制进该 actor 状态的路径。

我确认新增测试不是 bump-line-count:

覆盖点 证据 为什么重要
取消后仍要记录取消事实 test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:159 覆盖 LlmRunCoreOperationCanceledException 后用非取消 token 写入 LlmRunCancelled,避免取消路径丢终态。
非终态流被取消 test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:179 TaskCompletionSource 和 cancellable wait 验证 partial chunk 后产生 cancelled terminal,没有 sleep/poll pacing。
duplicate / late recorder facts test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:14:37 验证 duplicate chunk、late chunk、completed 后 failed、cancelled 后 completed 都不会覆盖权威终态。
replay 缺中间序列仍接受 terminal test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:57 覆盖 terminal sequence 放宽规则,确保重放能恢复最终完成态且拒绝后续 late chunk。
duplicate run admission / forwarded tool recorder idempotency test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:768:921 用真实 actor state version 断言 no-op,不是 mock-only pseudo-coverage。

同时扫描 PR diff:没有新增 [Skip][Trait("Category","Manual")]Task.DelayWaitUntilAsyncAssert.True(true)tools/ci/test_polling_allowlist.txt 未改动,因此没有新增不稳定测试例外需要解释。

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: tests
head_sha: b4a133d60c545e048152bdccd1743b9251bbf377
verdict: approve
---

## Verdict
Approve: test coverage and quality are adequate for the LLM stream-run actor recording and acceptance changes.

## Evidence
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:159` (`RunAsync_WhenCancellationTokenIsCancelled_ShouldFlushCancelledFactThroughSink`) asserts the cancelled fact is still recorded with a non-cancelled sink token, directly covering the `LlmRunCore.RunAsync` cancellation-token behavior changed in `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:35`.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:179` (`RunAsync_WhenProviderNeverEmitsTerminalChunk_ShouldRecordCancelledFactWhenCancelled`) uses a `TaskCompletionSource` and cancellable wait, not sleep/poll pacing, and asserts partial stream preservation plus cancelled terminal output.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:14` and `:37` assert concrete actor state outcomes for duplicate chunks, late chunks, duplicate terminal events, failed-after-completed, and completed-after-cancelled paths, covering the new recorder command handlers and terminal-authority branches in `LlmSessionGAgent`.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:57` verifies replay accepts a terminal event despite a missing intermediate recorder sequence and rejects later chunks, covering the relaxed terminal sequence acceptance rule.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:768` and `:921` cover duplicate running run admission and forwarded tool-call recorder idempotency with state-version assertions, not mock-only verification.
- Diff scan found no added `[Skip]`, manual traits, `Task.Delay`, `WaitUntilAsync`, `Assert.True(true)`, or assertion weakening. The polling allowlist was unchanged, so no new stability exception needs justification.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:tests:approve

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

🤖 架构评审 r3:approve

TL;DR

  • 这是什么: PR 2289 的 architect 角度复审,按 CLAUDE.md / AGENTS.md 检查 LLM 流式 run 录制链路。
  • 现在到哪一步 / 结论是什么: approve;未发现新增架构违规,LlmSessionGAgent 这个 session actor(持有单个会话业务状态和命令处理的运行单元)仍是 run fact 的权威记录者。
  • 需要 maintainer 做什么 OR controller 下一步: controller 可把该 architect verdict 计入共识;maintainer 无需架构侧修改。

架构检查摘要

我按实际三点 diff origin/feature/integrate...origin/refactor/iter2276-issue-2276CLAUDE.mdAGENTS.md 和本地 NyxID 源码做了复核。HOST_REFACTOR_COMMENT_POLICY 为空,按规则归一化为 none,所以没有要求新增自解释 refactor 注释;本次 diff 也没有新增 Old/New/iteration 历史注释。

关键 evidence:

  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:322-370 新增的是 LlmStreamChunkObserved / LlmRunCompleted 等 typed recorder event(已建模的 protobuf 事实事件)处理器;这符合“核心语义强类型”,没有把控制流语义塞进通用 bag。
  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:775-870responseId + runId + sequence 在 actor 内判定并持久化事实;这符合“单一权威拥有者”,没有新增中间层进程内 registry / dictionary 作为事实源。
  • src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:31-43 保留 ChatStreamAsync(LLM provider 的流式调用入口)主链,并把取消落成 LlmRunCancelled typed fact;没有回退到 ChatAsync、Host 临时拼装结果或 query-time replay。
  • docs/canon/llm-streaming.md:38-50docs/canon/nyxid-responses-direct.md:120-133 说明 live provider stream 不能靠 actor self-continuation 恢复;若未来拆出执行器,必须是 run/session-scoped actor 或 durable runtime component,这和 AGENTS.md 的 self continuation / 分布式正确性约束一致。

未发现 .proto 变更、*WriteActor / *ReadActor / *Store 拆分、.refactor-loop 生产 SSOT 下沉、外部仓库改动或兼容空壳。

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: architect
head_sha: b4a133d60c545e048152bdccd1743b9251bbf377
verdict: approve
---

## Verdict
approve: no architectural concerns from the CLAUDE.md/AGENTS.md compliance angle; the PR keeps LLM run recording actor-owned and typed.

## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:322`-`370` adds typed recorder event handlers for stream chunks, tool calls, completion, failure, and cancellation; this maps to `CLAUDE.md` "核心语义强类型:影响业务语义、控制流、稳定读取且仓库内可控的数据,必须建模为 `proto field / typed option / typed sub-message`,禁止塞入通用 bag。" and does not introduce a generic bag or JSON state path.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:775`-`797` gates recorder facts by `responseId + runId + sequence` inside the session actor, and `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:819`-`870` persists the accepted facts through actor domain events. This aligns with `CLAUDE.md` "单一权威拥有者:每个稳定业务事实有唯一 actor 拥有;`committed event store + actor state` 是唯一真相,readmodel 只是查询副本。"
- `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:31`-`43` preserves cancellation as an explicit `LlmRunCancelled` fact and only changes the sink token to `CancellationToken.None`, which supports `CLAUDE.md` "AI 对话主链必须流式化:实时会话入口必须使用 `ChatStreamAsync`" without adding `ChatAsync`, host-side response synthesis, or query-time replay.
- `docs/canon/llm-streaming.md:38`-`50` and `docs/canon/nyxid-responses-direct.md:120`-`133` document why the live provider stream is not represented as resumable self-continuation state and why any future split must be a run/session-scoped actor or durable runtime component. This is consistent with `AGENTS.md` "self continuation 必须事件化" and "本地可用不等于分布式正确" rather than adding a `Task.Run` callback or process-local registry.
- Host-specific inputs: `HOST_REFACTOR_COMMENT_POLICY` was empty, so it normalizes to `none`; the diff adds no new Old/New refactor-history source comments. `HOST_ARCHITECTURE_GREP_CHECKS`, `HOST_PROTO_POLICY`, `AUDIT_PATH`, `IMPLEMENT_SUMMARY_PATH`, and `EXTERNAL_REPOS` were empty, so no extra host checks applied. The diff contains no `.proto` changes, no new `*WriteActor` / `*ReadActor` / `*Store` split, no `.refactor-loop` production configuration movement, and no external repository changes.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:architect:approve

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

🤖 架构评审结论:reject,需拆开 recorder command 与 committed event

TL;DR

  • 这是什么: PR 2289 的 architect 角度独立评审,聚焦 CLAUDE.md / AGENTS.md 架构边界。
  • 现在到哪一步 / 结论是什么: 我判定为 reject,因为新增 recorder handler 把同一批 typed run fact 同时当作入站 actor 消息和已提交领域事件使用。
  • 需要 maintainer 做什么 OR controller 下一步: 请作者把入站 recorder 请求建成独立 command/internal-signal 契约,再由 LlmSessionGAgent 持久化 committed event。

架构阻断点

LlmSessionGAgent 是 LLM response session 的权威 actor;EventHandler 是 actor 收到入站消息后执行业务处理的入口。PR 在 src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:322 起新增了 HandleLlmStreamChunkObservedAsync(LlmStreamChunkObserved observed),并在后续行新增 LlmToolCallObservedLlmRunCompletedLlmRunFailedLlmRunCancelledLlmSessionForwardedToolCallEmittedEvent 的同类 handler。

问题是这些类型本身也是 committed domain event,即 actor 已提交后用于状态转换和投影观察的事实。它们在 src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:408413 被注册为状态转换事件。这样同一个 payload 同时表达“请求 actor 记录一个事实”和“actor 已经提交的事实”,违反了 CLAUDE.md 里的读写分离要求:Command -> Event,以及 Envelope 只做统一消息包络、消息契约必须显式定义语义的要求。

建议改法:为入站 recorder 建独立 typed command/internal-signal,例如 RecordLlmStreamChunkObservedRequestedRecordLlmRunCompletedRequested。handler 接收这些请求后,再持久化现有的 LlmStreamChunkObservedLlmRunCompleted 等 committed event。这样 actor 入站协议和 event store/projection 事实仍然分开。

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: architect
head_sha: b4a133d60c545e048152bdccd1743b9251bbf377
verdict: reject
---

## Verdict
reject: the PR adds recorder `EventHandler` entry points that reuse committed run event payloads as inbound actor commands/internal signals, which regresses the command/event semantic boundary.

## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:322` adds `HandleLlmStreamChunkObservedAsync(LlmStreamChunkObserved observed)`, and lines `332`, `342`, `352`, `362`, and `372` add equivalent handlers for `LlmToolCallObserved`, `LlmRunCompleted`, `LlmRunFailed`, `LlmRunCancelled`, and `LlmSessionForwardedToolCallEmittedEvent`. Those handlers then call `PersistStreamChunkObservedAsync` / `PersistRunCompletedAsync` / related persist methods, while the same payload types are registered as state-transition domain events at `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:408` through `413`. This makes one contract serve both as inbound command/internal signal and committed event. CLAUDE.md clause: "读写分离:`Command -> Event``Query -> ReadModel`;异步完成通过事件通知,不在会话内拼装流程。"
- The same change also weakens the envelope semantic boundary: the actor can now receive an `EventEnvelope` carrying a payload such as `LlmRunCompleted` and persist that same payload as the committed event, without a distinct message contract defining whether the input is a command, internal signal, or already-committed event. CLAUDE.md clause: "`Envelope` 是统一消息包络(`command/reply/signal/event/query`),但是否可持久化、可投影、可观察必须由消息契约显式定义。"

## What would change your verdict
Introduce distinct typed recorder command/internal-signal messages for the inbound side, for example `RecordLlmStreamChunkObservedRequested`, `RecordLlmRunCompletedRequested`, etc., and have `LlmSessionGAgent` translate those into committed `LlmStreamChunkObserved` / `LlmRunCompleted` domain events. Alternatively, remove the inbound recorder handlers if only the actor-owned `ILlmRunSink` path is intended. Update the acceptance tests and docs to preserve that command/event split.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:architect:reject

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

🤖 质量复审:comment,建议清理 3 处小问题

TL;DR

  • 这是什么: PR 2289 的 quality 视角复审,聚焦可读性、死代码、简单性。
  • 结论: comment;生产改动整体可读且聚焦,新测试支持里有 2 处未使用 helper/分支,另有 1 处未使用 lambda 参数命名。
  • 下一步: controller 可按 advisory 处理;作者顺手删除未用 helper/分支并把未用参数改成 _ 即可。

质量发现

  • test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:18 新增 BuildRunRequest(),但在 PR head 上没有调用方;这是死测试 helper surface,会让后续读者误以为验收 harness 还支持一条未覆盖路径。
  • test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:97CreateActor 带可选 responseId,但所有调用都走默认值;第 102 行的非空分支没有覆盖到,保留会增加不必要的阅读分叉。
  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:842conflict 是委托回调用的 lambda 参数,但函数体不读取它;同样模式也在 855、868 行。改成 _ => ... 可以明确表示参数被有意忽略。
📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: quality
head_sha: b4a133d60c545e048152bdccd1743b9251bbf377
verdict: comment
---

## Verdict
Comment: the production changes are readable and focused, but the new test support leaves a couple of small unused helper surfaces that should be trimmed.

## Evidence
- `test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:18` adds `BuildRunRequest()`, but `git grep 'LlmRunAcceptanceHarness.BuildRunRequest' origin/refactor/iter2276-issue-2276` finds no callers. This is dead test helper surface; remove it unless a test should use it.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:97` gives `CreateActor` an optional `responseId` parameter, but all callers use the default path. The non-null branch at line 102 is dead readability overhead; drop the parameter/branch or add the scenario that needs a different response id.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:842` names the `conflict` lambda parameter even though the lambda body does not read it; the same pattern appears at lines 855 and 868. Use `_ => ...` so the required delegate parameter is visibly ignored.

## What would change your verdict (only if comment or reject)
Remove `LlmRunAcceptanceHarness.BuildRunRequest()`, simplify `CreateActor` to the actually used actor id path, and rename the unused conflict lambda parameters to `_`.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:comment

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

🤖 PR 2289 测试审查通过

TL;DR

  • 这是什么: 针对 PR 2289 的测试覆盖与测试质量审查。
  • 现在到哪一步 / 结论是什么: approve,新增测试覆盖了 LLM 流式运行的 actor recorder、取消 flush、重复/乱序/迟到事件和 replay 行为。
  • 需要 maintainer 做什么 OR controller 下一步: controller 可把本角色结论纳入合并真值表;maintainer 无需因 tests 角色补测。

测试审查结论

本轮只看测试角度。LlmRunCore 是 LLM 流式执行核心,LlmSessionGAgent 是保存响应会话状态的 actor;sink 是记录运行事实的输出接口。PR 新增的测试没有停在“调用不抛异常”,而是检查了持久化顺序、终态权威性、取消后 flush、重复 recorder 命令和 replay 后状态。

关键证据:

  • test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:159 验证取消路径仍用未取消的 sink token 记录 LlmRunCancelled,这覆盖了取消 token 被替换为 CancellationToken.None 的行为。
  • test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:179TaskCompletionSource 和 cancellable wait 模拟永不终止的流,断言 partial chunk 被记录且最终是 cancelled,不靠固定 sleep。
  • test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:14:37:57 覆盖重复 chunk、late chunk、terminal 后失败/完成事件、缺中间序列的 replay 终态接受。
  • test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:719:768:921 分别覆盖 run-start 持久化顺序、重复 running run admission 幂等、forwarded tool call recorder 幂等。

我也扫了测试 diff: 没有新增 [Skip]、manual trait、Task.DelayWaitUntilAsyncAssert.True(true)、bump-only 断言或 polling allowlist 变更。

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: tests
head_sha: b4a133d60c545e048152bdccd1743b9251bbf377
verdict: approve
---

## Verdict
Approve: the PR adds meaningful behavior coverage for the LLM stream-run actor recorder paths and avoids the test-quality regressions on the checklist.

## Evidence
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:159` (`RunAsync_WhenCancellationTokenIsCancelled_ShouldFlushCancelledFactThroughSink`) asserts the cancelled fact is recorded through a non-cancelled sink token, directly covering the `LlmRunCore.RunAsync` cancellation flush change in `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs`.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:179` (`RunAsync_WhenProviderNeverEmitsTerminalChunk_ShouldRecordCancelledFactWhenCancelled`) uses a `TaskCompletionSource` plus cancellable wait rather than fixed sleep/poll pacing, and asserts partial stream recording plus cancelled terminal output.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:14` and `:37` assert concrete actor state outcomes for duplicate chunks, late chunks, duplicate terminal facts, failed-after-completed, and completed-after-cancelled paths, covering the new recorder command handlers and terminal authority behavior.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:57` verifies replay accepts a terminal event despite a missing intermediate recorder sequence and rejects a later chunk, covering the relaxed terminal sequence acceptance rule.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:719` verifies `LlmRunStartedEvent` is persisted before chunk/completion facts with exact sequence assertions; `:768` verifies duplicate running-run admission is an idempotent no-op; `:921` verifies forwarded tool-call recorder idempotency and persisted call fields.
- Diff scan found no added `[Skip]`, manual traits, `Task.Delay`, `WaitUntilAsync`, `Assert.True(true)`, bump-only tests, or assertion weakening. `tools/ci/test_polling_allowlist.txt` was unchanged, so no new polling exception needs justification.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:tests:approve

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

📊 状态卡片 — fix 派出

维度
阶段 派出 codex(role=fix)
codex log rebase-resolve-pr2289-r2.log
total wall-clock timeout 5400s(~90 min)
上下文 PR #2289 is conflicting after #2275/#2292 merged into feature/integrate; dispatching scoped rebase resolver for issue #2276 only.
下一步自动会做 1. fix r 完成 marker FIX_DONE:... 2. controller commit + push 3. 派 reviewer r<N+1>
是否需要人介入 ❌ 否(自动推进)

🤖 controller status banner

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

📊 状态卡片 — fix 派出

维度
阶段 派出 codex(role=fix)
codex log rebase-resolve-pr2289-r3.log
total wall-clock timeout 5400s(~90 min)
上下文 Previous rebase resolver r2 exited blocked and left no merge state or durable diff; redispatching r3 resolver for the same #2276/#2289 conflict only.
下一步自动会做 1. fix r 完成 marker FIX_DONE:... 2. controller commit + push 3. 派 reviewer r<N+1>
是否需要人介入 ❌ 否(自动推进)

🤖 controller status banner

⟦AI:AUTO-LOOP⟧

…/iter2276-issue-2276

# Conflicts:
#	src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs
#	test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs
@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 quality review: reject,需删除新增死代码

TL;DR

  • 这是什么: PR 2289 的 readability / simplicity 角度复审。
  • 现在到哪一步 / 结论是什么: reject;主体改动可读且聚焦,但新增了 2 个没有调用点的 production private helper。
  • 需要 maintainer 做什么: 删除这 2 个 helper,或把它们接入真实的 terminal/idempotency 判定路径并补对应测试。

质量发现

src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1236 新增了 CompletionAlreadyRecorded(LlmRunCompleted completed),但我在 PR head 上用 git grep 没找到任何调用点。LlmSessionGAgent 是负责 LLM session actor 状态推进的核心类;这个 helper 表达的 terminal/idempotency 意图已经由 CanPersistRunFact 和 run transition 路径承载,保留未使用分支会让后续审查误判真正的幂等入口。

src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1253 新增了 RunTerminalAlreadyRecorded(string runId),同样没有任何调用点。它是 production dead code,不是测试支撑代码;在这个 1600+ 行 actor 类里继续增加未使用 helper,会降低 run-recording 流程的可读性和可审计性。

需要的改动很小:删除 CompletionAlreadyRecordedRunTerminalAlreadyRecorded,或者把它们接入实际 terminal/idempotency 决策路径,并让测试覆盖对应分支。

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: quality
head_sha: 934ed6ccad6de290b1751d017c33e822c49e9437
verdict: reject
---

## Verdict
Reject: the PR is mostly focused and readable, but it introduces unused private production helpers that should not merge as dead code.

## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1236` defines `CompletionAlreadyRecorded(LlmRunCompleted completed)`, but `git grep` over the PR head finds no caller; the helper duplicates terminal/idempotency intent already expressed by `CanPersistRunFact` and the run transition path.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1253` defines `RunTerminalAlreadyRecorded(string runId)`, but `git grep` over the PR head finds no caller; it is new dead production code in a class that is already large enough that unused helpers make the run-recording flow harder to audit.

## What would change your verdict (only if comment or reject)
Delete `CompletionAlreadyRecorded` and `RunTerminalAlreadyRecorded`, or wire them into the actual terminal/idempotency decision path with tests that exercise those branches.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:reject

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 Tests review: approve

TL;DR

  • 这是什么: PR 2289 的测试覆盖与测试质量角度复核。
  • 现在到哪一步 / 结论是什么: approve,新增覆盖能验证 LLM 流式运行的 actor 录制、终态权威、取消刷新与重复事件处理。
  • 需要 maintainer 做什么 OR controller 下一步: controller 可把本角色结论计入合并门禁;maintainer 无需补测试。

测试覆盖结论

本轮只看测试质量,不评价生产架构。结论是测试足够支撑这次 diff:新增测试不是单纯 bump 行数,而是在断言具体业务结果。

关键证据:

  • test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:424 验证 LlmRunCore.RunAsync 在调用方 cancellation token 已取消时,仍会通过 sink 写出 LlmRunCancelled,并且传给 sink 的 token 不是已取消状态;这覆盖本次 CancellationToken.None 的回归点。
  • test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:444TaskCompletionSourceWaitAsync(ct) 做确定性取消,不靠 Task.Delay 或 sleep 节奏,验证 never-terminal stream 先记录 partial chunk,再记录 cancelled fact。
  • test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:14:37 验证 completion/cancellation 终态在 duplicate、late chunk、冲突 failed/completed 后仍保持权威,断言了 state version、session status、active run status/output/sequence 和 completion 字段。
  • test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:57 覆盖 replay 场景:缺失中间 recorder command 时 terminal event 仍可应用,后续 stale chunk 不覆盖终态。
  • test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:802 覆盖重复 active run admission 不产生新状态版本。
  • test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:1222 覆盖 forwarded tool-call recorder path,并断言重复 call 不重复持久化。

我也扫描了 touched tests:没有新增 Task.DelayThread.SleepWaitUntilAsync[Skip]、manual trait、Assert.True(true) 或只验证不抛异常的伪覆盖;tools/ci/test_polling_allowlist.txt 没有改动。

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: tests
head_sha: 934ed6ccad6de290b1751d017c33e822c49e9437
verdict: approve
---

## Verdict
Test coverage and quality are adequate for the LLM streaming actor recording and acceptance diff.

## Evidence
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:424` covers the cancellation-regression behavior from `LlmRunCore.RunAsync`: the cancelled fact is flushed through the sink with a non-cancelled sink token, and completed/failed facts remain empty.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:444` uses a deterministic `TaskCompletionSource` and `WaitAsync(ct)` harness to verify a never-terminal stream records partial output and then a cancelled fact; no sleep/delay pacing is introduced.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:14` and `:37` assert terminal completion/cancellation authority across duplicate, late, and conflicting recorder facts by checking state version, session status, active run status/output/sequence, and completion fields.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:57` covers replay behavior for a terminal run event with missing intermediate sequence and a later stale chunk, asserting the terminal state and applied version remain authoritative.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:802` covers duplicate active-run admission by asserting no new state version is persisted.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:1222` covers the new forwarded tool-call recorder path and duplicate suppression with concrete call fields and status assertions.
- Branch scan over the touched tests found no new `Task.Delay`, `Thread.Sleep`, `WaitUntilAsync`, `[Skip]`, manual trait, `Assert.True(true)`, or new not-throw-only tests; `tools/ci/test_polling_allowlist.txt` is unchanged.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:tests:approve

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 架构评审通过

TL;DR

  • 这是什么:PR 完善 LLM 流式运行的 actor 录制与验收覆盖 #2289 的 architect 角度独立评审,重点看 CLAUDE.md / AGENTS.md 架构边界。
  • 现在到哪一步 / 结论是什么:结论是 approve,没有发现会阻塞合并的架构违规。
  • 需要 maintainer 做什么 OR controller 下一步:controller 可以把此 reviewer 结果计入本轮共识。

架构结论

本轮只审架构合规,不评价测试覆盖质量或性能。核心判断是:新增的 typed recorder fact 指 LlmStreamChunkObservedLlmRunCompleted 等强类型运行事实,仍由 LlmSessionGAgent 这个业务 actor 作为唯一权威持久化;没有拆出 WriteActor / ReadActor / StoreActor,也没有把事实状态放进中间层进程内字典。

IActorDispatchPort 是 actor 之间发命令的标准通道;本 diff 没有绕开它新增第二套业务主链路。ChatStreamAsync 是 LLM 流式调用入口;diff 没有新增 ChatAsync 正式主链,也没有新增 Task.Run callback 来持有 provider stream 状态。

检查到的关键证据:

文件位置 为什么重要
src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583 typed run facts 在同一个 LlmSessionGAgent 内处理,符合“一个 actor = 一个业务实体”。
src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1056 responseId + runId + sequence 判定接受事实,没有新增进程内事实注册表。
src/platform/Aevatar.GAgentService.Abstractions/Protos/llm_sessions.proto:278 run 事实已经是 protobuf 强类型消息,不是新增通用 bag。
src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:37 取消路径用 CancellationToken.None flush LlmRunCancelled,避免取消 token 抑制 actor-owned terminal fact。
docs/canon/llm-streaming.md:27 / docs/canon/nyxid-responses-direct.md:77 canon 文档同步说明 actor-owned recorder/finalizer 语义,并明确不能用 self-continuation 或 Task.Run 承接 live stream。

我也做了针对性 grep:未发现新增 Old/New refactor-history 源码注释、.refactor-loop/host.env 生产事实、外部仓库依赖、ChatAsync 主链、ConcurrentDictionary/Dictionary<> 中间层事实状态、TypeUrl.ContainsGetAwaiter().GetResult()、端口 5000/5050

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: architect
head_sha: 934ed6ccad6de290b1751d017c33e822c49e9437
verdict: approve
---

## Verdict
approve - no architecture compliance blocker found for the typed LLM run recorder/finalizer changes.

## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583` adds direct handlers for existing typed run facts (`LlmStreamChunkObserved`, `LlmToolCallObserved`, `LlmRunCompleted`, `LlmRunFailed`, `LlmRunCancelled`) on the same `LlmSessionGAgent` authority. This is consistent with CLAUDE.md "Actor 即业务实体:一个 actor = 一个业务实体(数据与方法同住);禁止按技术功能(读/写/投影)拆分同一业务实体为多个 actor。"
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1056` gates persistence by `responseId + runId + sequence` and terminal status, so the PR does not introduce process-local registries or middle-layer ID maps. This is consistent with CLAUDE.md "事实源唯一:跨请求/跨节点一致性事实必须有唯一权威来源(Actor 持久态或分布式状态),不依赖进程内偶然状态。"
- `src/platform/Aevatar.GAgentService.Abstractions/Protos/llm_sessions.proto:278` already models run facts as protobuf messages with typed `sequence` and `record_id`; the diff does not add schema bags or JSON/XML state storage. This is consistent with CLAUDE.md "核心语义强类型:影响业务语义、控制流、稳定读取且仓库内可控的数据,必须建模为 `proto field / typed option / typed sub-message`,禁止塞入通用 bag。"
- `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:37` records cancellation with `CancellationToken.None`, preserving a committed terminal fact instead of letting caller cancellation suppress the actor-owned final state. This is consistent with CLAUDE.md "committed domain event 必须可观察:write-side 完成 committed event 后必须送入 projection 主链;禁止只落 event store 而不进入可观察流。"
- `docs/canon/llm-streaming.md:27` and `docs/canon/nyxid-responses-direct.md:77` update canonical documentation for the actor-owned run recording boundary and explicitly reject self-continuation or `Task.Run` ownership of the live provider stream. This satisfies CLAUDE.md "架构调整 PR 必须同步相关 `docs/`,并在验证结果中列出 build/test/guard。"
- Scope and host-boundary checks passed from the diff: changed files are limited to LLM run source/tests/canon docs; no `.refactor-loop/host.env` or production host facts were introduced; no new external repo references or `*WriteActor` / `*ReadActor` / `*StoreActor` split appeared; no new Old/New refactor-history source comments appeared while `HOST_REFACTOR_COMMENT_POLICY` is empty/default `none`.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:architect:approve

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 测试视角通过:LLM 流式 actor 录制覆盖充分

TL;DR

  • 这是什么: PR 2289 的 tests reviewer 结论。
  • 现在到哪一步 / 结论是什么: approve,新增测试覆盖了 actor 邮箱录制、终态幂等、取消 flush 和重复/迟到事件。
  • 需要 maintainer 做什么 OR controller 下一步: 无测试侧阻塞,controller 可继续合并门禁流程。

测试覆盖证据

本轮只看测试质量和覆盖,不评价生产架构。LlmStreamChunkObserved / LlmToolCallObserved / LlmRunCompleted 是 LLM 流式运行过程中写入 actor 的强类型运行事实;相关测试通过真实 actor 状态断言结果,而不是只验证 mock 调用。

  • test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:15 覆盖完成态权威性:重复 chunk、迟到 chunk、重复 completion、迟到 failure 都不能覆盖已完成结果,这对 actor 录制链路很关键。
  • test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:37 覆盖取消态权威性:取消后继续到达的 chunk/completion/cancel flush 被忽略,避免取消结果被迟到流事件改写。
  • test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:57 覆盖 replay 场景:即使缺少中间 recorder command,terminal event 仍能落地,后续 stale chunk 不会回写。
  • test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:424 覆盖 LlmRunCore.RunAsync 取消分支:取消事实用未取消的 token 写入 sink,避免原请求 token 已取消导致取消事实丢失。
  • test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:445 覆盖 provider 永不发 terminal chunk 时的取消行为:先断言 partial chunk,再断言 cancelled fact,没有用 sleep/polling 节奏撑测试。
  • test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:803 覆盖重复 run admission:同一个 running run 再次请求时 LastAppliedEventVersion 不变。
  • test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:921test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:1223 覆盖 actor-owned sequence、record idempotency、forwarded tool call duplicate handling,都是业务结果断言。
  • 对本 PR touched files 的 hygiene scan 未发现新增 [Skip][Trait("Category", "Manual")]Task.DelayThread.SleepWaitUntilAsyncAssert.True(true);唯一匹配的 Should().NotThrowAsync 是既有代码,不在本次 diff。
📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: tests
head_sha: 934ed6ccad6de290b1751d017c33e822c49e9437
verdict: approve
---

## Verdict
Approve: the diff adds behavior-level tests for the new LLM run recorder paths, terminal idempotency, cancellation flushing, and duplicate/late event handling, with no added skip/manual/polling-test shortcuts.

## Evidence
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:15` asserts a completed run remains authoritative across duplicate chunks, late chunks, duplicate completion, and late failure; this covers the mailbox recorder path for `LlmStreamChunkObserved`, `LlmToolCallObserved`, and `LlmRunCompleted`.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:37` asserts cancellation remains authoritative and ignores later chunk/completion/cancel flushes; this covers terminal cancellation behavior instead of just checking that handlers return.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:57` asserts replay accepts a terminal event despite a missing intermediate recorder command while rejecting the later stale chunk, covering the terminal sequence exception.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:424` asserts cancelled runs are flushed through the sink with `CancellationToken.None`, directly covering the `LlmRunCore.RunAsync` cancellation-token change.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:445` asserts a never-terminal provider records a partial chunk and then a cancelled fact after cancellation, using `TaskCompletionSource` plus cancellation rather than sleep-based pacing.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:803` asserts duplicate admission of an already-running run is ignored without changing `LastAppliedEventVersion`, covering the new same-active-run guard.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:921` and `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:1223` assert actor-owned sequence/idempotency and forwarded tool-call duplicate handling with real actor state, not mock-only verification.
- Hygiene scan over the touched files found no added `[Skip]`, `[Trait("Category", "Manual")]`, `Task.Delay`, `Thread.Sleep`, `WaitUntilAsync`, or `Assert.True(true)` patterns; the only `Should().NotThrowAsync` match is pre-existing at `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:99` and is outside this PR diff.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:tests:approve

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 Quality review: reject,需删除未使用的 actor helper

TL;DR

  • 这是什么: PR 2289 的 readability / simplicity / dead-code 角度 review。
  • 现在到哪一步 / 结论是什么: reject,因为生产代码新增了 2 个未被调用的 private helper。
  • 需要 maintainer 做什么 OR controller 下一步: 请删除或接入这两个 helper 后再让 controller 重跑 review。

具体问题

LlmSessionGAgent 是持有 LLM session 权威状态的 actor;这里的 actor 指按顺序处理命令并持久化 session 事实的业务实体。PR 新增的两个 helper 位于这个核心状态流旁边,但没有任何调用方,会让读者误以为 terminal/completion 判定还有另一条实际路径。

  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1236 新增 CompletionAlreadyRecorded(LlmRunCompleted completed),但在 PR head 上 git grep 只找到声明本身。这个函数表达了 completion 幂等判定,却没有参与实际 run-finalization 流程,属于生产死代码。
  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1253 新增 RunTerminalAlreadyRecorded(string runId),同样只有声明没有调用方。它和现有 terminal-state 判断并排存在,会增加状态机阅读成本。

建议处理方式: 如果这两个 helper 是前一版方案遗留,直接删除;如果确实是目标设计的一部分,请把它们接入 run fact acceptance path,并补上覆盖该分支的测试。

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: quality
head_sha: 934ed6ccad6de290b1751d017c33e822c49e9437
verdict: reject
---

## Verdict
Reject: the PR introduces production dead code in the central LLM session actor.

## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1236` adds private helper `CompletionAlreadyRecorded(LlmRunCompleted completed)`, but `git grep` over the PR branch finds no caller outside the declaration. It duplicates terminal/completion acceptance intent without participating in the actual flow, making the run-finalization logic harder to read.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1253` adds private helper `RunTerminalAlreadyRecorded(string runId)`, but `git grep` over the PR branch finds no caller outside the declaration. This is another unused terminal-state helper next to the live persistence path.

## What would change your verdict (only if comment or reject)
Delete `CompletionAlreadyRecorded` and `RunTerminalAlreadyRecorded`, or wire them into the run fact acceptance path with tests showing the intended branch. If deletion is correct, no behavior change is needed.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:reject

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 架构审查通过:未发现 CLAUDE/AGENTS 回归

TL;DR

  • 这是什么: PR 2289 的 architect 角色独立审查结果。
  • 现在到哪一步 / 结论是什么: approve,未发现必须阻塞合并的架构规则回归。
  • 需要 maintainer 做什么 OR controller 下一步: controller 可把本轮 architect 票计为通过。

架构证据

LlmSessionGAgent 是 response session 的业务 actor,也就是持有一次 LLM response/run 权威状态的对象;本 PR 仍把 run chunk、tool、terminal 事实收敛到这个 actor 内持久化,没有拆出 *WriteActor / *ReadActor / *Store 形式的第二套实体。

ChatStreamAsync 是模型输出的流式入口;本 PR 继续走该流式主链,并把取消收尾落成 typed LlmRunCancelled 事实,没有回退到 Host 临时拼响应或 query-time replay。

证据 为什么重要
src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583 actor 自己接收 recorder facts 并转为持久事实,符合“Actor 即业务实体”和“单一权威拥有者”。
src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1056 按 session、run id、terminal 状态和 sequence 做接收门禁,未新增服务级进程内事实 map。
src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:37 取消后仍用 CancellationToken.None 刷入 LlmRunCancelled,避免取消 token 阻断最终事实记录。
docs/canon/llm-streaming.md:40 / docs/canon/nyxid-responses-direct.md:106 canon 文档同步写清 Host 只观察 terminal projection、session actor 拥有 run facts 的边界。

本轮 host policy 变量为空:Old/New source comment policy 归一化为 none,未发现新 Old/New/refactor-history 源码注释;没有 proto diff,也没有把 host 生产事实搬进 .refactor-loop/

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: architect
head_sha: 934ed6ccad6de290b1751d017c33e822c49e9437
verdict: approve
---

## Verdict
approve - no architecture compliance regression found in the changed C# / proto review surface or the related canon documentation updates.

## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583` keeps recorder facts inside the existing `LlmSessionGAgent` owner and persists them through actor-owned methods, matching `CLAUDE.md` clauses "Actor 即业务实体:一个 actor = 一个业务实体" and "单一权威拥有者:每个稳定业务事实有唯一 actor 拥有".
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1056` gates incoming run facts by registered session, terminal state, run id, and sequence before persistence; no new service-level `Dictionary<>` / `ConcurrentDictionary<>` fact map or read/write actor split was introduced.
- `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:101` continues to use `ChatStreamAsync` for the live LLM path, and `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:37` records cancellation as typed `LlmRunCancelled` even after the provider token is cancelled, aligning with the AGENTS rule that AI conversation main paths must be stream-first and terminal facts must be eventized.
- `docs/canon/llm-streaming.md:40` and `docs/canon/nyxid-responses-direct.md:106` document the same actor-owned run recording boundary: Host observes terminal projection, `LlmSessionGAgent` owns `responseId + runId + sequence`, and future execution split must be run/session-scoped or durable. This is consistent with `CLAUDE.md` "Host 只做协议适配与依赖组合" and "查询始终走 readmodel".
- Old/New pattern policy: `HOST_REFACTOR_COMMENT_POLICY` was empty, so it normalizes to `none`; grep of the net C# / proto diff found no new Old/New or refactor-history source comments. No proto files changed, `HOST_PROTO_POLICY` was empty, and no host production SSOT facts were moved into `.refactor-loop`.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:architect:approve

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 PR 2289 round 1 reject fix 已完成

TL;DR

  • 这是什么: 针对 review-gate round 1 的唯一 reject 做修复。
  • 现在到哪一步 / 结论是什么: 已删除死代码,并修正 typed recorder fact 的持久化与 terminal sequence 语义;本地 build/test/guard 均通过。
  • 需要 maintainer 做什么 OR controller 下一步: controller 可以提交当前 worktree 并重新派发 reviewers。

修复摘要

本轮只有 reviewer:qualityrejectreviewer:architectreviewer:tests 均为 approve,所以 blocking demand 只来自 quality 的死代码证据。

  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1245 删除了未被调用的 CompletionAlreadyRecordedRunTerminalAlreadyRecorded。这两个 helper 是 private production code,git grep 已确认在生产和测试路径中无剩余引用。
  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583 修正 typed recorder event handler:typed recorder fact 指 LlmStreamChunkObserved / LlmRunCompleted 这类 actor 已收到的强类型事实;现在直接持久化已验收 fact,不再绕回 live sink helper 重新计算 sequence。
  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:850 修正 terminal finalizer 行为:terminal finalizer 指 completed/failed/cancelled 这类最终状态事件;它允许单调 sequence gap 结束 run,非 terminal chunk/tool 仍要求严格下一个 sequence。

验证结果

命令 结果
dotnet build aevatar.slnx --nologo pass
dotnet test test/Aevatar.GAgentService.Tests/Aevatar.GAgentService.Tests.csproj --nologo pass, 953 passed / 0 failed / 0 skipped
bash tools/ci/test_stability_guards.sh pass

review-thread-completion seed 文件不存在,所以没有 GitHub review thread 需要回复或 resolve。

📎 完整 codex 原始输出(存档备查)
# Fix report for PR 2289 round 1

## Non-blocking reviewer context
- reviewer:architect approved. Evidence cited actor-owned typed run facts, `responseId + runId + sequence` gating, protobuf typed contracts, cancellation terminal fact flushing, docs updates, and no host/config scope violation.
- reviewer:tests approved. Evidence cited cancellation/no-terminal coverage, deterministic `TaskCompletionSource` usage, terminal duplicate/late/conflicting recorder tests, replay tests, duplicate run admission, forwarded tool-call recorder coverage, and no new polling/skip/no-op tests.

## Applied
- (A) src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1245: removed unused private helpers `CompletionAlreadyRecorded` and `RunTerminalAlreadyRecorded`; `git grep` over `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs` and `test/Aevatar.GAgentService.Tests` now returns no matches (addresses reviewer:quality's evidence #1 and #2).
- (A) src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583: typed recorder event handlers now persist the accepted typed fact directly instead of routing through live sink helpers that recompute sequence; this keeps `responseId + runId + sequence` facts authoritative and makes the PR's new acceptance tests pass.
- (A) src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:850: terminal transitions now allow a monotonic sequence gap for completed/failed/cancelled facts while non-terminal chunk/tool records still require the exact next sequence; this aligns transition replay with `CanPersistRunFact(... terminal: true)` and the documented terminal-finalizer behavior.

## Rejected as false positive
- None.

## Blocked (cannot fix this round)
- None.

## Build status
- build: pass (`dotnet build aevatar.slnx --nologo`)
- tests: pass (`dotnet test test/Aevatar.GAgentService.Tests/Aevatar.GAgentService.Tests.csproj --nologo`, 953 passed, 0 failed, 0 skipped)
- guard: pass (`bash tools/ci/test_stability_guards.sh`)
- review-thread-completion: not applicable; `/Users/chronoai/Code/aevatar/.refactor-loop/state/review-thread-completion/pr2289.json` is absent.

## Recommendation for next round
- expect unanimous; the only reject evidence was addressed, and validation passes.

⟦AI:AUTO-LOOP⟧
FIX_DONE:2289:round-1:applied-3:rejected-0:blocked-0

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 Quality review: comment on small readability cleanup

TL;DR

  • 这是什么: PR 2289 的 code quality 视角复核,按 prompt 固定 head 11c49ae52f854110e518dbbb5236f136e5daa72e 查看。
  • 现在到哪一步 / 结论是什么: verdict 是 comment,没有发现必须阻塞的生产代码问题,但有 2 类小清理点。
  • 需要 maintainer 做什么: 建议作者清掉未使用的 test fake surface,并把重复 recorder handler 收敛成一个私有 helper。

质量发现

LlmSessionGAgent 是负责持有 response session 权威运行状态的 actor,本次新增的 5 个 recorder event handler 在 src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:582 附近重复了同一套门禁形状:判空、CanPersistRunFact、clone 后持久化、否则返回 completed task。这个重复已经超过清单里的 >=3 近似复制阈值;抽一个私有 helper 可以保留 terminal / non-terminal 调用点的显式性,同时降低后续改门禁时漏改的概率。

ILlmRunSink 是 run core 把 chunk / terminal fact 写回 actor 的 sink 契约;在本次固定 head 下它没有 RecordForwardedToolCallEmittedAsync。因此 test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:664 的同名 fake 方法没有接口入口或测试调用,是死测试辅助面。

test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:18BuildRunRequest() 当前没有调用点;同文件 RecordingLlmRunSink.ForwardedToolCallsRecordForwardedToolCallEmittedAsync 也没有从契约或测试路径到达。建议删除,除非补一个真实断言使用它们。

📎 完整 codex 原始输出(存档备查)

pr: 2289
role: quality
head_sha: 11c49ae
verdict: comment

Verdict

Comment: the code is mostly readable and focused, but it introduces small dead test-helper surfaces and repeated recorder-handler boilerplate that should be tightened.

Evidence

  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:582: the five new HandleLlm*Observed/Completed/Failed/CancelledAsync handlers repeat the same ThrowIfNull -> CanPersistRunFact -> PersistDomainEventAsync(...Clone...) -> Task.CompletedTask shape at lines 582, 592, 602, 612, and 622. This crosses the checklist's >=3 near-identical copies threshold; a small shared helper would keep the actor recorder gate easier to scan.
  • test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:664: CancellationSensitiveLlmRunSink.RecordForwardedToolCallEmittedAsync is not part of ILlmRunSink in this head and has no caller, so it is dead test-fake surface.
  • test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:18: BuildRunRequest() is added to the harness but has no caller in the PR diff; RecordingLlmRunSink.ForwardedToolCalls / RecordForwardedToolCallEmittedAsync at lines 119 and 140 are likewise not reachable from ILlmRunSink or the tests.

What would change your verdict (only if comment or reject)

Extract one private helper for the five typed recorder event handlers while keeping terminal vs non-terminal explicit at the call sites, and delete the unused harness/test-fake members unless a test actually asserts them.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:comment

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 tests reviewer:PR 2289 测试角度 approve

TL;DR

  • 这是什么: 我按测试覆盖与测试质量口径复核了 PR 2289 的 LLM 流式 actor 录制与验收覆盖。
  • 现在到哪一步 / 结论是什么: 结论是 approve,新增测试覆盖了取消落盘、terminal 权威性、重复/晚到事实、replay sequence gap 与 forwarded tool 去重。
  • 需要 maintainer 做什么 OR controller 下一步: controller 可把 tests 角色视为通过;本角色没有 must-fix 测试缺口。

测试覆盖结论

LlmRunCore.RunAsync(LLM 执行核心入口)的取消路径已有行为测试:取消后仍通过未取消的 sink token 写入 LlmRunCancelled,并断言不会产生 completed/failed terminal fact。

LlmSessionGAgent(持有 response session 权威状态的 actor)的新增 recorder handler 已有验收矩阵:completion/cancel/failure terminal 事实成为权威后,重复 chunk、晚到 chunk、重复 terminal dispatch 和后续失败/取消都不会覆盖已提交状态。replay 场景也覆盖了 terminal event 跨过丢失中间 recorder command 的 sequence gap。

测试卫生没有发现新增 [Skip]、manual category trait、Task.DelayThread.SleepWaitUntilAsyncAssert.True(true) 或把强断言改弱。异步等待使用 TaskCompletionSource(测试中用于确定流已进入的同步信号)和 cancellation,不属于 sleep pacing。

我本地执行了聚焦测试:

dotnet test test/Aevatar.GAgentService.Tests/Aevatar.GAgentService.Tests.csproj --filter "FullyQualifiedName~LlmRunCoreTests|FullyQualifiedName~LlmRunEndToEndAcceptanceTests|FullyQualifiedName~LlmSessionGAgentTests" --nologo

结果:47 passed, 0 skipped。

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: tests
head_sha: 941790febd654684357c03802b35d4ee6ff81c24
verdict: approve
---

## Verdict
Test coverage and quality are adequate for the LLM streaming actor recording and acceptance changes.

## Evidence
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:425` covers the cancellation-token regression in `LlmRunCore.RunAsync`: the sink records `LlmRunCancelled` with a non-cancelled token and no completed/failed terminal fact.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:445` covers a live provider stream that never emits a terminal chunk, using `TaskCompletionSource` plus cancellation instead of sleep-based pacing.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:15` exercises duplicate chunk suppression, terminal completion authority, late chunks, repeated terminal dispatch, and later failed facts against the actor state.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:38` and `:58` cover cancelled and failed terminal authority, including ignored flushes or later terminal facts.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:80` covers replay accepting a terminal event despite a missing intermediate recorder command, matching the new terminal sequence-gap behavior.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:803` covers duplicate running run admission as a no-op, and `:1223` covers forwarded tool-call recorder persistence plus duplicate suppression.
- Hygiene checks found no added `[Skip]`, manual category trait, `Task.Delay`, `Thread.Sleep`, `WaitUntilAsync`, `Assert.True(true)`, or weakened assertions in the touched test diff. The focused command passed: `dotnet test test/Aevatar.GAgentService.Tests/Aevatar.GAgentService.Tests.csproj --filter "FullyQualifiedName~LlmRunCoreTests|FullyQualifiedName~LlmRunEndToEndAcceptanceTests|FullyQualifiedName~LlmSessionGAgentTests" --nologo` returned 47 passed, 0 skipped.

## What would change your verdict (only if comment or reject)
N/A.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:tests:approve

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 architect review approve:未发现架构阻塞项

TL;DR

  • 这是什么: PR 2289 的 architecture compliance 复核。
  • 现在到哪一步 / 结论是什么: 从 architect 角度 approve,没有发现 CLAUDE.md / AGENTS.md 退化。
  • 需要 controller 下一步: 可把本角色结果计入共识;无需作者在架构项上补改。

架构复核结论

本次只按 architect 角色看架构边界,不评价测试质量或性能。

  • src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:37: 取消路径改用 CancellationToken.None 写入取消终态,意义是避免调用方已经取消的 token 抑制 actor 应持有的终态事实。
  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:582: 新增 typed recorder facts(用 Protobuf 类型表达的录制事实)处理入口,但仍通过 CanPersistRunFact 校验并由 PersistDomainEventAsync 提交,意义是权威状态仍归 session actor,不是中间层缓存事实。
  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:632: forwarded tool call 仍记录在既有 session actor 内,并对重复事实做一致性校验,意义是没有新增并行 store/shim 或读写 actor 拆分。

已检查:无 .proto diff、无新增 *WriteActor / *ReadActor / *Store 实体拆分、无新增外部 repo 引用、无 .refactor-loop 承载 host 生产事实、无新增 Old/New refactor-history source comments。Host 专项 grep/policy 变量为空,因此未额外发明 host-specific 规则。

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: architect
head_sha: 941790febd654684357c03802b35d4ee6ff81c24
verdict: approve
---

## Verdict
approve - no architecture compliance concerns found in the reviewed diff.

## Evidence
- No blocking architecture findings. `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:37` only changes cancellation-terminal recording to use `CancellationToken.None`, which preserves the actor-owned cancelled fact instead of letting the caller's cancelled token suppress it.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:582` adds typed observed-fact handlers that gate through `CanPersistRunFact` and then commit via `PersistDomainEventAsync`, keeping the session actor as the authoritative owner rather than adding a middle-layer registry or read/write actor split.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:632` records forwarded tool-call facts inside the existing session actor, with duplicate fact validation instead of a parallel store/shim.
- Policy checks: no `.proto` diff, no new `*WriteActor` / `*ReadActor` / `*Store` entity split, no new external repo reference, no `.refactor-loop` host SSOT movement, no new Old/New refactor-history source comments, and no host-specific grep checks were configured.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:architect:approve

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 PR 2289 quality review: comment

TL;DR

  • 这是什么: 我完成了 PR 2289 的 readability / simplicity 质量复审。
  • 结论是什么: verdict 是 comment,仅发现 1 个小的死代码测试辅助方法。
  • 需要 maintainer 做什么: 删除未使用的 helper,或补一个实际使用它的验收测试。

质量发现

test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:18 新增的 BuildRunRequest() 在 PR head 里没有任何调用点;我用 git grep 检查时只找到定义本身。这个 helper 是测试辅助 surface,但目前没有测试或生产代码使用它,会增加读者判断成本,也容易让后续测试误以为这里是共享入口。

生产代码里新增的 typed recorder event handlers 可读性基本稳定:typed recorder event 指 actor 记录 LLM run chunk/tool/terminal 的强类型事件;这些 handler 的重复形态来自不同 [EventHandler] 签名,没有扩成额外单实现接口或复杂抽象。

建议改动很小:删除 LlmRunAcceptanceHarness.BuildRunRequest(),或者补一个实际用它构建 run request 的 acceptance test。

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: quality
head_sha: 941790febd654684357c03802b35d4ee6ff81c24
verdict: comment
---

## Verdict
Comment: the production change is readable and focused, but the new shared test harness exposes a helper that is not used anywhere in the PR tree.

## Evidence
- `test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:18` defines `BuildRunRequest()`, but `git grep` on the PR head only finds this definition and no callers. This adds dead test-support surface; the acceptance tests use `BuildRecord`, `Chunk`, `Completed`, `Failed`, `Cancelled`, `LocalToolCall`, `StateEvent`, `RecordingLlmRunSink`, and `NeverTerminalLlmProviderFactory`, but not this request builder.

## What would change your verdict (only if comment or reject)
Delete `LlmRunAcceptanceHarness.BuildRunRequest()` or add a concrete acceptance test that uses it.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:comment

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 架构审查:reject,文档边界与代码不一致

TL;DR

  • 这是什么: PR 完善 LLM 流式运行的 actor 录制与验收覆盖 #2289 的 architect / CLAUDE.md compliance 审查结果。
  • 结论是什么: 当前阻塞点是新增 canon 文档把 provider stream 写成在 session actor command turn 内消费,但代码走的是 LlmRunExecutionGAgent / ILlmRunExecutor 的 off-turn 执行路径。
  • 需要 maintainer 做什么: 请先把两处 canon 文档改成和实际代码路径一致,controller 再重新触发 architect review。

审查结论

我给 reject,范围很窄:不是说当前代码一定要改回 actor turn 内执行,而是新增架构文档必须准确描述现在的执行边界。

IActorDispatchPort 是 actor 之间投递命令的标准通道。当前代码里 LlmSessionGAgent 负责 run 状态、sequence 接受和 terminal fact 权威性;LlmRunExecutionGAgent / ILlmRunExecutor 在 session actor command turn 外消费 live provider stream,然后通过 IActorDispatchPort 把 typed Record* recorder command 发回 session actor。新增文档却写成 ILlmRunCore 在同一 session actor command turn 内连续消费 ChatStreamAsync,这会让后续维护者误判 ownership / turn boundary。

位置 为什么重要
docs/canon/llm-streaming.md:43 写明 live provider ChatStreamAsync / IAsyncEnumerable<LLMStreamChunk>ILlmRunCore “在同一 actor command turn 内连续消费”。
docs/canon/nyxid-responses-direct.md:122 / :127 同样写明 provider stream 当前由 session actor command turn 拥有。
src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:411-:417 实际代码记录 LlmRunExecutionReadyEvent 后调度 transient execution,不是在该 turn 内直接消费 provider stream。
src/platform/Aevatar.GAgentService.Core/GAgents/LlmRunExecutionGAgent.cs:22-:35 实际执行入口把 ExecuteLlmRunRequested 委托给 ILlmRunExecutionService
src/platform/Aevatar.GAgentService.Application/Responses/LlmRunExecutor.cs:82-:99 LlmRunExecutorrunCore.RunAsync(...),并使用 DispatchingLlmRunSink 回写 recorder command。
src/platform/Aevatar.GAgentService.Application/Responses/LlmRunExecutor.cs:145-:163 recorder command 通过 IActorDispatchPort dispatch 回 session actor。

对应项目规则是:CLAUDE.md 要求“变更必须可验证:架构调整需同步文档,且 build/test 通过。”以及“架构调整 PR 必须同步相关 docs/,并在验证结果中列出 build/test/guard。”这里不是缺文档,而是文档和实际代码边界不同步。

建议修法:把 docs/canon/llm-streaming.mddocs/canon/nyxid-responses-direct.md 改成:LlmSessionGAgent 是唯一 session/run 状态权威;LlmRunExecutionGAgent / ILlmRunExecutor 在 session actor command turn 外消费 live provider stream;sink 通过 typed Record* recorder command 回到 session actor;保留 self-continuation 不能恢复 live HTTP stream 的 rationale,但不要写成当前仍在 session actor command turn 内消费。

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: architect
head_sha: 941790febd654684357c03802b35d4ee6ff81c24
verdict: reject
---

## Verdict
Reject: the new canon documentation describes the LLM provider stream as consumed in the session actor command turn, but the PR head code schedules an off-session execution actor/service path and dispatches recorder commands back to the session actor.

## Evidence
- `docs/canon/llm-streaming.md:43` says live provider `ChatStreamAsync` / `IAsyncEnumerable<LLMStreamChunk>` is consumed by `ILlmRunCore` "在同一 actor command turn 内连续消费", and `docs/canon/nyxid-responses-direct.md:122` / `:127` say the provider stream is owned by the session actor command turn. In the actual PR head, `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:411`-`:417` records `LlmRunExecutionReadyEvent` and schedules transient execution, `src/platform/Aevatar.GAgentService.Core/GAgents/LlmRunExecutionGAgent.cs:22`-`:35` delegates `ExecuteLlmRunRequested` to `ILlmRunExecutionService`, and `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunExecutor.cs:82`-`:99` calls `runCore.RunAsync(...)` with a `DispatchingLlmRunSink` that dispatches `Record*` commands back through `IActorDispatchPort` at `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunExecutor.cs:145`-`:163`. This violates `CLAUDE.md` "变更必须可验证:架构调整需同步文档,且 `build/test` 通过。" and "架构调整 PR 必须同步相关 `docs/`,并在验证结果中列出 build/test/guard。"

## What would change your verdict
Update `docs/canon/llm-streaming.md` and `docs/canon/nyxid-responses-direct.md` to match the actual PR architecture: `LlmSessionGAgent` owns run state, sequence acceptance, and terminal facts; `LlmRunExecutionGAgent` / `ILlmRunExecutor` consumes the live provider stream off the session actor command turn; the sink dispatches typed `Record*` recorder commands back to the session actor via `IActorDispatchPort`. Keep the self-continuation rationale, but word it against that actual execution boundary.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:architect:reject

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 tests reviewer 结论:approve

TL;DR

  • 这是什么: PR 2289 的测试质量与覆盖角度独立 review。
  • 现在到哪一步 / 结论是什么: approve,新增测试覆盖了取消落账、actor recorder typed fact、重复 admission、终态权威和 replay gap。
  • 需要 maintainer 做什么 OR controller 下一步: controller 可以把 tests 角色视为通过;无需 maintainer 补测试。

测试覆盖证据

  • test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:424 覆盖 LlmRunCore.RunAsync 的取消路径;这里重要是它断言 cancelled fact 会用未取消的 sink token 写出,且不会误写 completed/failed。
  • test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:444 覆盖流式运行中途取消;测试用 TaskCompletionSource 和协作取消同步进度,没有用 Task.Delay 或轮询 pacing。
  • test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:14 通过真实 actor HandleEventAsync 验证重复 chunk、completion 终态权威、late chunk 和后续 failed fact 被忽略。
  • test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:37:57 验证 cancelled/failed 终态权威,后续 flush 或其他 terminal fact 不会覆盖当前业务结果。
  • test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:79 覆盖 replay 时 terminal event 即使跳过中间 sequence 仍可应用,对应本 PR 的 terminal gap acceptance 分支。
  • test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:802 验证重复 running run admission 不会重复推进或重新 dispatch。
  • test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:1222 覆盖 HandleLlmSessionForwardedToolCallEmittedAsync,即 actor 记录 forwarded tool call 的事件入口,验证首次持久化和重复事件忽略。
  • 我还扫描了 diff:没有新增 [Skip]、manual trait、Task.DelayThread.SleepWaitUntilAsync,也没有明显 assertion loosening;本地执行 bash tools/ci/test_stability_guards.sh 通过。
📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: tests
head_sha: 941790febd654684357c03802b35d4ee6ff81c24
verdict: approve
---

## Verdict
Approve: the PR adds behavior-focused coverage for the new LLM run cancellation flush, actor recorder handlers, duplicate admission, terminal precedence, and replay gap behavior without weakening test quality.

## Evidence
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:424` covers the changed cancellation path in `LlmRunCore.RunAsync` and asserts the cancelled fact is flushed through the sink with a non-cancelled sink token, plus no completed/failed facts.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:444` covers mid-stream cancellation using `TaskCompletionSource` and cooperative cancellation instead of sleep/polling; it asserts the partial chunk and cancelled terminal fact.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:14` covers duplicate chunks, terminal completion authority, late chunks, and later failed facts through real `HandleEventAsync` actor handling.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:37` and `:57` cover cancelled/failed terminal authority and later flush or terminal facts being ignored.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:79` covers replay applying a terminal event despite missing intermediate recorder sequence, matching the new terminal gap acceptance branch.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:802` covers duplicate running run admission being ignored without re-dispatching.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:1222` covers `HandleLlmSessionForwardedToolCallEmittedAsync` persisting a forwarded call and ignoring the duplicate.
- Diff scan found no added `[Skip]`, manual traits, `Task.Delay`, `Thread.Sleep`, `WaitUntilAsync`, or assertion loosening patterns; `bash tools/ci/test_stability_guards.sh` passed.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:tests:approve

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 quality review:PR 2289 建议小修后更干净

TL;DR

  • 这是什么: 从可读性、命名、简单性和死代码角度复核 PR 2289。
  • 结论是什么: comment,主链路聚焦且可读,但有 5 处重复 recorder handler 和几处未使用测试 helper。
  • 下一步: 作者可在合并前抽一个小 helper 并删除未使用测试支撑;controller 可按 advisory comment 处理。

质量发现

  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583:593:603:613:623: 这 5 个 recorder handler 都在做同样的“判定能否持久化 run fact,再持久化克隆事件”。LlmSessionGAgent 是负责 response session 的 actor,这里重复会让以后新增 recorder fact 时更容易漏改其中一处;建议抽一个很小的私有 helper。
  • test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:18: BuildRunRequest() 在 PR head 内没有调用方。测试支撑里的未用工厂会暗示存在一个 canonical request 形状,但这次验收没有使用它,建议删掉等真正需要时再加。
  • test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:119: RecordingLlmRunSink.ForwardedToolCalls 这次没有断言读取。RecordingLlmRunSink 是测试用的 run 记录 sink,保留未观察的集合会增加测试支撑表面积,建议删除。
  • test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:119: CreateActor(..., string? responseId = null) 的可选 responseId 分支目前没有调用方。这个分支只是在测试 helper 中保留未来可能用到的灵活性,建议先删除以保持测试意图更直接。
📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: quality
head_sha: 941790febd654684357c03802b35d4ee6ff81c24
verdict: comment
---

## Verdict
Comment: the PR is focused and readable overall, but a small production duplication and unused test-support surface should be cleaned up for simplicity.

## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583`, `:593`, `:603`, `:613`, `:623` each repeat the same recorder-handler shape: null-check, `CanPersistRunFact(...)`, then `PersistDomainEventAsync(payload.Clone(), CancellationToken.None)` or `Task.CompletedTask`. Five near-identical copies cross the checklist's extraction threshold and make future recorder fact handling easier to drift.
- `test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:18` adds `BuildRunRequest()`, but the PR-head search shows no caller. Keeping unused factory helpers in shared test support makes later tests look like they should use a canonical request shape that is not actually exercised here.
- `test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:119` records `ForwardedToolCalls`, but the only new caller of `RecordingLlmRunSink` checks stream chunks and cancellation; no test reads this property. This is minor dead test-support surface.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:119` accepts optional `responseId`, but all calls pass only `eventStore`. The optional branch at lines `122-124` is currently unreachable and can be removed until a test needs it.

## What would change your verdict
Extract a tiny private helper for the five direct recorder-event handlers, and delete the unused `BuildRunRequest()`, unused `ForwardedToolCalls` collection/recording, and unused optional `responseId` parameter/branch in the test helper.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:comment

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

📊 状态卡片 — fix 派出

维度
阶段 派出 codex(role=fix)
codex log fix-pr2289-round-3.log
total wall-clock timeout 5400s(~90 min)
上下文 r10 review gate: architect=reject, tests=approve, quality=comment; fixing PR #2289 only.
下一步自动会做 1. fix r 完成 marker FIX_DONE:... 2. controller commit + push 3. 派 reviewer r<N+1>
是否需要人介入 ❌ 否(自动推进)

🤖 controller status banner

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 PR 2289 round 3 fix 已处理 architect reject

TL;DR

  • 这是什么: fix-codex 第 3 轮只处理 reviewer reject 证据。
  • 现在到哪一步 / 结论是什么: 已把两份 canon 文档改成与实际 off-turn execution actor/service 架构一致。
  • 需要 maintainer 做什么 OR controller 下一步: controller 可重新派发 Consensus-rnd Phase review-gate。

修复内容

本轮唯一阻塞来自 architect:文档写成 LlmSessionGAgent 在 session actor command turn 内消费 provider stream,但 PR 代码实际是 transient LlmRunExecutionGAgent / ILlmRunExecutor 在 session actor turn 之外连续消费 live stream,再把 typed Record* recorder commands 发回 session actor。

IActorDispatchPort 是 actor 之间投递命令的标准通道;这次文档明确它负责把执行侧 sink 观察到的 chunk/tool/terminal fact 发回 session actor inbox,session actor 仍然是 responseId + runId + sequence、terminal finalizer 和 cancellation fact 的唯一权威接受方。

文件 为什么重要
docs/canon/llm-streaming.md:40 修正 Responses LlmSession streaming 边界,避免文档要求 session actor command turn 持有 live HTTP/provider stream。
docs/canon/nyxid-responses-direct.md:122 修正 NyxID direct Responses / Messages / Chat Completions 的 run recording 语义,与当前 LlmRunExecutionGAgent / ILlmRunExecutor 路径一致。

验证结果:

dotnet build aevatar.slnx --nologo: pass, 148 warnings, 0 errors
dotnet test test/Aevatar.GAgentService.Tests/Aevatar.GAgentService.Tests.csproj --filter 'LlmRunCore|LlmRunExecutor|LlmSessionGAgent|LlmRunEndToEndAcceptance' --nologo: pass, 81 passed, 0 failed, 0 skipped
bash tools/docs/lint.sh: pass, 60 files checked, 0 errors
bash tools/ci/test_stability_guards.sh: pass
git diff --check: pass
📎 完整 codex 原始输出(存档备查)
# Fix report for PR 2289 round 3

## Applied
- (A) docs/canon/llm-streaming.md:40: corrected the Responses `LlmSession` streaming boundary so the doc matches the PR-head implementation: `LlmSessionGAgent` owns run state, sequence acceptance, and terminal facts; transient `LlmRunExecutionGAgent` / `ILlmRunExecutor` consumes the live provider stream off the session actor turn; `DispatchingLlmRunSink` sends typed recorder commands back through `IActorDispatchPort`. This addresses reviewer:architect evidence #1 and its demand: "Update `docs/canon/llm-streaming.md` and `docs/canon/nyxid-responses-direct.md` to match the actual PR architecture...". Cited clauses: `CLAUDE.md` "变更必须可验证:架构调整需同步文档,且 `build/test` 通过。" and "架构调整 PR 必须同步相关 `docs/`,并在验证结果中列出 build/test/guard。"
- (A) docs/canon/nyxid-responses-direct.md:122: corrected the direct Responses / Messages / Chat Completions run-recording section with the same actual execution boundary, including the actor-owned finalizer, typed `Record*` commands, `IActorDispatchPort`, and the self-continuation rationale scoped to why live HTTP/provider stream enumerators cannot be resumed by session actor self-message small steps. This addresses reviewer:architect evidence #1.
- (A) refactor self-doc: not applicable (HOST_REFACTOR_COMMENT_POLICY=none). No `Refactor (...)`, `Old pattern`, or `New principle` source comments were added.

## Rejected as false positive
- None.

## Advisory comments read
- reviewer:tests approved and listed behavior coverage for cancellation flush, duplicate admission, terminal precedence, replay gap behavior, and test-stability guard.
- reviewer:quality commented on possible cleanup in `LlmSessionGAgent` recorder handlers and unused test-support helpers. Because this was `comment`, not `reject`, it was recorded as advisory context and not treated as a mandatory blocking demand for this fix round.

## Blocked (cannot fix this round)
- None.

## Build status
- build: pass (`bash -lc "dotnet build aevatar.slnx --nologo"`; 148 existing warnings, 0 errors)
- tests: pass (`bash -lc "dotnet test test/Aevatar.GAgentService.Tests/Aevatar.GAgentService.Tests.csproj --filter 'LlmRunCore|LlmRunExecutor|LlmSessionGAgent|LlmRunEndToEndAcceptance' --nologo"`; 81 passed, 0 failed, 0 skipped)
- docs lint: pass (`bash tools/docs/lint.sh`; 60 files checked, 0 errors)
- test stability guard: pass (`bash tools/ci/test_stability_guards.sh`)
- diff check: pass (`git diff --check`)
- review-thread completion: not seeded (`.refactor-loop/state/review-thread-completion/pr2289.json` was absent)
- full solution tests: not run; prompt Step 3 instructed to pick changed test projects and not run the full solution test suite because it is too slow.

## Recommendation for next round
- expect unanimous: the only `reject` demand was a documentation/implementation architecture mismatch, and both cited canon sections now describe the actual off-turn execution actor/service plus actor-owned recorder fact path.

⟦AI:AUTO-LOOP⟧
FIX_DONE:2289:round-3:applied-2:rejected-0:blocked-0

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 测试视角 review:approve

TL;DR

  • 这是什么: PR 2289 的 tests reviewer 结果,聚焦测试覆盖与测试质量。
  • 现在到哪一步 / 结论是什么: 结论为 approve,新增 LLM 流式运行录制、终态幂等和取消路径都有行为断言覆盖。
  • 需要 maintainer 做什么 OR controller 下一步: controller 可继续按当前评审真值表推进;maintainer 无需因 tests 视角阻塞。

测试覆盖结论

LlmRunCore.RunAsync 是 LLM 运行核心入口;新增取消测试断言取消事实会写入 sink,并确认传给 sink 的 token 不再处于取消状态,避免取消时把验收事实也吞掉。

HandleEventAsync 是 actor 收到事件的测试入口;新增 acceptance 测试覆盖 completion / cancel / failure 三类终态在 duplicate 和 late record 后仍保持权威,还覆盖 replay 时 terminal event 可跨过缺失的中间 sequence。

TaskCompletionSource 是确定性同步点;新增 mid-stream cancellation 测试用它等待流进入后再取消,并用 WaitAsync(ct) 挂起 provider,没有新增 Task.Delay(...) / WaitUntilAsync(...) 这类测试 pacing。

没有发现新增 [Skip]、manual trait、Assert.True(true)、固定 sleep/polling,tools/ci/test_polling_allowlist.txt 也未变化。

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: tests
head_sha: 978dc11b915414e5b63c4b33ca2a16a2778c6f61
verdict: approve
---

## Verdict
Test coverage and test quality are adequate for the changed LLM streaming run recording and cancellation behavior.

## Evidence
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:425` covers the `LlmRunCore.RunAsync` cancellation change by asserting the cancelled fact reaches the sink with a non-cancelled sink token, plus no completed/failed record.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:445` covers mid-stream cancellation with a `TaskCompletionSource` sync point and `WaitAsync(ct)`, not fixed sleep/poll pacing; it asserts the partial chunk and cancelled run fact are recorded.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:15`, `:38`, and `:58` exercise the new actor recorder event handlers through `HandleEventAsync` and assert terminal completion/cancel/failure remain authoritative across duplicate and late records.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:80` covers replay of a terminal event with a missing intermediate sequence, matching the new `allowTerminalGap` branch.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:803` covers duplicate active run admission, and `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:1223` covers forwarded tool-call emission idempotency.
- Diff scan of the touched tests found no added `[Skip]`, manual traits, `Assert.True(true)`, `Task.Delay(...)`, or `WaitUntilAsync(...)`; the polling allowlist was not changed.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:tests:approve

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 Quality review: comment

TL;DR

  • 这是什么: PR 2289 的 code quality 角度独立审查,关注可读性、命名、复杂度、死代码与重复。
  • 现在到哪一步 / 结论是什么: 结论是 comment,整体聚焦可读,但新增 typed recorder handlers 有 5 处近似重复。
  • 需要 maintainer 做什么 OR controller 下一步: 建议作者抽一个小的共享持久化 helper;controller 可把该意见按 advisory 处理。

质量审查结果

我只发现一个可读性问题:src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583 开始的 HandleLlmStreamChunkObservedAsyncHandleLlmToolCallObservedAsyncHandleLlmRunCompletedAsyncHandleLlmRunFailedAsyncHandleLlmRunCancelledAsync 五个 handler 都是同一套流程:空值检查、调用 CanPersistRunFact(...) 判断、clone 后持久化、否则 no-op。这里的 CanPersistRunFact(...) 是 actor 接受 LLM run 事实前的统一判定逻辑;重复分散在 5 个入口里,后续改 run fact 接受规则时容易漏改某一个分支。

建议保留这 5 个 public event handler 的类型入口,只抽出一个小的共享 helper 来承载公共的 recorder-fact persistence shape。这个问题不构成拒绝合并的 blocker,所以 verdict 是 comment

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: quality
head_sha: 978dc11b915414e5b63c4b33ca2a16a2778c6f61
verdict: comment
---

## Verdict
Comment: the change is focused and readable overall, but the new typed recorder handlers add avoidable near-identical duplication.

## Evidence
- src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583 - `HandleLlmStreamChunkObservedAsync`, `HandleLlmToolCallObservedAsync`, `HandleLlmRunCompletedAsync`, `HandleLlmRunFailedAsync`, and `HandleLlmRunCancelledAsync` repeat the same null-check, `CanPersistRunFact(...)`, clone-and-persist, no-op fallback shape across five handlers. This crosses the review checklist's `>=3 near-identical inline copies` threshold and makes future run-fact acceptance behavior easy to update in one handler but miss in another.

## What would change your verdict (only if comment or reject)
Extract one small shared helper for the common recorder-fact persistence path, with the five public event handlers keeping only the type-specific identity/terminal arguments.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:comment

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 架构复审结论:approve

TL;DR

  • 这是什么: PR 完善 LLM 流式运行的 actor 录制与验收覆盖 #2289 的 architect 视角复审,聚焦 CLAUDE.md / AGENTS.md 架构合规。
  • 现在到哪一步 / 结论是什么: 当前 head 978dc11 未发现新增架构违规,verdict 为 approve。
  • 需要 maintainer 做什么 OR controller 下一步: controller 可按共识表继续合并判断;无需作者补架构改动。

架构证据

本轮重点复核了 LLM 流式运行录制路径:LlmSessionGAgent 是 response session actor,即持有一次 response/run 权威状态的 actor;IActorDispatchPort 是 actor 之间发命令的标准通道。

关键证据如下:

  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:401 先提交 LlmRunStartedEvent,随后在 :411 提交 LlmRunExecutionReadyEvent,再于 :417 调度 transient execution。这个边界让 session actor 继续持有 run 权威状态,但不把 live provider stream 塞进 actor command turn。
  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583:593:603:613:623 处理 typed recorder facts,并由 actor 决定是否持久化,符合 “committed domain event 必须可观察” 的 CLAUDE.md 约束。
  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1017:1065 使用 responseId + runId + sequence + recordId 做幂等接受,没有新增中间层进程内 run registry。
  • src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:37 使用 CancellationToken.None 记录取消终态,避免调用方取消 token 抑制 LlmRunCancelled typed fact。
  • docs/canon/llm-streaming.md:40:49 已与当前实现一致:live provider stream 由 transient execution actor/service 消费,provider fact 通过 IActorDispatchPort 回到 session actor,且明确拒绝 Task.Run / self-continuation ownership。

本轮也检查了 patch additions:没有新增 .proto 变更、Old/New refactor-history source comments、.refactor-loop host SSOT 引用、外部 repo 引用、*WriteActor / *ReadActor / *StoreActor split、ChatAsyncTask.RunTypeUrl.Contains5000 / 5050

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: architect
head_sha: 978dc11b915414e5b63c4b33ca2a16a2778c6f61
verdict: approve
---

## Verdict
approve - no architecture compliance blocker found for the LLM stream recorder/finalizer changes.

## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:401` records `LlmRunStartedEvent`, then `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:411` records `LlmRunExecutionReadyEvent` before dispatching transient execution at `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:417`. This keeps the session actor as the run authority while avoiding live provider stream consumption inside the actor turn, matching `CLAUDE.md` "Actor 即业务实体:一个 actor = 一个业务实体(数据与方法同住);禁止按技术功能(读/写/投影)拆分同一业务实体为多个 actor。" and "业务推进内聚:工作流推进(成功/失败/分支/重试)在 Actor 事件处理流程内完成,保证顺序性与可重放性。"
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583`, `:593`, `:603`, `:613`, and `:623` add typed event handlers for `LlmStreamChunkObserved`, `LlmToolCallObserved`, `LlmRunCompleted`, `LlmRunFailed`, and `LlmRunCancelled`, and persist accepted facts through the actor. This matches `CLAUDE.md` "committed domain event 必须可观察:write-side 完成 committed event 后必须送入 projection 主链;禁止只落 event store 而不进入可观察流。"
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1017` and `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:1065` gate run facts by terminal status, `responseId`, `runId`, monotonic sequence, and record id rather than a process-local registry. This matches `CLAUDE.md` "事实源唯一:跨请求/跨节点一致性事实必须有唯一权威来源(Actor 持久态或分布式状态),不依赖进程内偶然状态。" and AGENTS.md "禁止在中间层维护 `entity/actor/workflow-run/session` 等 ID 到上下文或事实状态的进程内映射(`Dictionary<>``ConcurrentDictionary<>``HashSet<>``Queue<>`)。"
- `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:37` records cancellation with `CancellationToken.None`, so caller cancellation cannot suppress the terminal `LlmRunCancelled` fact. This is consistent with AGENTS.md "AI 对话主链必须流式化:仓库内新增或修改 AI 对话执行路径时,禁止使用 `ChatAsync` 作为正式主链;必须使用 `ChatStreamAsync` 作为唯一权威执行入口,使文本、reasoning、tool call、tool result、完成态都沿同一条流式链路对外发布。"
- `docs/canon/llm-streaming.md:40` through `docs/canon/llm-streaming.md:49` now documents the same boundary implemented by the code: transient execution consumes the live provider stream outside the session actor turn, typed recorder facts return through `IActorDispatchPort`, and `Task.Run` / self-continuation stream ownership are rejected. This satisfies `CLAUDE.md` "架构调整 PR 必须同步相关 `docs/`,并在验证结果中列出 build/test/guard。"
- Scope and policy checks from the actual diff: no `.proto` files changed; no new `Old pattern` / `New principle` source comments were added while `HOST_REFACTOR_COMMENT_POLICY` is empty/default `none`; no `.refactor-loop`/host production SSOT references, external repo references, new `*WriteActor` / `*ReadActor` / `*StoreActor` split, process-local middle-layer run registry, `ChatAsync`, `Task.Run`, `TypeUrl.Contains`, or `5000`/`5050` additions appeared in the patch.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:architect:approve

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 架构审查通过

TL;DR


架构审查要点

本轮只看架构合规,不评测试覆盖质量或性能。IActorDispatchPort 是 actor 之间投递命令/事实的标准通道;本 PR 的 recorder facts 仍通过 actor inbox 进入 LlmSessionGAgent,没有把执行侧变成事实源。

关键判断:

检查项 结论 说明
actor 权威状态 通过 LlmSessionGAgent 仍持有 run sequence / terminal finalizer,执行侧只连续消费 provider stream 并回写 typed facts。
读写分离 通过 没有引入 query-time replay、host 拼 terminal response 或侧读 actor state。
中间层状态 通过 未新增 actor/session id 到上下文的进程内字典或 registry。
Old/New 注释策略 通过 HOST_REFACTOR_COMMENT_POLICY 为空,归一为 none;新增 diff 未发现 refactor-history source comments。
外部仓库边界 通过 涉及 NyxID 文档语义,但未要求或依赖 NyxID 仓库变更。

涉及的核心位置:

  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:401 继续在 session actor 内持久化 run started / execution ready / terminal facts,这对应“actor 是唯一权威事实源”。
  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:582 新增 typed recorder fact handlers,并用 sequence/terminal gate 控制接受窗口,避免重复或晚到事件覆盖 terminal 状态。
  • docs/canon/llm-streaming.md:38docs/canon/nyxid-responses-direct.md:120 记录 live provider stream 不能被 session actor self-continuation 恢复的原因,避免后续反复提议不可持久化的 HTTP stream 枚举器方案。
📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: architect
head_sha: 978dc11b915414e5b63c4b33ca2a16a2778c6f61
verdict: approve
---

## Verdict
approve - no architecture compliance regression found in the changed LLM streaming recorder/finalizer path.

## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:401` persists `LlmRunStartedEvent`, `LlmRunExecutionReadyEvent`, and later typed run facts through the session actor instead of query-time reconstruction or host-side response assembly; this is aligned with CLAUDE.md: "单一权威拥有者:每个稳定业务事实有唯一 actor 拥有;`committed event store + actor state` 是唯一真相,readmodel 只是查询副本。"
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:582` adds handlers that accept `LlmStreamChunkObserved`, `LlmToolCallObserved`, `LlmRunCompleted`, `LlmRunFailed`, and `LlmRunCancelled` as typed actor facts, guarded by `CanPersistRunFact`; this keeps the model in the `Command -> Event` lane and matches CLAUDE.md: "核心语义强类型:影响业务语义、控制流、稳定读取且仓库内可控的数据,必须建模为 `proto field / typed option / typed sub-message`,禁止塞入通用 bag。"
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:731` and `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:833` keep duplicate/late recorder facts idempotent using sequence/record identity inside actor state, with no new middle-layer `actorId -> context` dictionary or read/write actor split; this preserves AGENTS.md: "投影编排 Actor 化:Projection 的会话、订阅、关联关系等运行态必须由 Actor 或分布式状态承载;禁止在中间层通过进程内注册表/字典持有事实状态。"
- `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:37` records cancellation with `CancellationToken.None`, so the final cancellation fact is not suppressed by the caller cancellation token; this supports CLAUDE.md: "ACK 诚实:同步返回只承诺已达到阶段(默认 `accepted + stable command id`);`committed`/`read-model observed` 等强保证须通过独立契约或异步观察获取。"
- `docs/canon/llm-streaming.md:38` and `docs/canon/nyxid-responses-direct.md:120` document the transient execution actor/service boundary and explicitly state the session actor remains the durable sequence/watermark/finalizer authority; this satisfies CLAUDE.md: "变更必须可验证:架构调整需同步文档,且 `build/test` 通过。"
- Comment-policy check: `HOST_REFACTOR_COMMENT_POLICY` is empty, so it normalizes to `none`; the actual added source/doc diff contains no new Old/New or iteration refactor-history comments.
- Scope/boundary check: the diff stays inside issue #2276's LLM run acceptance and stream-not-resumable rationale scope; no `.proto` files, no external repository references requiring changes, no `.refactor-loop/host.env` or production SSOT movement, and no new `*WriteActor` / `*ReadActor` / `*Store` split were found.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:architect:approve

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 tests review approve:LLM 流式 run 录制测试覆盖足够

TL;DR

  • 这是什么: PR 2289 的 tests 角度独立 review,重点看覆盖质量、稳定性规则和断言质量。
  • 现在到哪一步 / 结论是什么: 结论是 approve;新增测试覆盖取消 flush、terminal authority、重复/晚到 recorder fact、replay gap 和 forwarded tool-call 去重。
  • 需要 maintainer 做什么 OR controller 下一步: controller 可以继续汇总其他 reviewer 结果;maintainer 无需为 tests 角度补 must-fix。

测试覆盖结论

我检查了 origin/feature/integrate...origin/refactor/iter2276-issue-2276 的三点 diff,只按测试质量判断。新增测试不是单纯增加行数:每个关键用例都断言了业务状态,例如 terminal fact 是否权威、晚到 chunk 是否不覆盖、取消时是否仍写入 LlmRunCancelled typed fact。

关键证据:

  • test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:424 覆盖 LlmRunCore 在请求取消后仍要用非取消 token 写入 cancelled fact;这直接对应本 PR 把 sink 取消记录改为 CancellationToken.None 的生产变更。
  • test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:443TaskCompletionSource 协调 never-terminal stream,断言 partial chunk 保留且 cancelled fact 写入;这里没有 sleep/delay pacing。
  • test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:15 覆盖 duplicate chunk、late recorder command、terminal retry 和 terminal 后 failure 不覆盖 completed 状态。
  • test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:38:58 分别覆盖 cancelled / failed terminal authority,防止 terminal 后再被 late fact 改写。
  • test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:80 覆盖 replay 时 terminal event 可跨过缺失的中间 recorder sequence。
  • test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:803 覆盖 duplicate running run admission no-op。
  • test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:1223 覆盖 LlmSessionForwardedToolCallEmittedEvent 录制和重复 call id 去重。

稳定性检查结果:没有新增 Task.DelayThread.SleepWaitUntilAsync[Skip]、manual category、Assert.True(true) 或明显弱化断言;tools/ci/test_polling_allowlist.txt 没有修改。

本地验证通过:dotnet test test/Aevatar.GAgentService.Tests/Aevatar.GAgentService.Tests.csproj --no-restore --nologo 结果为 896 passed / 0 failed / 0 skipped。

📎 完整 codex 原始输出(存档备查)

pr: 2289
role: tests
head_sha: 978dc11
verdict: approve

Verdict

Approve: the PR adds behavior-level coverage for the new LLM run recorder and cancellation semantics without adding unstable pacing, skipped tests, or weakened assertions.

Evidence

  • test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:424 verifies cancelled provider execution still records a LlmRunCancelled fact through the sink with a non-cancelled sink token, covering the LlmRunCore change that flushes cancellation using CancellationToken.None.
  • test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:443 covers a never-terminal stream using TaskCompletionSource plus cancellation, asserting the partial chunk is preserved and the terminal cancelled fact is recorded; this is deterministic async coordination, not sleep/delay pacing.
  • test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:15 covers duplicate chunk suppression, terminal completion authority, late recorder-command rejection, and terminal retry idempotency.
  • test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:38 and :58 cover cancelled and failed terminal authority respectively, including late chunk/completed/cancelled attempts after terminal state.
  • test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:80 covers replay with a terminal event that intentionally skips intermediate recorder sequence numbers, matching the new terminal-gap behavior.
  • test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:803 covers the new duplicate running-run admission no-op branch.
  • test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:1223 covers the new HandleLlmSessionForwardedToolCallEmittedAsync path and duplicate forwarded tool-call idempotency.
  • Diff scan found no added Task.Delay, Thread.Sleep, WaitUntilAsync, [Skip], manual test category, Assert.True(true), or assertion loosening pattern in touched tests. tools/ci/test_polling_allowlist.txt was not modified.
  • Verification passed locally: dotnet test test/Aevatar.GAgentService.Tests/Aevatar.GAgentService.Tests.csproj --no-restore --nologo passed 896 tests, 0 failed, 0 skipped.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:tests:approve

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 Quality review: comment

TL;DR

  • 这是什么: PR 2289 的 readability / simplicity 角度独立 review。
  • 结论是什么: comment,没有发现阻塞合并的问题,但有 2 个可读性/DRY 建议。
  • 下一步: controller 可继续按总门禁判断;作者可顺手抽一个小 helper 并替换测试里的裸数字断言。

质量发现

src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583 起新增的 5 个 recorder event handler 结构几乎一致:先用 CanPersistRunFact 判断,再 clone event 并调用 PersistDomainEventAsyncPersistDomainEventAsync 是 actor 持久化领域事件的入口,这段重复会让后续改 recorder 接受/terminal finalizer 规则时容易漏改某一个 handler。建议抽成一个私有 helper,例如 PersistAcceptedRunFactAsync(...),保留每个 handler 的 typed 参数入口即可。

test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:30:51:71 新测试用 234 断言 ActiveRun.Status。这些数字分别对应 completed / failed / cancelled run 状态,但读测试的人需要跳回 production 常量才能理解。建议改成测试本地命名常量或 helper,让断言表达业务状态名。

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: quality
head_sha: 978dc11b915414e5b63c4b33ca2a16a2778c6f61
verdict: comment
---

## Verdict
Comment: the PR is focused and readable overall, but the new recorder-event path adds avoidable duplication and a few raw numeric status assertions that obscure intent.

## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583`: `HandleLlmStreamChunkObservedAsync`, `HandleLlmToolCallObservedAsync`, `HandleLlmRunCompletedAsync`, `HandleLlmRunFailedAsync`, and `HandleLlmRunCancelledAsync` repeat the same `CanPersistRunFact(...) ? PersistDomainEventAsync(Clone, None) : CompletedTask` shape five times. This crosses the checklist's ">=3 near-identical inline copies" threshold and makes future changes to recorder acceptance/finalizer handling easy to miss in one handler.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:30`: the new acceptance tests assert `ActiveRun.Status` with raw literals (`2`, `3`, `4`) at lines 30, 51, and 71. The production code has named meanings for those values, so the tests should express `Completed/Failed/Cancelled` intent instead of requiring readers to remember the numeric encoding.

## What would change your verdict
Extract the repeated recorder accept/persist body into a small private helper such as `PersistAcceptedRunFactAsync(IMessage fact, string responseId, string runId, long sequence, bool terminal)`, and replace the raw status literals in the new acceptance tests with named local constants or an exposed test helper that names the run status values.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:comment

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 tests review: approve

TL;DR

  • 这是什么: PR 2289 的测试质量复核,聚焦 LLM 流式运行的 actor 录制与验收覆盖。
  • 结论: approve;新增测试覆盖了取消 flush、直接 recorder facts、终态权威性、重复抑制和 replay gap,未发现跳过测试或 sleep/polling pacing。
  • 下一步: controller 可把 tests 角色视为通过;无需 maintainer 补测试。

测试覆盖结论

LlmRunCore 是 LLM 流式执行核心;本 PR 对取消路径的关键变化有直接测试。test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:425 断言取消时会记录 LlmRunCancelled,并且传给 sink 的 token 不再是已取消 token。test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:445 覆盖 provider 永不发终态 chunk 的场景,断言 partial chunk 被保留、取消被记录,并且没有 completion/failure。

LlmSessionGAgent 是拥有 response session 和 run state 的 actor;新增验收测试覆盖了直接 recorder facts 进入 actor 的行为。test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:15 覆盖 completion 终态对 duplicate/late chunks/late failure 的权威性,:38 覆盖 cancelled 终态,:58 覆盖 failed 终态,:80 覆盖 replay 时 terminal event 可以跨过中间缺失 sequence,同时 late lower-sequence chunk 不会回写状态。

重复 run admission 和 forwarded tool call 路径也有行为断言。test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:803 断言同一个 running run 再次 admission 不推进版本;:1173 覆盖流式 run 中 forwarded tool call 被记录到状态和 completion;:1205 覆盖直接 forwarded-tool-call event 的 duplicate suppression。

稳定性方面,PR diff 没有新增 [Skip][Trait("Category", "Manual")]Assert.True(true)Task.DelayThread.SleepWaitUntilAsync 或 polling allowlist 条目。定向验证已跑过:

dotnet test test/Aevatar.GAgentService.Tests/Aevatar.GAgentService.Tests.csproj --no-restore --filter "FullyQualifiedName~LlmRunCoreTests|FullyQualifiedName~LlmRunEndToEndAcceptanceTests|FullyQualifiedName~LlmSessionGAgentTests" --nologo
Passed: 47, Failed: 0, Skipped: 0
📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: tests
head_sha: 978dc11b915414e5b63c4b33ca2a16a2778c6f61
verdict: approve
---

## Verdict
Approve: the PR adds behavior-level coverage for the new LLM streaming recorder paths, cancellation flush behavior, duplicate suppression, and terminal replay semantics without adding skips, sleep-based pacing, or weakened assertions.

## Evidence
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:425` verifies the cancellation path records `LlmRunCancelled` through the sink with a non-cancelled sink token, covering the `CancellationToken.None` change in `LlmRunCore.RunAsync`.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunCoreTests.cs:445` covers a never-terminal provider stream by asserting the partial chunk is preserved, cancellation is recorded, and no completion/failure is emitted; it uses `TaskCompletionSource` and cancellation rather than `Task.Delay` polling.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:15` asserts completion remains authoritative across duplicate chunks, late chunks, duplicate completion, and later failure facts.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:38` and `:58` cover cancelled and failed terminal facts staying authoritative and ignoring later chunks/terminal facts.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:80` covers replay applying a terminal event despite a missing intermediate sequence while rejecting the late lower-sequence chunk.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:803` covers duplicate running run admission as an idempotent no-op.
- `test/Aevatar.GAgentService.Tests/Core/LlmSessionGAgentTests.cs:1173` and `:1205` cover forwarded tool call recording through the streaming run and direct forwarded-tool-call event duplicate suppression.
- Stability checks found no added `[Skip]`, manual traits, `Assert.True(true)`, `Task.Delay`, `Thread.Sleep`, `WaitUntilAsync`, or polling allowlist changes in the PR diff.
- Targeted verification passed: `dotnet test test/Aevatar.GAgentService.Tests/Aevatar.GAgentService.Tests.csproj --no-restore --filter "FullyQualifiedName~LlmRunCoreTests|FullyQualifiedName~LlmRunEndToEndAcceptanceTests|FullyQualifiedName~LlmSessionGAgentTests" --nologo` reported 47 passed, 0 failed, 0 skipped.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:tests:approve

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 质量审查结论:comment

TL;DR

  • 这是什么: PR 2289 的 readability / naming / simplicity / dead-code 角度独立审查。
  • 现在到哪一步 / 结论是什么: 代码整体聚焦可读,但有 1 处重复 wrapper 模式和 2 处未使用测试辅助面,结论为 comment
  • 需要 maintainer 做什么 OR controller 下一步: 建议作者顺手收敛这些小清理;没有发现必须阻塞合并的质量问题。

质量证据

LlmSessionGAgent 是负责 response session 权威 run 状态的 actor,actor 在这里指串行拥有并提交状态事实的运行单元。src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583 开始新增 5 个 recorder event handler;recorder event handler 是把执行侧观察到的 chunk/tool/terminal 事实录回 session actor 的入口。这 5 个方法都在做同一件事:先用 CanPersistRunFact(...) 判断是否接受,再 PersistDomainEventAsync(observed.Clone(), CancellationToken.None),只有 terminal: true/false 不同。这里不是行为错误,但 5 份同形代码会让后续审查更难确认 terminal 与非 terminal 分支有没有漂移,建议抽一个小 helper 表达“接受则持久化 cloned run fact”。

test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:18 新增 BuildRunRequest(),但本次 touched tests 没有调用它。这个 helper 看起来像验收 harness 的正式工厂入口,实际没有覆盖用途,建议删除或接入现有验收用例,避免测试支撑面显得比实际需求更宽。

test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:119CreateActor(InMemoryEventStore eventStore, string? responseId = null) 新增了可选 responseId,但所有调用方都走默认值。这个参数目前没有表达业务变体,建议先删掉,等确实有 alternate actor identity 用例时再加。

HOST_REFACTOR_COMMENT_POLICY 在本次环境中为空,按规则归一为 none;因此缺失 self-doc 不是 reject 理由。LlmSessionGAgent.cs 中现有 Refactor / Old pattern / New principle 注释在 base 分支已存在,不是本 PR 新增缺陷。

📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: quality
head_sha: 978dc11b915414e5b63c4b33ca2a16a2778c6f61
verdict: comment
---

## Verdict
Comment: the PR is focused and readable overall, with no new refactor self-doc defect under `HOST_REFACTOR_COMMENT_POLICY=none`, but it adds a little avoidable duplication and unused test support surface.

## Evidence
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:583` adds five near-identical recorder handlers (`HandleLlmStreamChunkObservedAsync`, `HandleLlmToolCallObservedAsync`, `HandleLlmRunCompletedAsync`, `HandleLlmRunFailedAsync`, `HandleLlmRunCancelledAsync`) that all compute `accepted = CanPersistRunFact(...)` and conditionally persist a cloned event. That is five inline copies of the same admission/persist shape; a small helper would make the terminal/non-terminal distinction easier to review.
- `test/Aevatar.GAgentService.Tests/TestSupport/LlmRunAcceptanceHarness.cs:18` introduces `BuildRunRequest()`, but no touched test calls it. Keeping unused test factory surface makes the acceptance harness look broader than it is.
- `test/Aevatar.GAgentService.Tests/Application/LlmRunEndToEndAcceptanceTests.cs:119` adds an optional `responseId` parameter to `CreateActor`, but every caller uses the default. The parameter can be deleted until a test actually needs alternate actor identity.

## What would change your verdict (only if comment or reject)
Extract the repeated recorder-event admission/persist wrapper in `LlmSessionGAgent`, and delete or wire the unused test helper/optional parameter.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:quality:comment

⟦AI:AUTO-LOOP⟧

@eanzhao

eanzhao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

🤖 架构审查 r13:approve

TL;DR

  • 这是什么: PR 2289 的 architect 视角第 13 轮独立审查。
  • 现在到哪一步 / 结论是什么: 当前 head 978dc11 未发现 CLAUDE.md / AGENTS.md 架构合规回退,结论为 approve。
  • 需要 maintainer 做什么 OR controller 下一步: controller 可按当前 review 真值表继续汇总结论;maintainer 无需补架构项。

架构合规结论

LlmSessionGAgent 是 response session 的权威 actor,也就是拥有 run state、sequence 和 terminal finalizer 的状态单元。IActorDispatchPort 是 actor 之间投递命令/事实的标准通道。本轮检查确认:执行侧连续消费 live provider stream,但状态事实仍回到 LlmSessionGAgent 持久化,没有把执行器或 Host 变成事实源。

关键证据:

  • docs/canon/llm-streaming.md:40docs/canon/nyxid-responses-direct.md:122 明确记录 transient execution actor/service 负责消费 live provider stream,LlmSessionGAgent 仍是 actor-owned run 记录权威;这对应 CLAUDE.md 的流式主链和单一权威事实源要求。
  • src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:37CancellationToken.None 写入 LlmRunCancelled,避免调用方取消 token 把最终取消事实吞掉。
  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:401:411:421 仍在同一个 session actor state machine 内处理 run start、execution-ready 和 execution dispatch,没有新增 *WriteActor / *ReadActor / *StoreActor 拆分。
  • src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:586:626 只通过 actor handler 接受 typed run facts,:1065 用 actor state 校验 sequence / terminal 接受窗口;没有新增中间层进程内 run registry。
  • 本轮 scoped diff 没有 .proto 变更、没有新增 Old/New refactor-history source comment、没有外部仓库变更依赖、没有 .refactor-loop host SSOT 移入。
📎 完整 codex 原始输出(存档备查)
---
pr: 2289
role: architect
head_sha: 978dc11b915414e5b63c4b33ca2a16a2778c6f61
verdict: approve
---

## Verdict
approve — no architecture compliance regression found in the scoped LLM streaming actor-recording diff.

## Evidence
- `docs/canon/llm-streaming.md:40` and `docs/canon/nyxid-responses-direct.md:122` document the run boundary as transient execution actor/service consuming the live provider stream while `LlmSessionGAgent` remains the actor-owned fact source; this aligns with CLAUDE.md clauses "AI 对话主链必须流式化:实时会话入口必须使用 `ChatStreamAsync`" and "单一权威拥有者:每个稳定业务事实有唯一 actor 拥有".
- `src/platform/Aevatar.GAgentService.Application/Responses/LlmRunCore.cs:37` flushes cancellation through `RecordRunCancelledAsync` with `CancellationToken.None`, preserving a typed terminal fact instead of letting caller cancellation suppress actor recording; this supports the CLAUDE.md clause "已提交领域事件必须可观察".
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:401`, `:411`, and `:421` keep run start, execution-ready, and dispatch scheduling inside the existing `LlmSessionGAgent` state machine; no new `*WriteActor` / `*ReadActor` / `*Store` split is introduced.
- `src/platform/Aevatar.GAgentService.Core/GAgents/LlmSessionGAgent.cs:586` through `:626` accepts typed run facts only through actor handlers, and `:1065` gates sequence/terminal acceptance against actor state; this follows the CLAUDE.md clauses "Actor/模块运行态只能在事件处理主线程修改" and "核心语义强类型".
- No `.proto` files changed in the PR diff, no new source `Old pattern` / `New principle` refactor-history comments were added under the default `HOST_REFACTOR_COMMENT_POLICY=none`, no external repository references were added, and the diff stays within the issue-2276 authorized source/docs/test scope.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:2289:architect:approve

⟦AI:AUTO-LOOP⟧

@eanzhao eanzhao marked this pull request as ready for review June 20, 2026 01:07
@eanzhao eanzhao merged commit d801253 into feature/integrate Jun 20, 2026
11 checks passed
@eanzhao eanzhao deleted the refactor/iter2276-issue-2276 branch June 20, 2026 01:07
@eanzhao eanzhao added crnd:phase:merged Work has landed. and removed crnd:phase:reviewing Review workers are active. labels Jun 20, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 978dc11b91

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1071 to +1075
if (State.ActiveRun == null ||
!string.Equals(State.ActiveRun.RunId, runId, StringComparison.Ordinal))
{
return true;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject recorder facts for non-active runs

When a typed recorder fact arrives with a run_id different from the current ActiveRun, this branch accepts it; the applier then calls EnsureRun, which replaces ActiveRun and can append chunks or terminalize the response for the wrong run. In a stale/misrouted execution dispatch scenario, this bypasses the stricter Record* path that rejects mismatched run ids and lets a foreign run overwrite an in-flight session instead of being ignored.

Useful? React with 👍 / 👎.

Comment on lines +652 to +656
await PersistDomainEventAsync(new LlmSessionForwardedToolCallEmittedEvent
{
ResponseId = existing.ResponseId,
Call = call,
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve run sequencing on forwarded tool events

If an execution recorder dispatches LlmSessionForwardedToolCallEmittedEvent with run_id, sequence, and record_id, this handler drops those fields before persisting. The applier therefore treats it as a session-level tool call, never advances the run sequence or records the tool on ActiveRun, so the next non-terminal fact with the following sequence is rejected as out of order.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

crnd:human:auto Controller may continue without maintainer intervention. crnd:lifecycle:managed Item is managed by codex-refactor-loop. crnd:phase:merged Work has landed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant