[llm] playground for LLM-related interaction by strub · Pull Request #1019 · EasyCrypt/easycrypt

strub · 2026-05-27T03:15:08Z

Playground branch for experimenting with LLM-driven EasyCrypt interaction.

Not intended for review or merge in its current form — kept as a draft so
the work-in-progress is visible.

When `pragma +strict_bullets` is set, the bullet tokens `-`, `+`, `*` (and their repeated forms `--`, `++`, `**`, ...) become Coq-style focusing operators: each phrase's bullet is checked against a per-proof stack so that a subgoal must be discharged before its sibling is addressed, and so that bullet characters identify nesting levels. The default behavior is unchanged: without the pragma, bullets remain pure decoration, preserving every existing proof script. Repeated bullet characters are emitted by the lexer as single PLUSn/MINUSn/STARn tokens (carrying their literal), which makes them usable both as deeper bullet levels and as user-defined binary operators (`op (--) ...`). The final sibling of a split point can be continued at the parent level without a bullet: when the previous sibling is closed and exactly one sibling remains, the inner frame pops automatically (and cascades through nested last-sibling frames). This keeps the standard phl idiom of long `seq N : <post>' chains flat instead of forcing ever-deeper indentation under a `+' for each continuation. Multi-way splits keep their enumeration discipline for all but the last subgoal.

Introduce an interactive REPL for LLM coding agents driving EasyCrypt (`easycrypt llm`) using a line-oriented protocol over stdin/stdout, plus two CLI flags for goal inspection: - `-upto <pos>` compile up to a given position and print the goals - `-lastgoals` print the last unproven goals at end-of-file REPL protocol (see `doc/llm/CLAUDE.md` for the full guide): - LOAD "file.ec" [LINE[:COL]] -- compile, optionally up to a position - UNDO / REVERT <uuid|name> -- navigate proof state - GOALS / GOALS ALL -- inspect current or all subgoals - CHECKPOINT <name> -- named bookmarks for branching - SEARCH <pattern> -- lemma search - QUIET ON/OFF -- suppress goal display for bulk input - Direct EasyCrypt input (tactics, declarations, search, print, ...) Responses use a typed envelope (OK/ERROR with uuid) terminated by an `<END>` sentinel for reliable parsing. LOAD reports the last processed line in the response tag; error messages include the offending source text; only the current subgoal is shown by default with a remaining count.

Add a -trace flag to the LOAD REPL command. When set, LOAD compiles the prefix exactly as today (using the existing LINE[:COL] argument, or up to EOF if omitted), but defers the last sentence and runs it under goal capture, then returns a response body with four delimited blocks: === BEFORE: line L (col C) === <focused goal before the sentence> === TACTIC (lines L:C - L':C') === <exact source text of the sentence> === AFTER: line L (col C) === <new focused goal + any new sibling goals> === SUMMARY === open goals: N1 -> N2 Adapted from PR #1018 (-trace LINE[:COL] for batch mode): same delimiters and the same new-or-modified-head filtering for AFTER. The position is taken from LOAD's existing LINE[:COL] argument; the tag is the regular [loaded:file:LINE]. If the deferred sentence is outside a proof context, or there is no sentence to trace, the reply uses the ERROR envelope with a clear message. If the sentence fails, the BEFORE/TACTIC blocks are still delivered, AFTER carries a <sentence failed> marker, and the formatted exception is appended. Expose EcCommands.in_proof so the REPL can check the pre-execution proof status without inspecting scope internals.

Three small additions to make REPL-driven proof exploration pleasant without weakening +strict_bullets for saved scripts. 1. Bullet relaxation for REPL input. EcCommands.disable_repl_bullets is called at every REPL phrase; it drops pm_strict_bullets and clears puc_bullets on the active proof so REPL-typed tactics are not rejected for missing bullets. Files loaded via LOAD still respect their own pragma; only direct REPL input is relaxed. 2. TREE / TREE ALL meta-commands. List all open subgoals as a flat numbered enumeration with the focused goal marked. TREE shows a one-line conclusion per goal; TREE ALL shows the full goal bodies. Backed by EcCommands.pp_tree on top of EcCoreGoal.all_opened. 3. [focus: k/N] reply tag. When more than one subgoal is open, both tactic replies and the LOAD response carry [focus: 1/N] alongside any other tag, so the caller knows the next tactic targets goal #1 of N. Supporting plumbing: EcScope.set_xgoal exposes a way to swap the active proof_uc without going through the tactic engine.

The REPL relies on EC's "first open goal is the focused one" convention. Until now the only way to work on a non-first goal was to discharge the earlier ones; the proving agent often wants to inspect or skip a particular sibling without that. Add two meta-commands: FOCUS N rotate the open-goal list so the goal at index N (from TREE) becomes the focused one, preserving cyclic order. NEXT shorthand for FOCUS 2 (rotate one step). Backed by a new EcCoreGoal.rotate_focus that splits and recombines pr_opened. Going through the tactic engine doesn't work for standalone rotation: tcenv1_of_proof tc_down's the siblings out of view, so Protate (the `first last` parsed form) has nothing to rotate at the top level. EcCommands.focus_goal wraps rotate_focus, applies it via the same scope-mutation path disable_repl_bullets uses, and pushes a new context so UNDO/REVERT can roll the change back.

REPL phrases are recorded with (uuid, source, parent_handle, opens), where parent_handle is the focused goal right before the phrase ran. COMMIT replays them against the proof DAG to recover the bullet structure. The DAG edge is now explicit. EcCoreGoal's proofenv carries a pr_parent : handle ID.Map.t populated by FApi.newgoal (the single choke-point where every child handle is created); EcCoreGoal exposes children_of_handle / parent_of_handle on top of it. This avoids the older approach of reading children out of g_validation, which only worked for VIntros / VConv / VLConv / VRewrite / VExtern -- not for VApply, whose subgoals are added to the tcenv state outside the validation record. Algorithm: - For each phrase, walk the subtree rooted at its parent handle, registering each multi-child split's children in [sibling_depth] at the right depth. Single-child links are continuations and do not bump depth. - To decide whether a phrase needs a bullet, walk upward via parent_of from its recorded parent until hitting a registered sibling ancestor; if found, emit the bullet for that depth and consume the registration. Bullet tokens are chosen per depth from PR 1017's lexer order (-, +, *, --, ++, **, ---, +++, *** ...), skipping any token already in scope from the LOAD prefix's puc_bullets stack. The stack is snapshotted at the moment REPL input takes over (the new return value of disable_repl_bullets) so COMMIT can avoid token collisions with frames opened by the prefix. Tested patterns: simple split, nested split, case-split, multi-tactic-per-sibling, compound first phrase (move=> hp hq; split.), pHL seq N chain, list induction, have introducing a side goal, [split; split.] producing 4 goals in one phrase, UNDO/REVERT trimming, LOAD mid-proof continuation, and LOAD prefix already using bullet tokens that COMMIT must avoid. All round-trip through `ec.exe compile` under `pragma +strict_bullets`.

The LLM REPL accumulated ~870 lines of closures and refs inside [main]'s body, intermixed with the unrelated compile/runtest/docgen plumbing. Move it out into [src/ecLlm.ml] (with a one-line .mli exposing just [val run]). The implementation organises its closed-over state (notices buffer, transcript, prior-bullets snapshot, checkpoints, quiet flag, initialized flag) at the top of [run], then groups the helpers into nested submodules so each concern is named: - Goals goal/error formatting, focus tag, tree rendering - Wire OK/ERROR/<END> envelope and replies - Transcript transcript trimming and clearing - Commit bullet-token generator and DAG walk for COMMIT - Load LOAD parser, prefix processor, and -trace block [ec.ml] keeps the small [Llm] dispatch arm that calls [EcLlm.run ~relocdir ~boot llmopts]. Pure move, no behavioural change; smoke-tested with COMMIT, TREE, FOCUS, and -trace.

The main loop was a flat ~150-line if/else chain that mixed line parsing (substrings, int_of_string, String.starts_with checks) with the actions to take. Split into: module Parse: a [command] variant covering every accepted line shape (Quit, Help, Undo, Goals of [`One|`All], Tree of [`One|`All], Commit, Focus of int, Next, Checkpoint of string, Revert of string, Quiet of bool, Search of string, Load of string, Ec of string, Begin_multi, Done_multi, Multi_line of string, Blank), plus [of_line ~multi_active] which is a stateless string -> command, and [Parse_error] for argument-shape mistakes (e.g. "FOCUS foo"). module Dispatch: a flat pattern match on the parsed command, delegating to small handlers (do_help, do_undo, do_focus_request, do_checkpoint, do_revert, do_quiet, do_search) and to the existing Load/Commit/Goals submodules. Holds the multi-line buffer state (multi_buf/in_multi) since Parse is pure. The main loop becomes 9 lines: read a line, parse to a command, dispatch, catch Parse_error and reply ERROR. Behaviour is unchanged; manual smoke tests cover COMMIT, TREE, FOCUS (good and bad arg), CHECKPOINT/REVERT, SEARCH, multi-line input, and the prior-bullets collision case.

Before, lines like "CHECKPOINT" (no trailing argument) silently fell through to EC input because the dispatcher used [String.starts_with "CHECKPOINT "] -- requiring the trailing space that [String.strip] on the input line had just removed. The user saw EC's generic "parse error" instead of a command-specific message. Same shape applied to LOAD, FOCUS, REVERT, and SEARCH. Introduce a small [keyword_arg kw line] helper that accepts both [line = kw] and [line = kw ^ " " ^ ...], returning the stripped argument tail. Each prefix command now routes to its own parser even when the argument is empty, and each parser produces a specific [Parse_error] message: FOCUS -> "FOCUS: missing argument" CHECKPOINT -> "CHECKPOINT: missing name" REVERT -> "REVERT: missing uuid or checkpoint name" SEARCH -> "SEARCH: missing query" LOAD -> "LOAD: missing filename" (Load.handle, which already had this branch but it was unreachable)

Introduce a frame tree derived from pr_opened + parent_of: each open leaf's chain of multi-child ancestors (skipping single-child continuations) becomes its path through the tree. The same data structure backs both TREE rendering and FOCUS path lookup. TREE now shows depth-indented entries labelled with dotted paths matching what FOCUS accepts. Leading singleton frames are unwrapped: when all opens share an outermost split, the rendering starts at that split's branches, not at a redundant [1.] wrapper. FOCUS N1.N2.N3 walks the tree following each component and focuses the resolved leaf. A single integer (FOCUS k) still works (degree-1 path). The path must resolve to a leaf; selecting an internal frame yields "FOCUS: path must select a leaf goal, not a frame" and overshooting a leaf yields "FOCUS: path overshoots a leaf goal". After [split. split. split.] on [((a /\ b) /\ c) /\ d], TREE prints: [1.1.1] a = a <- focused [1.1.2] b = b [1.2] c = c [2] d = d and FOCUS 1.2 selects c, FOCUS 2 selects d, FOCUS 1.1.2 selects b. NEXT semantics unchanged. Flat proofs render unindented as before.

Add two workflow sections that were missing or stale: 4. Inspect and navigate nested subgoals with TREE and FOCUS Documents the dotted-path syntax (FOCUS 1.2.3), the [focus: k/N] reply tag, and the fact that TREE labels are dynamic (focus-first, not stable across focus changes). 5. Build a +strict_bullets-friendly proof with COMMIT Documents how COMMIT replays the recorded transcript and inserts bullets, plus the cycle (-, +, *, --, ++, **, ...) and the prior- bullet collision avoidance. Re-number the existing QUIET and SEARCH sections from 4/5 to 6/7. Fix one outdated pitfall ("subgoals must be closed in order") -- FOCUS path now lets the agent address them in any order. Reflects what's live in EcLlm; the protocol/meta-command table was already up to date.

strub mentioned this pull request May 27, 2026

[llm] add -trace LINE[:COL] for single-sentence before/after goals #1018

Closed

strub force-pushed the llm-interactive branch from 0f5f91b to 9f3da5b Compare May 27, 2026 14:20

strub added 11 commits May 28, 2026 12:05

strub force-pushed the llm-interactive branch from 9f3da5b to 7f13cec Compare May 31, 2026 17:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[llm] playground for LLM-related interaction#1019

[llm] playground for LLM-related interaction#1019
strub wants to merge 11 commits into
mainfrom
llm-interactive

strub commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

strub commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant