[llm] playground for LLM-related interaction#1019
Draft
strub wants to merge 11 commits into
Draft
Conversation
When `pragma +strict_bullets` is set, the bullet tokens `-`, `+`, `*` (and their repeated forms `--`, `++`, `**`, ...) become Coq-style focusing operators: each phrase's bullet is checked against a per-proof stack so that a subgoal must be discharged before its sibling is addressed, and so that bullet characters identify nesting levels. The default behavior is unchanged: without the pragma, bullets remain pure decoration, preserving every existing proof script. Repeated bullet characters are emitted by the lexer as single PLUSn/MINUSn/STARn tokens (carrying their literal), which makes them usable both as deeper bullet levels and as user-defined binary operators (`op (--) ...`). The final sibling of a split point can be continued at the parent level without a bullet: when the previous sibling is closed and exactly one sibling remains, the inner frame pops automatically (and cascades through nested last-sibling frames). This keeps the standard phl idiom of long `seq N : <post>' chains flat instead of forcing ever-deeper indentation under a `+' for each continuation. Multi-way splits keep their enumeration discipline for all but the last subgoal.
Introduce an interactive REPL for LLM coding agents driving EasyCrypt (`easycrypt llm`) using a line-oriented protocol over stdin/stdout, plus two CLI flags for goal inspection: - `-upto <pos>` compile up to a given position and print the goals - `-lastgoals` print the last unproven goals at end-of-file REPL protocol (see `doc/llm/CLAUDE.md` for the full guide): - LOAD "file.ec" [LINE[:COL]] -- compile, optionally up to a position - UNDO / REVERT <uuid|name> -- navigate proof state - GOALS / GOALS ALL -- inspect current or all subgoals - CHECKPOINT <name> -- named bookmarks for branching - SEARCH <pattern> -- lemma search - QUIET ON/OFF -- suppress goal display for bulk input - Direct EasyCrypt input (tactics, declarations, search, print, ...) Responses use a typed envelope (OK/ERROR with uuid) terminated by an `<END>` sentinel for reliable parsing. LOAD reports the last processed line in the response tag; error messages include the offending source text; only the current subgoal is shown by default with a remaining count.
Add a -trace flag to the LOAD REPL command. When set, LOAD compiles the prefix exactly as today (using the existing LINE[:COL] argument, or up to EOF if omitted), but defers the last sentence and runs it under goal capture, then returns a response body with four delimited blocks: === BEFORE: line L (col C) === <focused goal before the sentence> === TACTIC (lines L:C - L':C') === <exact source text of the sentence> === AFTER: line L (col C) === <new focused goal + any new sibling goals> === SUMMARY === open goals: N1 -> N2 Adapted from PR #1018 (-trace LINE[:COL] for batch mode): same delimiters and the same new-or-modified-head filtering for AFTER. The position is taken from LOAD's existing LINE[:COL] argument; the tag is the regular [loaded:file:LINE]. If the deferred sentence is outside a proof context, or there is no sentence to trace, the reply uses the ERROR envelope with a clear message. If the sentence fails, the BEFORE/TACTIC blocks are still delivered, AFTER carries a <sentence failed> marker, and the formatted exception is appended. Expose EcCommands.in_proof so the REPL can check the pre-execution proof status without inspecting scope internals.
Three small additions to make REPL-driven proof exploration pleasant without weakening +strict_bullets for saved scripts. 1. Bullet relaxation for REPL input. EcCommands.disable_repl_bullets is called at every REPL phrase; it drops pm_strict_bullets and clears puc_bullets on the active proof so REPL-typed tactics are not rejected for missing bullets. Files loaded via LOAD still respect their own pragma; only direct REPL input is relaxed. 2. TREE / TREE ALL meta-commands. List all open subgoals as a flat numbered enumeration with the focused goal marked. TREE shows a one-line conclusion per goal; TREE ALL shows the full goal bodies. Backed by EcCommands.pp_tree on top of EcCoreGoal.all_opened. 3. [focus: k/N] reply tag. When more than one subgoal is open, both tactic replies and the LOAD response carry [focus: 1/N] alongside any other tag, so the caller knows the next tactic targets goal #1 of N. Supporting plumbing: EcScope.set_xgoal exposes a way to swap the active proof_uc without going through the tactic engine.
The REPL relies on EC's "first open goal is the focused one"
convention. Until now the only way to work on a non-first goal was to
discharge the earlier ones; the proving agent often wants to inspect
or skip a particular sibling without that.
Add two meta-commands:
FOCUS N rotate the open-goal list so the goal at index N (from
TREE) becomes the focused one, preserving cyclic order.
NEXT shorthand for FOCUS 2 (rotate one step).
Backed by a new EcCoreGoal.rotate_focus that splits and recombines
pr_opened. Going through the tactic engine doesn't work for standalone
rotation: tcenv1_of_proof tc_down's the siblings out of view, so
Protate (the `first last` parsed form) has nothing to rotate at the
top level.
EcCommands.focus_goal wraps rotate_focus, applies it via the same
scope-mutation path disable_repl_bullets uses, and pushes a new
context so UNDO/REVERT can roll the change back.
REPL phrases are recorded with (uuid, source, parent_handle, opens),
where parent_handle is the focused goal right before the phrase ran.
COMMIT replays them against the proof DAG to recover the bullet
structure.
The DAG edge is now explicit. EcCoreGoal's proofenv carries a
pr_parent : handle ID.Map.t populated by FApi.newgoal (the single
choke-point where every child handle is created); EcCoreGoal exposes
children_of_handle / parent_of_handle on top of it. This avoids the
older approach of reading children out of g_validation, which only
worked for VIntros / VConv / VLConv / VRewrite / VExtern -- not for
VApply, whose subgoals are added to the tcenv state outside the
validation record.
Algorithm:
- For each phrase, walk the subtree rooted at its parent handle,
registering each multi-child split's children in [sibling_depth]
at the right depth. Single-child links are continuations and do
not bump depth.
- To decide whether a phrase needs a bullet, walk upward via
parent_of from its recorded parent until hitting a registered
sibling ancestor; if found, emit the bullet for that depth and
consume the registration.
Bullet tokens are chosen per depth from PR 1017's lexer order
(-, +, *, --, ++, **, ---, +++, *** ...), skipping any token already
in scope from the LOAD prefix's puc_bullets stack. The stack is
snapshotted at the moment REPL input takes over (the new return value
of disable_repl_bullets) so COMMIT can avoid token collisions with
frames opened by the prefix.
Tested patterns: simple split, nested split, case-split,
multi-tactic-per-sibling, compound first phrase (move=> hp hq;
split.), pHL seq N chain, list induction, have introducing a side
goal, [split; split.] producing 4 goals in one phrase, UNDO/REVERT
trimming, LOAD mid-proof continuation, and LOAD prefix already using
bullet tokens that COMMIT must avoid. All round-trip through
`ec.exe compile` under `pragma +strict_bullets`.
The LLM REPL accumulated ~870 lines of closures and refs inside [main]'s body, intermixed with the unrelated compile/runtest/docgen plumbing. Move it out into [src/ecLlm.ml] (with a one-line .mli exposing just [val run]). The implementation organises its closed-over state (notices buffer, transcript, prior-bullets snapshot, checkpoints, quiet flag, initialized flag) at the top of [run], then groups the helpers into nested submodules so each concern is named: - Goals goal/error formatting, focus tag, tree rendering - Wire OK/ERROR/<END> envelope and replies - Transcript transcript trimming and clearing - Commit bullet-token generator and DAG walk for COMMIT - Load LOAD parser, prefix processor, and -trace block [ec.ml] keeps the small [Llm] dispatch arm that calls [EcLlm.run ~relocdir ~boot llmopts]. Pure move, no behavioural change; smoke-tested with COMMIT, TREE, FOCUS, and -trace.
The main loop was a flat ~150-line if/else chain that mixed line parsing (substrings, int_of_string, String.starts_with checks) with the actions to take. Split into: module Parse: a [command] variant covering every accepted line shape (Quit, Help, Undo, Goals of [`One|`All], Tree of [`One|`All], Commit, Focus of int, Next, Checkpoint of string, Revert of string, Quiet of bool, Search of string, Load of string, Ec of string, Begin_multi, Done_multi, Multi_line of string, Blank), plus [of_line ~multi_active] which is a stateless string -> command, and [Parse_error] for argument-shape mistakes (e.g. "FOCUS foo"). module Dispatch: a flat pattern match on the parsed command, delegating to small handlers (do_help, do_undo, do_focus_request, do_checkpoint, do_revert, do_quiet, do_search) and to the existing Load/Commit/Goals submodules. Holds the multi-line buffer state (multi_buf/in_multi) since Parse is pure. The main loop becomes 9 lines: read a line, parse to a command, dispatch, catch Parse_error and reply ERROR. Behaviour is unchanged; manual smoke tests cover COMMIT, TREE, FOCUS (good and bad arg), CHECKPOINT/REVERT, SEARCH, multi-line input, and the prior-bullets collision case.
Before, lines like "CHECKPOINT" (no trailing argument) silently fell
through to EC input because the dispatcher used [String.starts_with
"CHECKPOINT "] -- requiring the trailing space that [String.strip]
on the input line had just removed. The user saw EC's generic "parse
error" instead of a command-specific message. Same shape applied to
LOAD, FOCUS, REVERT, and SEARCH.
Introduce a small [keyword_arg kw line] helper that accepts both
[line = kw] and [line = kw ^ " " ^ ...], returning the stripped
argument tail. Each prefix command now routes to its own parser even
when the argument is empty, and each parser produces a specific
[Parse_error] message:
FOCUS -> "FOCUS: missing argument"
CHECKPOINT -> "CHECKPOINT: missing name"
REVERT -> "REVERT: missing uuid or checkpoint name"
SEARCH -> "SEARCH: missing query"
LOAD -> "LOAD: missing filename" (Load.handle, which already
had this branch but it
was unreachable)
Introduce a frame tree derived from pr_opened + parent_of: each open
leaf's chain of multi-child ancestors (skipping single-child
continuations) becomes its path through the tree. The same data
structure backs both TREE rendering and FOCUS path lookup.
TREE now shows depth-indented entries labelled with dotted paths
matching what FOCUS accepts. Leading singleton frames are
unwrapped: when all opens share an outermost split, the rendering
starts at that split's branches, not at a redundant [1.] wrapper.
FOCUS N1.N2.N3 walks the tree following each component and focuses
the resolved leaf. A single integer (FOCUS k) still works (degree-1
path). The path must resolve to a leaf; selecting an internal frame
yields "FOCUS: path must select a leaf goal, not a frame" and
overshooting a leaf yields "FOCUS: path overshoots a leaf goal".
After [split. split. split.] on [((a /\ b) /\ c) /\ d], TREE prints:
[1.1.1] a = a <- focused
[1.1.2] b = b
[1.2] c = c
[2] d = d
and FOCUS 1.2 selects c, FOCUS 2 selects d, FOCUS 1.1.2 selects b.
NEXT semantics unchanged. Flat proofs render unindented as before.
Add two workflow sections that were missing or stale:
4. Inspect and navigate nested subgoals with TREE and FOCUS
Documents the dotted-path syntax (FOCUS 1.2.3), the [focus: k/N]
reply tag, and the fact that TREE labels are dynamic
(focus-first, not stable across focus changes).
5. Build a +strict_bullets-friendly proof with COMMIT
Documents how COMMIT replays the recorded transcript and inserts
bullets, plus the cycle (-, +, *, --, ++, **, ...) and the prior-
bullet collision avoidance.
Re-number the existing QUIET and SEARCH sections from 4/5 to 6/7.
Fix one outdated pitfall ("subgoals must be closed in order") --
FOCUS path now lets the agent address them in any order.
Reflects what's live in EcLlm; the protocol/meta-command table was
already up to date.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Playground branch for experimenting with LLM-driven EasyCrypt interaction.
Not intended for review or merge in its current form — kept as a draft so
the work-in-progress is visible.