Skip to content

[llm] playground for LLM-related interaction#1019

Draft
strub wants to merge 11 commits into
mainfrom
llm-interactive
Draft

[llm] playground for LLM-related interaction#1019
strub wants to merge 11 commits into
mainfrom
llm-interactive

Conversation

@strub
Copy link
Copy Markdown
Member

@strub strub commented May 27, 2026

Playground branch for experimenting with LLM-driven EasyCrypt interaction.

Not intended for review or merge in its current form — kept as a draft so
the work-in-progress is visible.

strub added 11 commits May 28, 2026 12:05
When `pragma +strict_bullets` is set, the bullet tokens `-`, `+`, `*`
(and their repeated forms `--`, `++`, `**`, ...) become Coq-style
focusing operators: each phrase's bullet is checked against a per-proof
stack so that a subgoal must be discharged before its sibling is
addressed, and so that bullet characters identify nesting levels. The
default behavior is unchanged: without the pragma, bullets remain pure
decoration, preserving every existing proof script.

Repeated bullet characters are emitted by the lexer as single
PLUSn/MINUSn/STARn tokens (carrying their literal), which makes them
usable both as deeper bullet levels and as user-defined binary
operators (`op (--) ...`).

The final sibling of a split point can be continued at the parent
level without a bullet: when the previous sibling is closed and
exactly one sibling remains, the inner frame pops automatically (and
cascades through nested last-sibling frames). This keeps the standard
phl idiom of long `seq N : <post>' chains flat instead of forcing
ever-deeper indentation under a `+' for each continuation. Multi-way
splits keep their enumeration discipline for all but the last
subgoal.
Introduce an interactive REPL for LLM coding agents driving EasyCrypt
(`easycrypt llm`) using a line-oriented protocol over stdin/stdout, plus
two CLI flags for goal inspection:

  - `-upto <pos>`  compile up to a given position and print the goals
  - `-lastgoals`   print the last unproven goals at end-of-file

REPL protocol (see `doc/llm/CLAUDE.md` for the full guide):

  - LOAD "file.ec" [LINE[:COL]] -- compile, optionally up to a position
  - UNDO / REVERT <uuid|name>   -- navigate proof state
  - GOALS / GOALS ALL           -- inspect current or all subgoals
  - CHECKPOINT <name>           -- named bookmarks for branching
  - SEARCH <pattern>            -- lemma search
  - QUIET ON/OFF                -- suppress goal display for bulk input
  - Direct EasyCrypt input (tactics, declarations, search, print, ...)

Responses use a typed envelope (OK/ERROR with uuid) terminated by an
`<END>` sentinel for reliable parsing. LOAD reports the last processed
line in the response tag; error messages include the offending source
text; only the current subgoal is shown by default with a remaining
count.
Add a -trace flag to the LOAD REPL command. When set, LOAD compiles the
prefix exactly as today (using the existing LINE[:COL] argument, or up
to EOF if omitted), but defers the last sentence and runs it under
goal capture, then returns a response body with four delimited blocks:

  === BEFORE: line L (col C) ===
  <focused goal before the sentence>
  === TACTIC (lines L:C - L':C') ===
  <exact source text of the sentence>
  === AFTER: line L (col C) ===
  <new focused goal + any new sibling goals>
  === SUMMARY ===
  open goals: N1 -> N2

Adapted from PR #1018 (-trace LINE[:COL] for batch mode): same
delimiters and the same new-or-modified-head filtering for AFTER. The
position is taken from LOAD's existing LINE[:COL] argument; the tag is
the regular [loaded:file:LINE].

If the deferred sentence is outside a proof context, or there is no
sentence to trace, the reply uses the ERROR envelope with a clear
message. If the sentence fails, the BEFORE/TACTIC blocks are still
delivered, AFTER carries a <sentence failed> marker, and the formatted
exception is appended.

Expose EcCommands.in_proof so the REPL can check the pre-execution
proof status without inspecting scope internals.
Three small additions to make REPL-driven proof exploration pleasant
without weakening +strict_bullets for saved scripts.

1. Bullet relaxation for REPL input. EcCommands.disable_repl_bullets
   is called at every REPL phrase; it drops pm_strict_bullets and
   clears puc_bullets on the active proof so REPL-typed tactics are
   not rejected for missing bullets. Files loaded via LOAD still
   respect their own pragma; only direct REPL input is relaxed.

2. TREE / TREE ALL meta-commands. List all open subgoals as a flat
   numbered enumeration with the focused goal marked. TREE shows a
   one-line conclusion per goal; TREE ALL shows the full goal bodies.
   Backed by EcCommands.pp_tree on top of EcCoreGoal.all_opened.

3. [focus: k/N] reply tag. When more than one subgoal is open, both
   tactic replies and the LOAD response carry [focus: 1/N] alongside
   any other tag, so the caller knows the next tactic targets goal #1
   of N.

Supporting plumbing: EcScope.set_xgoal exposes a way to swap the
active proof_uc without going through the tactic engine.
The REPL relies on EC's "first open goal is the focused one"
convention. Until now the only way to work on a non-first goal was to
discharge the earlier ones; the proving agent often wants to inspect
or skip a particular sibling without that.

Add two meta-commands:

  FOCUS N   rotate the open-goal list so the goal at index N (from
            TREE) becomes the focused one, preserving cyclic order.
  NEXT      shorthand for FOCUS 2 (rotate one step).

Backed by a new EcCoreGoal.rotate_focus that splits and recombines
pr_opened. Going through the tactic engine doesn't work for standalone
rotation: tcenv1_of_proof tc_down's the siblings out of view, so
Protate (the `first last` parsed form) has nothing to rotate at the
top level.

EcCommands.focus_goal wraps rotate_focus, applies it via the same
scope-mutation path disable_repl_bullets uses, and pushes a new
context so UNDO/REVERT can roll the change back.
REPL phrases are recorded with (uuid, source, parent_handle, opens),
where parent_handle is the focused goal right before the phrase ran.
COMMIT replays them against the proof DAG to recover the bullet
structure.

The DAG edge is now explicit. EcCoreGoal's proofenv carries a
pr_parent : handle ID.Map.t populated by FApi.newgoal (the single
choke-point where every child handle is created); EcCoreGoal exposes
children_of_handle / parent_of_handle on top of it. This avoids the
older approach of reading children out of g_validation, which only
worked for VIntros / VConv / VLConv / VRewrite / VExtern -- not for
VApply, whose subgoals are added to the tcenv state outside the
validation record.

Algorithm:
  - For each phrase, walk the subtree rooted at its parent handle,
    registering each multi-child split's children in [sibling_depth]
    at the right depth. Single-child links are continuations and do
    not bump depth.
  - To decide whether a phrase needs a bullet, walk upward via
    parent_of from its recorded parent until hitting a registered
    sibling ancestor; if found, emit the bullet for that depth and
    consume the registration.

Bullet tokens are chosen per depth from PR 1017's lexer order
(-, +, *, --, ++, **, ---, +++, *** ...), skipping any token already
in scope from the LOAD prefix's puc_bullets stack. The stack is
snapshotted at the moment REPL input takes over (the new return value
of disable_repl_bullets) so COMMIT can avoid token collisions with
frames opened by the prefix.

Tested patterns: simple split, nested split, case-split,
multi-tactic-per-sibling, compound first phrase (move=> hp hq;
split.), pHL seq N chain, list induction, have introducing a side
goal, [split; split.] producing 4 goals in one phrase, UNDO/REVERT
trimming, LOAD mid-proof continuation, and LOAD prefix already using
bullet tokens that COMMIT must avoid. All round-trip through
`ec.exe compile` under `pragma +strict_bullets`.
The LLM REPL accumulated ~870 lines of closures and refs inside
[main]'s body, intermixed with the unrelated compile/runtest/docgen
plumbing. Move it out into [src/ecLlm.ml] (with a one-line .mli
exposing just [val run]).

The implementation organises its closed-over state (notices buffer,
transcript, prior-bullets snapshot, checkpoints, quiet flag,
initialized flag) at the top of [run], then groups the helpers into
nested submodules so each concern is named:

  - Goals      goal/error formatting, focus tag, tree rendering
  - Wire       OK/ERROR/<END> envelope and replies
  - Transcript transcript trimming and clearing
  - Commit     bullet-token generator and DAG walk for COMMIT
  - Load       LOAD parser, prefix processor, and -trace block

[ec.ml] keeps the small [Llm] dispatch arm that calls
[EcLlm.run ~relocdir ~boot llmopts]. Pure move, no behavioural
change; smoke-tested with COMMIT, TREE, FOCUS, and -trace.
The main loop was a flat ~150-line if/else chain that mixed line
parsing (substrings, int_of_string, String.starts_with checks) with
the actions to take. Split into:

  module Parse: a [command] variant covering every accepted line shape
  (Quit, Help, Undo, Goals of [`One|`All], Tree of [`One|`All],
  Commit, Focus of int, Next, Checkpoint of string, Revert of string,
  Quiet of bool, Search of string, Load of string, Ec of string,
  Begin_multi, Done_multi, Multi_line of string, Blank), plus
  [of_line ~multi_active] which is a stateless string -> command,
  and [Parse_error] for argument-shape mistakes (e.g. "FOCUS foo").

  module Dispatch: a flat pattern match on the parsed command,
  delegating to small handlers (do_help, do_undo, do_focus_request,
  do_checkpoint, do_revert, do_quiet, do_search) and to the existing
  Load/Commit/Goals submodules. Holds the multi-line buffer state
  (multi_buf/in_multi) since Parse is pure.

The main loop becomes 9 lines: read a line, parse to a command,
dispatch, catch Parse_error and reply ERROR. Behaviour is unchanged;
manual smoke tests cover COMMIT, TREE, FOCUS (good and bad arg),
CHECKPOINT/REVERT, SEARCH, multi-line input, and the prior-bullets
collision case.
Before, lines like "CHECKPOINT" (no trailing argument) silently fell
through to EC input because the dispatcher used [String.starts_with
"CHECKPOINT "] -- requiring the trailing space that [String.strip]
on the input line had just removed. The user saw EC's generic "parse
error" instead of a command-specific message. Same shape applied to
LOAD, FOCUS, REVERT, and SEARCH.

Introduce a small [keyword_arg kw line] helper that accepts both
[line = kw] and [line = kw ^ " " ^ ...], returning the stripped
argument tail. Each prefix command now routes to its own parser even
when the argument is empty, and each parser produces a specific
[Parse_error] message:

  FOCUS      -> "FOCUS: missing argument"
  CHECKPOINT -> "CHECKPOINT: missing name"
  REVERT     -> "REVERT: missing uuid or checkpoint name"
  SEARCH     -> "SEARCH: missing query"
  LOAD       -> "LOAD: missing filename"  (Load.handle, which already
                                            had this branch but it
                                            was unreachable)
Introduce a frame tree derived from pr_opened + parent_of: each open
leaf's chain of multi-child ancestors (skipping single-child
continuations) becomes its path through the tree. The same data
structure backs both TREE rendering and FOCUS path lookup.

  TREE now shows depth-indented entries labelled with dotted paths
  matching what FOCUS accepts. Leading singleton frames are
  unwrapped: when all opens share an outermost split, the rendering
  starts at that split's branches, not at a redundant [1.] wrapper.

  FOCUS N1.N2.N3 walks the tree following each component and focuses
  the resolved leaf. A single integer (FOCUS k) still works (degree-1
  path). The path must resolve to a leaf; selecting an internal frame
  yields "FOCUS: path must select a leaf goal, not a frame" and
  overshooting a leaf yields "FOCUS: path overshoots a leaf goal".

After [split. split. split.] on [((a /\ b) /\ c) /\ d], TREE prints:

    [1.1.1] a = a <- focused
    [1.1.2] b = b
  [1.2] c = c
[2] d = d

and FOCUS 1.2 selects c, FOCUS 2 selects d, FOCUS 1.1.2 selects b.

NEXT semantics unchanged. Flat proofs render unindented as before.
Add two workflow sections that were missing or stale:

  4. Inspect and navigate nested subgoals with TREE and FOCUS
     Documents the dotted-path syntax (FOCUS 1.2.3), the [focus: k/N]
     reply tag, and the fact that TREE labels are dynamic
     (focus-first, not stable across focus changes).

  5. Build a +strict_bullets-friendly proof with COMMIT
     Documents how COMMIT replays the recorded transcript and inserts
     bullets, plus the cycle (-, +, *, --, ++, **, ...) and the prior-
     bullet collision avoidance.

Re-number the existing QUIET and SEARCH sections from 4/5 to 6/7.

Fix one outdated pitfall ("subgoals must be closed in order") --
FOCUS path now lets the agent address them in any order.

Reflects what's live in EcLlm; the protocol/meta-command table was
already up to date.
@strub strub force-pushed the llm-interactive branch from 9f3da5b to 7f13cec Compare May 31, 2026 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant