feat(vm): add configurable :max_instructions instruction budget by davydog187 · Pull Request #320 · tv-labs/lua

davydog187 · 2026-06-01T17:18:39Z

VM instruction budget: configurable `:max_steps` with catchable exhaustion

Plan: .agents/plans/B17-vm-max-steps.md
Closes #306

Goal

Add a :max_steps option to Lua.new/1 that bounds the number of VM
instructions a single evaluation may execute, mirroring the existing
:max_call_depth:

Default :infinity — no limit, existing behavior unchanged, and the default path stays free of new per-instruction cost.
A positive integer caps total instructions executed. On exhaustion the VM raises a catchable Lua runtime error ("instruction budget exceeded") so pcall can recover, just like the "stack overflow" raised by :max_call_depth.

The bound applies to both execution paths: the interpreter (do_execute in lib/lua/vm/executor.ex) and the compiled dispatcher (dispatch in lib/lua/vm/dispatcher.ex).

Success criteria

Changes

 guides/examples/sandboxing.livemd |  40 +++
 lib/lua.ex                        |  34 ++-
 lib/lua/vm/dispatcher.ex          | 336 ++++++++++++++----------
 lib/lua/vm/executor.ex            | 536 +++++++++++++++++++++++++-------------
 lib/lua/vm/state.ex               |  35 ++-
 test/lua/vm/max_steps_test.exs    | new

The large executor/dispatcher line counts are dominated by threading the steps parameter through every do_execute/dispatch clause head and tail call (and the reflow of multi-line heads); the actual logic added is one increment + one check_steps!/2 at each loop back-edge and call boundary.

Design notes

check_steps!/2 (in Lua.VM.State) raises the same Lua.VM.RuntimeError used by "stack overflow", so pcall/xpcall trap it for free. Its :infinity clause resolves in a single function-head match with no struct rebuild.
The tally is threaded through the interpreter's do_execute/do_frame_return chain, so it spans frames within one evaluation (non-tail recursion stacks frames in the same do_execute chain — that is what bounds the recursion test). The dispatcher threads it through dispatch/finish_body/return_one/return_multi. The cross-module :compiled_closure/Dispatcher.execute and call_value hand-offs seed the callee with a fresh budget rather than changing the {results, state} return shape, which would otherwise ripple into out-of-scope stdlib modules; each compiled callee is bounded by the dispatcher's own counting.

Verification

mix format --check-formatted   # clean
mix compile --warnings-as-errors  # clean
mix test                       # 2126 passed, 19 skipped, 1 excluded
mix test --only lua53          # 17 passed, 12 skipped, 2117 excluded
mix test test/lua/vm/max_steps_test.exs        # 11 passed
mix test test/lua/vm/recursion_depth_test.exs  # 7 passed

Benchmark gate: the benchmark MIX env could not compile in this sandbox (luaport native dependency needs C Lua headers). By construction, the default :infinity path adds, per loop back-edge / call boundary only, one integer increment plus one State.check_steps!(state, steps) that short-circuits in a single function-head match on max_steps: :infinity returning :ok — no struct rebuild, no per-opcode cost. This is structurally identical to the existing check_call_depth!/1 already sitting on those same call boundaries, so no meaningful regression is expected on the default path.

Out of scope (intentional)

:max_alloc_bytes (deferred per the issue).
Per-instruction counting on every opcode.
Tail-call optimization / frame-push changes.
Wall-clock timeouts / max_heap_size.
Mid-run budget introspection or reset.

davydog187

Verdict: changes-requested — one real correctness bug (cross-eval budget leak) masked by a weak test, plus the mandated benchmark gate is unmet.

Automated review (skeptical senior pass). Findings tagged by severity with file:line and rationale. The hot-path and pcall-catchability verdicts are called out explicitly since they were the crux.

Hot-path / `:infinity` verdict: PASS (no per-opcode regression)

I traced every do_execute/dispatch clause. The running tally is threaded as a bare parameter (steps), never written into %State{} on the per-opcode path. All 16 steps = steps + 1 increments sit at the 4 interpreter back-edges + 2 interpreter call boundaries and the 4 dispatcher back-edges + 6 dispatcher call boundaries — none per opcode. State.check_steps!/2 resolves max_steps: :infinity in a single function-head match with no struct rebuild (state.ex:101). The new steps: field on %State{} (state.ex defstruct) is only assigned at engine-boundary struct rebuilds that already push a call frame (call_stack/call_depth), so it adds no per-opcode cost. This is structurally identical to the existing check_call_depth!/1 already on those boundaries. Good.

[nit] The plan's "byte-for-byte unchanged" framing for the :infinity path is slightly overstated: each back-edge/boundary still pays one unconditional steps + 1 and one check_steps!/2 call even when unbounded. Negligible and correct, but not literally zero — worth honest wording.

pcall-catchability verdict: PASS

check_steps!/2 raises Lua.VM.RuntimeError, value: "instruction budget exceeded" (state.ex), the exact type/shape lua_pcall rescues (stdlib.ex:209, [RuntimeError, AssertionError, TypeError]). It is genuinely catchable by Lua pcall/xpcall, identical to the existing "stack overflow". The catchability test (max_steps_test.exs "pcall catches the budget error…") asserts both {false, msg} and that the VM stays usable afterward — a real, correct assertion.

Findings

[blocker] Budget leaks across top-level evaluations; the "no cross-eval leak" test does not actually prove freshness. lib/lua/vm/executor.ex execute/5 seeds steps from state.steps (do_execute(..., 0, state.steps)), and the terminals stamp the final tally back via finish_steps/2 into state.steps. That final_state is returned by Lua.VM.execute and stored back into the %Lua{} by Lua.eval! (lib/lua.ex:496/538). Nothing resets state.steps to 0 at the top-level eval boundary — the only steps-to-0 is the defstruct default. So the budget is cumulative over the %Lua{} lifetime, not fresh per evaluation. This contradicts issue #306's acceptance ("a second Lua.eval!/2 … gets a fresh budget") and the PR's own doctest/criteria. The regression test (max_steps_test.exs "the budget is fresh per evaluation") uses max_steps: 5000 with two ~100-step evals; ~200 cumulative stays under 5000, so it passes whether or not the budget resets — it proves nothing about freshness. A correct test would pick a budget that one eval clears but two cumulatively exceed (e.g. max_steps ≈ 150, run the ~100-step eval twice, assert the second still succeeds). Fix: reset state.steps to 0 at the top-level evaluation entry (not at nested call_function/Dispatcher.execute hand-offs, which must keep seeding for cross-engine continuity), and harden the test.
[major] Mandated benchmark gate is unmet. Issue #306 makes "Benchmarked: no meaningful regression on the default :infinity path" a hard acceptance criterion, and the plan leaves it unchecked. The PR substitutes a by-construction argument because MIX_ENV=benchmark couldn't compile in the sandbox (luaport needs C Lua headers). The by-construction argument is sound and matches what I verified in the code, and CI is green — but green CI is not the benchmark, and the issue gated the change on recorded numbers. Run benchmarks/fibonacci.exs and benchmarks/dispatcher_vs_interpreter.exs on main vs this branch (default :infinity) on a host/CI with a compatible C Lua and record the numbers in the PR body before merge.
[minor] Commit-subject scope uses the plan id. Two commits use chore(B17): start plan and chore(B17): mark plan as review. Per CLAUDE.md / ship-a-plan, commit subjects use the affected subsystem as scope, never the plan id (the id belongs in the commit body, which the feature commits correctly do via Plan: B17). The feature commits (feat(vm): …) are correct; the two chore(B17) subjects are not.

Confirmed clean

Parity: both engines enforce the budget — interpreter back-edges/boundaries and dispatcher back-edges/all 6 call sites. The "dispatcher driven directly" and "cross-engine mutual recursion" tests cover the compiled path and the engine hand-off. Good.
Validation: validate_max_steps!/1 mirrors validate_max_call_depth!/1 (:infinity or pos_integer, else ArgumentError naming :max_steps); 0/-1/:nope covered by tests.
Empty-body while true do end is bounded: the back-edge increment fires per iteration regardless of body content; tested.
Docs: guides/examples/sandboxing.livemd gains a "Bounding CPU work" section (the tracked guide; guides/sandboxing.md doesn't exist on main — acceptable resolution).
No plan-id leakage into lib/, test/, or guides/; no AI co-author trailer; CI green across both Elixir/OTP matrices, Dialyzer, and the Lua 5.3 suite.

The :max_steps tally is stamped back into state.steps at each terminal and persisted into the returned %Lua{}, but nothing reset it at the top-level evaluation boundary. A long-lived %Lua{} running many small evals therefore accumulated steps across the whole lifetime and would eventually raise "instruction budget exceeded" even though no single eval came close — contradicting the per-eval contract in issue #306. Reset state.steps to 0 in Lua.VM.execute/2, the single chokepoint that Lua.eval!/eval route through. Nested calls within one evaluation still thread the tally as a bare parameter and accumulate against the same budget (a tight `while true do end` stays bounded); only the top-level boundary resets. The :infinity hot path is unchanged. Adds a regression test that sizes the budget just above one eval's real cost and runs that same eval 100x on the threaded state — red before this fix, green after — plus a guard asserting the budget still spans nested calls within a single evaluation. Addresses PR #320 review: cumulative-vs-per-eval budget leak. Plan: B17

davydog187 · 2026-06-01T19:36:05Z

Fix: instruction budget now resets per top-level evaluation

Addresses the review blocker that :max_steps was cumulative over the whole %Lua{} lifetime rather than fresh per evaluation. The tally is stamped back into state.steps at each terminal and persisted into the returned %Lua{}, but nothing reset it at the top-level boundary — so a long-lived %Lua{} running many small evals would eventually raise "instruction budget exceeded" even though no single eval came close (contradicting #306).

The fix (`d170301`)

One line at the single chokepoint both Lua.eval!/eval route through — Lua.VM.execute/2:

# Reset the instruction-budget tally at the top-level evaluation boundary
state = %{state | steps: 0}

One engine entry covers both engines. Lua.VM.execute/2 always frames the eval through the interpreter (Executor.execute/5); the compiled dispatcher is only entered for nested calls within an eval, which thread/seed from this reset state. So a single reset bounds the whole evaluation across both engines. I did not reset inside Executor.execute/5 / Dispatcher.execute — those are re-entered on nested calls and must keep accumulating.
Within-eval accumulation preserved. The praised design is intact: tally threaded as a bare parameter, increments only at loop back-edges + call boundaries, :infinity short-circuit on the hot path. A tight while true do end (even one calling a helper every iteration) is still bounded.

TDD

The previous "no cross-eval leak" test used max_steps: 5000 with two ~100-step evals (~200 cumulative) — it passed regardless of the leak. New tests in test/lua/vm/max_steps_test.exs:

"a budget sized for one eval survives repeating that same eval on the threaded state" — max_steps: 2000, runs a ~50-iteration eval 100× on the threaded state. Red on HEAD (cfadf03: raised "instruction budget exceeded" on an early iteration), green after the fix.
"the budget does NOT reset on nested calls within a single evaluation" — a while true do step(s) end loop still trips the budget, guarding against an over-broad reset.

Validation

mix format --check-formatted ✅
mix compile --warnings-as-errors ✅
mix test ✅ — 2130 passed, 19 skipped
mix test test/lua53_suite_test.exs --only lua53 ✅ — 17 passed, 12 skipped
mix test test/lua/vm/max_steps_test.exs ✅ — 15 passed (incl. the new red-then-green test)

Benchmark status (formal #306 gate: UNMET — honest disclosure)

The default-:infinity Benchee acceptance benchmark could not be run: MIX_ENV=benchmark mix run benchmarks/fibonacci.exs fails to compile the :luaport native dependency — c_src/luaport.c:14: fatal error: 'lua.h' file not found (missing native Lua 5.4 headers; LuaJIT also not on PKG_CONFIG_PATH). Benchee/statistex/Luerl themselves compiled fine; the blocker is the C port dep. The formal gate remains unmet in this environment.

As a best-effort substitute I took a :timer.tc micro-measurement of the :infinity hot path, with the one-line reset vs. without it (mix run, dev env, same machine):

workload	baseline (no reset)	with reset
`fib(28)` ×20	257.18 ms/eval	254.94 ms/eval
tiny `for`-loop eval ×100k	8.777 µs/eval	8.740 µs/eval

Both within run-to-run noise — the single %{state \| steps: 0} map update per top-level eval is not on the per-instruction path and shows no measurable cost.

Pushed as d170301.

The :max_steps tally is stamped back into state.steps at each terminal and persisted into the returned %Lua{}, but nothing reset it at the top-level evaluation boundary. A long-lived %Lua{} running many small evals therefore accumulated steps across the whole lifetime and would eventually raise "instruction budget exceeded" even though no single eval came close — contradicting the per-eval contract in issue #306. Reset state.steps to 0 in Lua.VM.execute/2, the single chokepoint that Lua.eval!/eval route through. Nested calls within one evaluation still thread the tally as a bare parameter and accumulate against the same budget (a tight `while true do end` stays bounded); only the top-level boundary resets. The :infinity hot path is unchanged. Adds a regression test that sizes the budget just above one eval's real cost and runs that same eval 100x on the threaded state — red before this fix, green after — plus a guard asserting the budget still spans nested calls within a single evaluation. Addresses PR #320 review: cumulative-vs-per-eval budget leak. Plan: B17

davydog187 · 2026-06-02T20:09:20Z

Benchmark results: default `:infinity` path — main vs branch

Ran the benchmark gate that couldn't run in the original sandbox (built luaport against Homebrew lua@5.4). Compared by swapping the 5 VM files to their merge-base (9dc141a) and back, so hardware/deps were identical — Apple M4, Elixir 1.20.0-rc.6, OTP 29. Medians reported (quick-mode means were noisy; medians are the robust stat).

`dispatcher_vs_interpreter` — fib(25), full mode (maximal Lua-closure call density)

job	baseline	branch	Δ median
dispatcher	76.90 ms	78.19 ms	+1.7%
interpreter	94.85 ms	97.36 ms	+2.6%

`table_ops` — quick mode, n=100, "chunk" path (pre-compiled, cleanest signal), n=2 each

workload	baseline	branch	Δ
Build (`table.insert` loop)	18.00–18.08 µs	19.29–19.54 µs	~+7%
Sort	30.13–30.63 µs	31.33–31.71 µs	~+3%
Iterate/Sum (`generic_for`)	25.00–25.21 µs	26.92–27.13 µs	~+8%
Map + Reduce	—	—	~0% (noisy)

The Build and Iterate bands are tight and non-overlapping across both samples, so the ~7-8% there is a real signal, not noise.

Interpretation

The "zero-cost by construction" framing holds for per-opcode cost but not for the default path overall:

Lua-closure calls (fib): only +1.7% — here the steps write is folded into the existing call_stack/call_depth struct rebuild, so the only added cost is the increment + the :infinity head-match.
Builtin-call and generic_for iterator boundaries: ~+7-8% — these got new state = %{state | steps: steps} … steps = state.steps round-trips that didn't exist on main, and they fire on the :infinity default too. That's why table.insert-per-iteration (Build) and pairs/ipairs-per-iteration (Iterate) regress most.

So: no per-instruction cost, but a measurable ~2-8% hit at call boundaries on the default path, concentrated where the PR added new struct round-trips rather than folding into an existing rebuild.

Suggested mitigation (author's call)

Guard the new round-trips on max_steps != :infinity at the builtin-call and generic_for sites only (the :lua_closure path is already cheap because it reuses the existing rebuild). That should bring the default path back to genuinely zero-cost. Happy to implement and re-benchmark if you'd like.

_{Methodology note: quick mode is the documented "did my change move the needle?" profile; dispatcher_vs_interpreter was run in full mode. luaport/luerl baseline rows omitted as irrelevant to the default-path regression question.}

The :max_steps tally is stamped back into state.steps at each terminal and persisted into the returned %Lua{}, but nothing reset it at the top-level evaluation boundary. A long-lived %Lua{} running many small evals therefore accumulated steps across the whole lifetime and would eventually raise "instruction budget exceeded" even though no single eval came close — contradicting the per-eval contract in issue #306. Reset state.steps to 0 in Lua.VM.execute/2, the single chokepoint that Lua.eval!/eval route through. Nested calls within one evaluation still thread the tally as a bare parameter and accumulate against the same budget (a tight `while true do end` stays bounded); only the top-level boundary resets. The :infinity hot path is unchanged. Adds a regression test that sizes the budget just above one eval's real cost and runs that same eval 100x on the threaded state — red before this fix, green after — plus a guard asserting the budget still spans nested calls within a single evaluation. Addresses PR #320 review: cumulative-vs-per-eval budget leak. Plan: B17

davydog187 · 2026-06-15T13:32:11Z

🤖 Automated multi-agent review — stabilization cycle (post-rebase)

This branch was rebased onto current main and the conflicts resolved by hand (main had rewritten executor.ex/dispatcher.ex/state.ex under it — #360 register-reclaim/lazy-frames, #362 dead-:set_global removal, etc.). CI is green on the merge ref, and the local suite passes (2491 tests, lua53 17, format clean, dialyzer green).

5 reviewers — including a dedicated merge-integrity dimension — fanned out, each finding then adversarially verified by an independent agent (several ran live repros). Raised 7 → 3 confirmed (all MEDIUM), 4 refuted.

Merge integrity: clean. :set_global stays deleted, call_info stays the tuple form, concat_checked/3 + max_string_bytes coexist with max_steps, the try/rescue + open-upvalue wrappers are intact, and check_steps! raises with the state: field. The one merge-dimension finding below was verified to be byte-identical to the pre-rebase branch tip — i.e. pre-existing, not introduced by the rebase.

All three findings are budget-accounting issues. None defeats the cap's primary guarantee — every repro still terminates (verified by multiple independent runs); the issue is that the bound is non-conservative, so total work can exceed :max_steps by a factor before the eval finally trips.

🟠 MEDIUM — `return ...` interpreter terminal drops the tally across an engine boundary

lib/lua/vm/executor.ex:1371. Every interpreter terminal stamps the running tally back into state via finish_steps(state, steps) (lines 790, 1393, 1407, 1428, 2301, 2354). The {:return_vararg} clause is the lone outlier:

case frames do
  [] -> {varargs, regs, state}                    # ← stale state.steps; tally dropped
  [frame | rest] -> do_frame_return(varargs, regs, state, frame, rest, line, steps)
end

{:return_vararg} is emitted for return ... (codegen.ex:207). Cross-engine seeding reads steps = state.steps after an interpreted callee returns, so a compiled caller repeatedly invoking an interpreted return ... passthrough accumulates nothing for that callee's work. Verified: identical loop work reports state.steps == 9 ending in return s but == 0 ending in return .... The dispatcher's same-shape terminal (return_multi at dispatcher.ex:~1569) stamps correctly — this clause should too.

Fix (one line, follows the established pattern):

[] -> {varargs, regs, finish_steps(state, steps)}

The cross-engine recursion tests don't catch it because they use return pong(n) (a return f(...) call-forward), never a bare return ....

🟠 MEDIUM — `pcall` "refunds" the instructions a caught budget error consumed

State.unwind_to/2 (state.ex) carries only heap fields from the raise-time state; it keeps entry.steps. So when a protected call trips the budget, recovery reverts steps to the pcall-entry tally and the inner work is forgiven. Net effect for for i=1,N do pcall(function() while true do end end) end: ~O(N²) total work instead of O(N).

Important nuance (reconciling the reviewers): the verifier that confirmed this and the two that refuted it agree on the mechanism and that the eval still terminates — Lua.new(max_steps: 5000) + the loop-of-pcall pattern raises "instruction budget exceeded" in practice (the outer loop's own back-edges climb monotonically to the cap). So this is not a cap-defeat / unbounded escape; it's that the advertised linear bound is quadratically loose under pcall-wrapped work. The DoS-hardening guarantee (termination) holds.

Fix direction: make steps a monotonic counter that survives unwinding (e.g. carry max(entry.steps, raised.steps) in unwind_to/2). Note one subtlety the verifier flagged: a callless while true do end never stamps its elevated local tally into state.steps before raising, so the loop back-edge (or check_steps!) would also need to surface the live tally for the fix to fully close the pure-loop case. Worth a regression test: a pcall(heavy) loop under a fixed budget must still trip the cap.

🟠 MEDIUM — path-based `require` resets the budget mid-evaluation

Lua.VM.execute/2 unconditionally runs state = %{state | steps: 0} (vm.ex:~34), documented as the top-level boundary — but it's also re-entered mid-eval by Stdlib.parse_and_execute_module/4 when running code calls require(mod) (stdlib.ex:765). Each path-based require of a fresh module therefore zeroes the running tally, forgiving the outer eval's pre-require work and letting the module body run against a fresh budget. Blast radius is bounded (preload modules go through call_function and don't reset; package.loaded caching means it's one forgiveness per distinct module loaded), hence MEDIUM.

Fix direction: gate the reset so it fires only at genuine top-level entry (a reset_steps? flag, with parse_and_execute_module/load passing false so the module body accumulates against the inherited budget).

4 findings refuted (with reasoning)

Two separate "pcall defeats the budget → unbounded" claims — refuted by direct execution: the loop-of-pcall and while true do pcall(while true do end) end patterns both raise the budget error (bounded). Reduced to the looseness captured above (not an escape).
"No test that the budget survives pcall recovery" — the suggested probe was run and passes on the branch (consumed budget persists across recovery within one eval); redundant nice-to-have, not a gap masking a defect.
"Comment overstates pcall accumulation" — the comment annotates the seed-in direction (state.steps = steps before the native call), which is unconditionally accurate; the finding was purely derivative of the unwind_to behavior above.

Bottom line: the rebase is sound and CI-green; the feature works and terminates as a sandboxing primitive. The three findings are accounting-conservativeness gaps that let a single eval overshoot the advertised instruction ceiling (never run unbounded). The return ... one-liner is a safe immediate fix; the pcall and require ones are small design decisions about whether the budget should be strictly monotonic.

I can apply the return_vararg fix (+ regression test) right away, and/or take the other two — just say which.

_{Generated by an automated multi-agent review+verify workflow. Findings verified against the rebased branch tip; treat as review signal, not ground truth.}

…eturn ... Three accounting holes let a single evaluation overshoot the advertised instruction ceiling (all surfaced by an automated review of #320): - The `{:return_vararg}` interpreter terminal returned a bare state at the bottom frame instead of `finish_steps(state, steps)`, so work done in an interpreted `return ...` callee invoked across an engine boundary vanished from the budget. It now stamps the tally like every sibling terminal. - `pcall` recovery rewound the tally: `State.unwind_to/2` kept the pcall-entry `steps`, so a caught budget error refunded the work it burned (O(N^2) total work for a pcall-in-a-loop). `check_steps!/2` now stamps the live tally into the raised `state:`, and `unwind_to/2` carries `steps` forward (monotonic max), so a caught budget error stays spent. - `require` reset the budget mid-evaluation: it re-enters `Lua.VM.execute`, whose per-eval `steps: 0` reset forgave the outer eval's pre-require work. `execute/3` now takes `reset_steps:` (default true); `require` passes false so a module body counts against the same per-evaluation budget. Adds regression tests for all three (interpreted `return ...` tally stamp, pcall-loop trips the cap, require preserves the budget) plus a small module fixture. Full suite 2495 passed, lua53 17, format clean.

davydog187 · 2026-06-15T14:08:55Z

🤖 Benchmark follow-up — `:max_steps` default-path overhead

The PR couldn't run benchmarks in its original sandbox (luaport needs C Lua headers). I ran the gap here: benchmarks/fibonacci.exs (fib(30), recursion-heavy → exercises the per-call-boundary step increment) on main vs this branch, serially, 3 pairs, with Luerl as a drift control (machine noise normalizer). luaport was unavailable so C-Lua is skipped; the lua (chunk) precompiled path is the clean VM measure.

Result: small but real ~3% regression on the default `:infinity` path

Drift-corrected (each run's lua (chunk) normalized to that run's Luerl, then branch÷main):

run	main chunk/luerl	branch chunk/luerl	branch vs main
1	1.035	1.068	+3.2%
2	~1.03 (noisy¹)	1.094	+2.5–6%
3	1.045	1.069	+2.3%

Direction is consistent across all three pairs (branch always slower relative to Luerl); within-run deviation was <1% on the stable runs. Net: fib(30) is ~3% slower at the default :infinity.

¹ run-2 main chunk had ±7.76% deviation (outliers); its median tracked the other runs.

Why — and a note on the PR's framing

This is inherent to the feature, not the review-fix commit (those touch the return ... terminal, unwind_to, and require — none on the fib hot path). The cost is that, even at :infinity, every call boundary and loop back-edge still does an unconditional steps = steps + 1 and a State.check_steps!/2 call, and the steps tally is threaded as an extra parameter through the whole do_execute/dispatch chain (register pressure). The PR's "the default :infinity carries no per-instruction cost" is therefore slightly optimistic: the check short-circuits for free, but the increment + the function call + the threaded param do not.

Options (your call — design decision, not something I'd change unasked)

Accept it. ~3% on pure recursion for an opt-in sandboxing/DoS primitive is defensible; loop/string/table-heavy workloads (less call-dense) will see less.
Make it truly zero-cost at :infinity — bigger change: e.g. skip the increment when max_steps == :infinity, or select a non-counting do_execute/dispatch path when no budget is set. Worth it only if the default path is performance-critical.

Methodology mirrors the repo's perf convention (serial benches, Luerl drift control). Happy to re-run on a loop-heavy or string/table benchmark, or to prototype the zero-cost-at-:infinity path if you want to keep the default free.

_{Automated benchmark run; Apple M-series, 3 serial pairs. Absolute ms vary with thermal state — the Luerl-normalized ratio is the signal.}

davydog187 · 2026-06-15T14:40:18Z

🤖 Perf follow-up — `State.tick!` landed (`9511f50`), but the regression mostly persists

Folded the per-boundary steps = steps + 1 + check_steps!/2 into one State.tick!/2 that is a true no-op at :infinity (returns the tally unchanged in one function-head match; finite clause increments + raises with identical raise-time state: semantics). Correctness-clean: mix test 2495, max_steps 19, lua53 17, format clean, Dialyzer clean, CI green.

But re-benchmarking (fib(30), 4 serial pairs, drift-corrected lua(chunk)/luerl) shows the regression is NOT closed:

pair	main ratio	candidate ratio	residual
1	1.048	1.097	+4.7%
2	1.055	1.128	+7.0%
3	1.035	1.119	+8.1%
4	1.059	1.132	+6.9%

Drift-corrected residual ≈ +5–7% vs main (my earlier ~3% reading was a lucky-low sample; the true figure is noisier and higher). tick! removes the per-op arithmetic but the residual is dominated by what it can't remove: (1) one function call per call-boundary/back-edge (~2.7M for fib(30)), and (2) the steps parameter still threaded through every do_execute/dispatch clause (extra arg → register pressure). So this is a clean code simplification with no reliable hot-path speedup.

Truly closing it needs a bigger change — a non-counting execution path when no budget is set (don't thread/charge steps at :infinity at all), e.g. per-engine inlined tick! + skipping the threaded tally. That's a larger, riskier refactor of a hot path that was just rewritten in #360, so I'm holding for a call on whether it's worth it.

Scope reminder: this only bites call-/loop-dense code (pure recursion is the worst case); :max_steps is opt-in and the default stays correct.

_{Automated benchmark, Apple M-series, drift-corrected. tick! kept regardless as a clarity/cleanup win.}

davydog187 · 2026-06-15T15:43:54Z

🔤 Renamed `:max_steps` → `:max_instructions` (`cc3e930`)

Per discussion: "steps" was vocabulary this PR introduced, while the rest of the VM — and this feature's own "instruction budget exceeded" error — speaks in instructions. Renamed for consistency with the error message and the :max_call_depth / :max_string_bytes sibling options.

Public option + %State{} ceiling field → max_instructions (incl. validate_max_instructions!, docs, README, CHANGELOG, guide, doctest).
The running tally (the %State{} field + the value threaded through every do_execute/dispatch clause) → instruction_count, not instructions, because instructions is already the instruction-list parameter on those same clauses and would shadow it.
Test module/file → MaxInstructionsTest / max_instructions_test.exs.

Pure rename, no behavior change: suite 2495, lua53 17, format + --warnings-as-errors + Dialyzer all clean. PUC suite files under test/lua53_tests/ (which use "steps" for GC steps) were left untouched. The branch name keeps its old slug to avoid disrupting the PR; only the option/API vocabulary changed.

The PR description above still says :max_steps in places — the durable contract is the code; flagging rather than rewriting the original scoping notes.

Adds a `:max_steps` option to `Lua.new/1` mirroring `:max_call_depth`: default `:infinity` (no limit, existing behavior unchanged), a positive integer caps the VM instructions a single evaluation may execute, and exhaustion raises a catchable `"instruction budget exceeded"` runtime error recoverable via `pcall`. This gives library consumers a deterministic CPU bound without wrapping each call in a host Task and wall-clock timeout. The running tally is threaded as a parameter through the interpreter's `do_execute` chain and the compiled dispatcher's `dispatch` chain — not stored in `%State{}` — preserving the executor's `line`-off-State discipline so the default `:infinity` path carries no per-instruction cost. The counter is incremented only at loop back-edges and call boundaries; `check_steps!/2` short-circuits on `:infinity` in a single function-head match. Both execution paths enforce the budget. Plan: B17 Closes #306

Make the :max_steps instruction budget durable across Executor<->Dispatcher engine hand-offs so recursion that alternates execution engines is bounded rather than resetting its budget at each boundary. The running tally now rides through a `steps` field on %State{} at engine boundaries only (where the struct is already rebuilt to push a call frame), never per opcode: the crossing engine writes its threaded tally into state.steps and the entered engine seeds from it, stamping the final tally back at its terminal. This closes the gap between max_call_depth: :infinity and a deterministic CPU bound for a compiled/interpreted mutually-recursive pair with no loop on either side. Adds regression coverage in test/lua/vm/max_steps_test.exs: a goto-bearing interpreted closure and a plain compiled closure in unbounded mutual recursion trip the budget, plus a guard asserting the pair is genuinely split across both engines. Plan: B17

The :max_steps tally is stamped back into state.steps at each terminal and persisted into the returned %Lua{}, but nothing reset it at the top-level evaluation boundary. A long-lived %Lua{} running many small evals therefore accumulated steps across the whole lifetime and would eventually raise "instruction budget exceeded" even though no single eval came close — contradicting the per-eval contract in issue #306. Reset state.steps to 0 in Lua.VM.execute/2, the single chokepoint that Lua.eval!/eval route through. Nested calls within one evaluation still thread the tally as a bare parameter and accumulate against the same budget (a tight `while true do end` stays bounded); only the top-level boundary resets. The :infinity hot path is unchanged. Adds a regression test that sizes the budget just above one eval's real cost and runs that same eval 100x on the threaded state — red before this fix, green after — plus a guard asserting the budget still spans nested calls within a single evaluation. Addresses PR #320 review: cumulative-vs-per-eval budget leak. Plan: B17

Add a CHANGELOG Unreleased entry and a README "Resource limits" subsection covering :max_call_depth and :max_steps. The README block is inside the moduledoc delimiter, so its iex> example is doctested. Plan: B17

…eturn ... Three accounting holes let a single evaluation overshoot the advertised instruction ceiling (all surfaced by an automated review of #320): - The `{:return_vararg}` interpreter terminal returned a bare state at the bottom frame instead of `finish_steps(state, steps)`, so work done in an interpreted `return ...` callee invoked across an engine boundary vanished from the budget. It now stamps the tally like every sibling terminal. - `pcall` recovery rewound the tally: `State.unwind_to/2` kept the pcall-entry `steps`, so a caught budget error refunded the work it burned (O(N^2) total work for a pcall-in-a-loop). `check_steps!/2` now stamps the live tally into the raised `state:`, and `unwind_to/2` carries `steps` forward (monotonic max), so a caught budget error stays spent. - `require` reset the budget mid-evaluation: it re-enters `Lua.VM.execute`, whose per-eval `steps: 0` reset forgave the outer eval's pre-require work. `execute/3` now takes `reset_steps:` (default true); `require` passes false so a module body counts against the same per-evaluation budget. Adds regression tests for all three (interpreted `return ...` tally stamp, pcall-loop trips the cap, require preserves the budget) plus a small module fixture. Full suite 2495 passed, lua53 17, format clean.

The budget tally was charged at every call boundary and loop back-edge with two operations — an unconditional `steps = steps + 1` increment followed by `State.check_steps!/2`. At the default `:infinity` budget the *check* short-circuited in one function-head match, but the increment and the second function call still ran on the recursion-heavy hot path (fib(30) ≈ 2.7M call boundaries), costing a few percent over main. Fold both into a single `State.tick!/2` that is a true no-op at `:infinity`: it returns the threaded tally unchanged in one function-head match, doing no arithmetic and rebuilding no struct. The finite clause increments and raises `"instruction budget exceeded"` once the new tally reaches `max_steps`, with the same raise-time `state:` semantics the old `check_steps!/2` carried, so pcall/require/unwind accounting is unchanged. `check_steps!/2` is removed (fully subsumed). Because `:infinity` no longer accrues a tally, the one regression test that inspected `state.steps` at the default budget now seeds a generous finite budget so the `return ...` terminal-stamping invariant it guards stays observable. Plan: B17

"steps" was vocabulary the budget feature introduced; the rest of the VM (and the feature's own "instruction budget exceeded" error) speaks in "instructions". Rename the public option and the %State{} ceiling field to :max_instructions / max_instructions, matching the :max_call_depth / :max_string_bytes sibling options and the error message. The running tally (the %State{} field and the value threaded through every do_execute/dispatch clause) becomes `instruction_count` rather than `instructions`, because `instructions` is already the instruction-list parameter on those same clauses and would shadow it. Pure rename: no behavior change. Test module/file renamed to MaxInstructionsTest / max_instructions_test.exs. Suite 2495, lua53 17, format + warnings-as-errors clean. The PUC suite files under test/lua53_tests/ (which use "steps" for GC steps) are untouched.

davydog187 commented Jun 1, 2026

View reviewed changes

davydog187 force-pushed the feat/vm-max-steps branch from d170301 to 12b158f Compare June 1, 2026 22:38

davydog187 force-pushed the feat/vm-max-steps branch from 12b158f to 1ef7a38 Compare June 2, 2026 19:36

davydog187 force-pushed the feat/vm-max-steps branch from 37f2f46 to c060163 Compare June 15, 2026 13:19

davydog187 changed the title ~~feat(vm): add configurable :max_steps instruction budget~~ feat(vm): add configurable :max_instructions instruction budget Jun 15, 2026

davydog187 added 8 commits June 15, 2026 09:05

chore(B17): start plan

376d503

chore(B17): mark plan as review

cfc0db6

docs(changelog,readme): document :max_steps instruction budget

3f94e95

Add a CHANGELOG Unreleased entry and a README "Resource limits" subsection covering :max_call_depth and :max_steps. The README block is inside the moduledoc delimiter, so its iex> example is doctested. Plan: B17

davydog187 force-pushed the feat/vm-max-steps branch from cc3e930 to 9063636 Compare June 15, 2026 16:11

davydog187 force-pushed the feat/vm-max-steps branch from 9063636 to 491ce3f Compare June 15, 2026 16:13

davydog187 merged commit 7bbb824 into main Jun 15, 2026
5 checks passed

davydog187 deleted the feat/vm-max-steps branch June 15, 2026 16:44

davydog187 mentioned this pull request Jun 15, 2026

chore(release): 1.0.0-rc.3 #367

Merged

davydog187 mentioned this pull request Jun 15, 2026

Add execution quotas #77

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(vm): add configurable :max_instructions instruction budget#320

feat(vm): add configurable :max_instructions instruction budget#320
davydog187 merged 9 commits into
mainfrom
feat/vm-max-steps

davydog187 commented Jun 1, 2026

Uh oh!

davydog187 left a comment

Uh oh!

davydog187 commented Jun 1, 2026

Uh oh!

davydog187 commented Jun 2, 2026

Uh oh!

davydog187 commented Jun 15, 2026

Uh oh!

davydog187 commented Jun 15, 2026

Uh oh!

davydog187 commented Jun 15, 2026

Uh oh!

davydog187 commented Jun 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

davydog187 commented Jun 1, 2026

VM instruction budget: configurable :max_steps with catchable exhaustion

Goal

Success criteria

Changes

Design notes

Verification

Out of scope (intentional)

Uh oh!

davydog187 left a comment

Choose a reason for hiding this comment

Hot-path / :infinity verdict: PASS (no per-opcode regression)

pcall-catchability verdict: PASS

Findings

Confirmed clean

Uh oh!

davydog187 commented Jun 1, 2026

Fix: instruction budget now resets per top-level evaluation

The fix (d170301)

TDD

Validation

Benchmark status (formal #306 gate: UNMET — honest disclosure)

Uh oh!

davydog187 commented Jun 2, 2026

Benchmark results: default :infinity path — main vs branch

dispatcher_vs_interpreter — fib(25), full mode (maximal Lua-closure call density)

table_ops — quick mode, n=100, "chunk" path (pre-compiled, cleanest signal), n=2 each

Interpretation

Suggested mitigation (author's call)

Uh oh!

davydog187 commented Jun 15, 2026

🤖 Automated multi-agent review — stabilization cycle (post-rebase)

🟠 MEDIUM — return ... interpreter terminal drops the tally across an engine boundary

🟠 MEDIUM — pcall "refunds" the instructions a caught budget error consumed

🟠 MEDIUM — path-based require resets the budget mid-evaluation

Uh oh!

davydog187 commented Jun 15, 2026

🤖 Benchmark follow-up — :max_steps default-path overhead

Result: small but real ~3% regression on the default :infinity path

Why — and a note on the PR's framing

Options (your call — design decision, not something I'd change unasked)

Uh oh!

davydog187 commented Jun 15, 2026

🤖 Perf follow-up — State.tick! landed (9511f50), but the regression mostly persists

Uh oh!

davydog187 commented Jun 15, 2026

🔤 Renamed :max_steps → :max_instructions (cc3e930)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

VM instruction budget: configurable `:max_steps` with catchable exhaustion

Hot-path / `:infinity` verdict: PASS (no per-opcode regression)

The fix (`d170301`)

Benchmark results: default `:infinity` path — main vs branch

`dispatcher_vs_interpreter` — fib(25), full mode (maximal Lua-closure call density)

`table_ops` — quick mode, n=100, "chunk" path (pre-compiled, cleanest signal), n=2 each

🟠 MEDIUM — `return ...` interpreter terminal drops the tally across an engine boundary

🟠 MEDIUM — `pcall` "refunds" the instructions a caught budget error consumed

🟠 MEDIUM — path-based `require` resets the budget mid-evaluation

🤖 Benchmark follow-up — `:max_steps` default-path overhead

Result: small but real ~3% regression on the default `:infinity` path

🤖 Perf follow-up — `State.tick!` landed (`9511f50`), but the regression mostly persists

🔤 Renamed `:max_steps` → `:max_instructions` (`cc3e930`)