compiler speed optimizations

when we get #369 we will get much larger contracts like zero-knowledge proof verifiers with maybe 2-3 orders of magnitude more cashscript code to compile which will surface optimized compiler speeds

first thing i ran into when analyzing with claude:

> `compileString` runs the bytecode optimizer twice (`optimiseBytecodeOld` and `optimiseBytecode`) and then compares them by
>   converting both to ASM strings `(scriptToAsm(...) !== scriptToAsm(...))` : a dev sanity-check left in the hot path plus source-map/debug/artifact generation we discard.

then in follow up research claude said: 
> The optimizer is 98% of the time (932 ms of 952 ms) — parse/codegen/typecheck are noise. So 1.3–1.6× is not the limit; the real target is inside
>   `optimiseBytecode`.
> 
> `replaceOps` (line 285+) doesn't work on the script array — it stringifies to ASM and does regex rewriting, and for every match it counts whitespace over the growing `processedAsm ` (`[...processedAsm.matchAll(/\s+/g)].length`) to recover the script index. That's **O(n²) per optimization**, per match, per pass — string work, not the actual rewriting. Let me confirm it's quadratic by measuring optimize-time vs contract size.

follow up:
> it's not O(n²) in op-count — it's driven by ASM-string length. The pairing chunks bake dozens of 32–40-byte BN254 field constants; each becomes a long hex run in the ASM string. And replaceOps works on that string: for every optimization pattern, every match, it counts whitespace over the growing processedAsm ([...processedAsm.matchAll(/\s+/g)].length) to recover the script index — O(asm-length) per match, so the cost explodes with the size of the baked constants, not the number of ops.

claude concluded and suggested:
> The optimizer is 98% of compile time, and it's a string/regex rewriter that scales with ASM-string length. The double-optimize removal you already did is the small lever; the big lever is the optimizer's design. A replaceOps that works on the script array (match opcode/data-element sequences directly, splice the array, keep a parallel index map for the location/source-map data) would skip the stringify-and-regex entirely — and would speed up both paths (full and lean), since both call optimiseBytecode. Given the per-op cost is ~10× higher on the constant-heavy pairing chunks, that's plausibly the multi-×  ("5×"-ish) win you're asking about, and it grows with how many big field constants a contract bakes.
> 
> Cheaper, lower-risk wins (each speeds up both paths):
>  - Array fixed-point check — replace scriptToAsm(old) === scriptToAsm(new) with an element-wise array compare (no stringify per pass).
>  - Hoist per-pattern work — the regex objects and pattern/replacement length counts are recomputed per match; compute once.
> 
> But those are incremental. The real headroom is killing the ASM-string rewriting in replaceOps in favor of array-based matching

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

compiler speed optimizations #411

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

compiler speed optimizations #411

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions