Add batch mode for JSONL function runs #589
Function-runner parses JSON inputs into serde_json::Value and reserializes them before passing bytes to the WASM. With serde_json's default map implementation, this sorts object keys lexicographically, which can change JS behavior that depends on Object.keys() ordering for fallback logic.

Enable serde_json's preserve_order feature so JSON input object key order is retained, and add a regression test covering nested metafield-like message objects. This keeps function-rerunner parity closer to production inputs for Checkout Blocks discount functions.

Verified with:
- cargo test json_input_preserves_object_key_order_in_raw_bytes
- cargo build --release
- cargo test
- Rerunning the 17 discount localized-message mismatch rows with the fixed release runner produced full semantic matches.

Assisted-By: devx/c659e918-9568-4750-b122-e3890447348a
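The key-order change described in this commit can be illustrated without serde_json itself. The sketch below uses only the standard library: a BTreeMap stands in for serde_json's default sorted object map, and a small insertion-ordered list stands in for the map used under preserve_order. The function names and the locale-keyed sample object are hypothetical and are not taken from the function-runner code.

```rust
use std::collections::BTreeMap;

/// Key order produced by a sorted map, which is how a BTreeMap-backed
/// JSON object behaves when it is reserialized.
fn sorted_key_order(pairs: &[(&str, &str)]) -> Vec<String> {
    let map: BTreeMap<&str, &str> = pairs.iter().copied().collect();
    map.keys().map(|k| k.to_string()).collect()
}

/// Key order produced by an insertion-ordered map: the first occurrence
/// of each key keeps its position from the input.
fn insertion_key_order(pairs: &[(&str, &str)]) -> Vec<String> {
    let mut seen: Vec<String> = Vec::new();
    for (key, _) in pairs {
        if !seen.iter().any(|s| s == key) {
            seen.push(key.to_string());
        }
    }
    seen
}

fn main() {
    // Hypothetical metafield-like message object, keyed by locale.
    let input = [("fr", "Remise"), ("en", "Discount"), ("de", "Rabatt")];

    // A JS function that falls back to the first key of the object would
    // pick "de" after sorting but "fr" when input order is kept.
    println!("sorted:    {:?}", sorted_key_order(&input));
    println!("preserved: {:?}", insertion_key_order(&input));
}
```

In this sketch a fallback that reads the first key sees a different locale depending on which map type handled the input, which is the class of mismatch the commit describes.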
@davejcameron I created this draft PR to at least start the conversation about merging these changes into main. Is there any reason not to pursue that?
Make batch mode continue by default and add --batch-fail-on-error for callers that want fail-fast behavior. Count actual function successes and failures based on FunctionRunResult.success instead of treating every successful runner invocation as a successful function run.

Add integration coverage for default continue behavior, fail-fast behavior, and accurate batch summaries. Also update integration tests to use assert_cmd::cargo::cargo_bin! instead of the deprecated Command::cargo_bin helper.

Verified with:
- cargo test batch_
- cargo test

Assisted-By: devx/c659e918-9568-4750-b122-e3890447348a
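A minimal sketch of the continue-by-default and fail-fast behavior described in this commit, using only the standard library. RowResult, BatchSummary, run_row, and run_batch are hypothetical names, and the rule that both invocation errors and unsuccessful function runs trigger fail-fast is an assumption, since the commit message does not spell it out.

```rust
/// Hypothetical stand-in for the runner's per-row result.
struct RowResult {
    success: bool,
}

#[derive(Debug, Default, PartialEq)]
struct BatchSummary {
    succeeded: usize,
    failed: usize,
}

/// Hypothetical row executor. An empty line models a runner invocation
/// error; a row containing "fail" models a function that ran but
/// reported success = false.
fn run_row(line: &str) -> Result<RowResult, String> {
    if line.trim().is_empty() {
        return Err("empty input line".to_string());
    }
    Ok(RowResult { success: !line.contains("fail") })
}

/// Processes every row. By default failures are counted and the loop
/// continues; with fail_on_error the first failure aborts the batch.
fn run_batch(lines: &[&str], fail_on_error: bool) -> Result<BatchSummary, String> {
    let mut summary = BatchSummary::default();
    for (idx, line) in lines.iter().enumerate() {
        // Count a success only when the function itself reported success,
        // not merely because the invocation returned.
        let outcome = match run_row(line) {
            Ok(result) if result.success => Ok(()),
            Ok(_) => Err("function reported failure".to_string()),
            Err(e) => Err(e),
        };
        match outcome {
            Ok(()) => summary.succeeded += 1,
            Err(reason) => {
                summary.failed += 1;
                if fail_on_error {
                    return Err(format!("line {}: {}", idx + 1, reason));
                }
            }
        }
    }
    Ok(summary)
}

fn main() {
    let rows = ["{\"ok\":1}", "", "{\"fail\":1}", "{\"ok\":2}"];
    println!("continue:  {:?}", run_batch(&rows, false));
    println!("fail-fast: {:?}", run_batch(&rows, true));
}
```

Keeping the summary counts tied to the per-row success flag is what separates "the runner was invoked" from "the function succeeded".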
Batch mode is intended to process large JSONL input sets efficiently. Javy/provider functions were still compiling the embedded standard provider module for every input row, so provider setup dominated runtime even though the function module itself was reused.

Compile the standard provider once before the batch loop and pass the compiled provider into each row execution. IOHandler now instantiates the precompiled provider module when it matches the function's standard import, falling back to the old Module::from_binary path otherwise.

Add batch coverage for a Javy v3 function to exercise the provider path.

Measured locally on 5,000 js_function_javy_plugin_v3 rows:
- Before: median 52.31s
- After: median 0.20s warm run

Measured on discount-order's 250,003-row parity dataset:
- Before: 549.85s, 454.68 rows/sec
- After: 119.87s, 2085.58 rows/sec

Verified with:
- cargo test batch_
- cargo test

Assisted-By: devx/c659e918-9568-4750-b122-e3890447348a
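The hoisting described in this commit can be sketched as follows, with a call counter standing in for the expensive compile step. This is not the Wasmtime or IOHandler code: Compiler, CompiledProvider, STANDARD_PROVIDER, and the function signatures are hypothetical, chosen only to show the compile-once path and the per-row fallback.

```rust
use std::cell::Cell;

/// Hypothetical name for the embedded standard provider import.
const STANDARD_PROVIDER: &str = "standard_provider";

/// Hypothetical stand-in for a compiled provider module.
struct CompiledProvider {
    name: String,
}

/// Hypothetical compiler that records how often the expensive step runs.
struct Compiler {
    compile_calls: Cell<usize>,
}

impl Compiler {
    fn compile(&self, name: &str) -> CompiledProvider {
        self.compile_calls.set(self.compile_calls.get() + 1);
        CompiledProvider { name: name.to_string() }
    }
}

/// Runs one row. The precompiled provider is reused when it matches the
/// function's import; otherwise the row compiles its own provider.
fn run_row(
    compiler: &Compiler,
    precompiled: Option<&CompiledProvider>,
    import_name: &str,
    row: &str,
) -> String {
    let fallback;
    let provider = match precompiled {
        Some(p) if p.name == import_name => p,
        _ => {
            fallback = compiler.compile(import_name);
            &fallback
        }
    };
    format!("{}:{}", provider.name, row)
}

/// Runs a batch and returns the number of compile calls it needed.
fn run_batch(rows: &[&str], import_name: &str, precompile: bool) -> usize {
    let compiler = Compiler { compile_calls: Cell::new(0) };
    // Compile once, before the loop, instead of once per row.
    let provider = if precompile {
        Some(compiler.compile(STANDARD_PROVIDER))
    } else {
        None
    };
    for row in rows {
        let _ = run_row(&compiler, provider.as_ref(), import_name, row);
    }
    compiler.compile_calls.get()
}

fn main() {
    let rows = ["{\"a\":1}", "{\"a\":2}", "{\"a\":3}"];
    println!("per-row compiles: {}", run_batch(&rows, STANDARD_PROVIDER, false));
    println!("hoisted compiles: {}", run_batch(&rows, STANDARD_PROVIDER, true));
}
```

With the provider hoisted, the compile count stays at one regardless of batch size, while a non-matching import still pays the per-row cost through the fallback branch.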
It makes sense to me to add a mode where we batch a bunch of inputs and write a series of outputs. The code changes seem a bit messier though. I think we could update the You can also open a separate PR to add |
@jeffcharles Yep, code definitely needs to be cleaned up and split into distinct changes. It's a bit of a dumping ground at the moment.
Refs shop/issues-checkout#13420
Why

Checkout Blocks parity testing needs to replay large production datasets efficiently and faithfully. Migration validation often requires rerunning hundreds of thousands of function inputs, and invoking function-runner once per row spends most of the time on repeated process startup plus Wasmtime engine/module setup.

This branch adds a JSONL batch execution mode so callers like function-rerunner can stream many inputs through one runner process while reusing the loaded function module.

The branch also includes a JSON input fidelity fix discovered while using batch reruns for Checkout Blocks discount parity: function-runner was sorting JSON object keys during parse/reserialize. That changed object-order-sensitive JS fallback behavior and produced false localized-message mismatches.

What
Batch execution
- --batch mode.
- --input or piped stdin.
- --batch-continue-on-error for best-effort processing when individual lines fail.

JSON input fidelity
- serde_json's preserve_order feature so JSON object key order survives input normalization.
- Cargo.lock for the indexmap dependency required by serde_json/preserve_order.

Testing / parity
- cargo test json_input_preserves_object_key_order_in_raw_bytes
- cargo build --release
- cargo test

Checkout Blocks parity verification with the rebuilt local runner:
- function-rerunner.
- discount-order: 250,003 / 250,003 semantic matches
- discount-shipping: 250,002 / 250,002 semantic matches

Risks
- --batch-continue-on-error returns per-line error JSON for failures; consumers should handle success and error rows explicitly.
- serde_json/preserve_order adds an indexmap dependency.