Experiment with streaming cloning by josephjclark · Pull Request #1461 · OpenFn/kit

josephjclark · 2026-06-23T11:11:59Z

Over in #1458 we're focusing on making the "is this state object too big" calculation robust.

That approach uses :

json-stream-stringify to serialize state to a string, throwing if the object exceeds a certain limit
JSON.parse to convert the serialized string back to a JS object.

It is not memory efficient (which can be a problem for memory hungry workflows). The JSON.parse call is blocking and I think can cause the node process to consume too much memory, triggering kubernetes OOMkills.

This PR tries a fully streaming approach:

use json-strream-stringify to stream a state object into a JSON string
pass that into a pipeline which throws if the JSON string gets too long
stream valid chunks directly into a new object

This should significantly reduce the memory overhead of this serialisation,

However! It significantly increases the time: it take 3.8 second to do what #1458 does in 0.9 seconds. I should note that these are pretty large state objects. I also reckon my CPU fan runs louder, implying more CPU usage. This makes semse to me because we're running the seriaisation in JS land, not C land.

AI Usage

Please disclose whether you've used AI anywhere in this PR (it's cool, we just
want to know!):

I have used Claude Code
I have used another model
I have not used AI

You can read more details in our
Responsible AI Policy

Down from 20% of allow memory to 15%

josephjclark · 2026-06-23T11:19:17Z

I am tempted to say that cgroups are a better solution than this

I don't care why memory limits blow right now. If anything in the workflow causes memory to get too high, the node process must terminate gracefully and immediately. Even if this PR fixes this one corner case (at the cost of performance and compute), there are others that might trigger memory errors.

So I think I'll close this but keep the branch, then go and look at cgroups

doc-han and others added 11 commits June 18, 2026 16:16

feat: ensure max state memory

d8dc157

tests: add tests for state size limit

3e37c37

tests: fix flaky test on dataclip loading

599166f

chore: remove log

c2a1b41

chore: update changeset

8ebc086

versions

ef58439

reduce size of state objects

077d5b2

Down from 20% of allow memory to 15%

log state sizes

325fb19

use better streamer

49ba1d9

time calculation

7ee9966

package lock

4ea1a72

taylordowns2000 added this to Core Jun 23, 2026

github-project-automation Bot moved this to New Issues in Core Jun 23, 2026

josephjclark closed this Jun 23, 2026

github-project-automation Bot moved this from New Issues to Done in Core Jun 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Experiment with streaming cloning#1461

Experiment with streaming cloning#1461
josephjclark wants to merge 11 commits into
mainfrom
better-safer-serialize

josephjclark commented Jun 23, 2026

Uh oh!

josephjclark commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

josephjclark commented Jun 23, 2026

AI Usage

Uh oh!

josephjclark commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants