Fix IMU process memory leak + CPU spin (and guard contrast-reserve log spam) #472

Merged
brickbots merged 1 commit into main from fix/imu-pickle-leak
Jun 16, 2026
Conversation

@brickbots

Owner

Summary

Two fixes found by the real-hardware endurance test that followed #470. #470 fixed the comet CPU hog; this PR fixes a second, distinct failure mode the endurance test then surfaced — and that mode (a memory leak) is the more likely driver of the actual OOM/freeze. See Relationship to #470 below.

1. IMU process memory leak + CPU spin (primary)

The imu_monitor loop (imu_pi.py) had no sleep. imu.update() throttles the I²C reads to 30 Hz, but the loop body still ran thousands/sec, calling shared_state.set_imu() every iteration. set_imu crosses a multiprocessing Manager proxy, so each call pickles the ImuSample — and pickling a numpy.quaternion leaks (~25 MB/200k dumps, numpy-quaternion 2023.0.4). On real hardware the IMU child leaked ~16 MB/min and spun ~19 % CPU; on a 2 GB Pi that exhausts RAM → swap-thrash → OOM/freeze. Invisible to the fake-IMU headless harness (its update() self-throttles with sleep(0.1)), which is why earlier headless profiling never caught it.

  • Throttle the loop to the sample rate — sleep only the remainder of the period (period − work), so publishing tracks 30 Hz instead of drifting to period + work. The > 0 guard keeps the fake-IMU fallback from double-sleeping. 19 % → ~2.4 % CPU.
  • Stop pickling the quaternion: add __getstate__/__setstate__ covering ImuSample.quat, PointingEstimate.imu_anchor, and SuccessfulSolve.imu_anchor (the same quaternion also rides set_solution's deepcopy + solver_queue). The quaternion pickles as 4 plain floats and is rebuilt on unpickle; the in-process attribute stays a real numpy.quaternion, so consumers are unchanged.
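
The throttling idea in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code in imu_pi.py: `run_throttled`, `step`, and the injectable `sleep`/`clock` parameters are hypothetical names chosen so the behaviour can be exercised without hardware, and the 30 Hz period is taken from the description above.

```python
import time

# Assumed sample period: 30 Hz, per the PR description.
SAMPLE_PERIOD_S = 1.0 / 30.0


def run_throttled(step, period_s=SAMPLE_PERIOD_S, iterations=1,
                  sleep=time.sleep, clock=time.monotonic):
    """Call step() once per period, sleeping only the time left over.

    `step` stands in for "update the IMU and publish the sample".
    `sleep` and `clock` are injectable purely so the loop can be tested.
    """
    for _ in range(iterations):
        start = clock()
        step()
        # Sleep the remainder of the period, not the full period, so the
        # cadence tracks `period_s` rather than drifting to period + work.
        remaining = period_s - (clock() - start)
        # If step() already used up the period (for example a fake IMU whose
        # update() sleeps internally), skip the extra sleep entirely.
        if remaining > 0:
            sleep(remaining)
```

With a step that takes 10 ms and a 50 ms period, each iteration sleeps about 40 ms; with a step that takes longer than the period, the loop never sleeps, which is the double-sleep case the `> 0` guard is described as avoiding.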

2. Contrast-reserve log spam (rider)

pydeepskylog.contrast_reserve() logs logger.error(...) and returns (doesn't raise) when an object diameter is None, so the surrounding except can't suppress it. object_details calls it per redraw with diameter=None for sizeless objects → steady ERROR-level spam that bypasses the root=ERROR filter. Guard: skip the call when a diameter is None (same blank-contrast result, minus the error).
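
A guard of the kind described might look like the sketch below. It deliberately does not call pydeepskylog: `compute` is a hypothetical stand-in for whatever wraps `contrast_reserve()`, and `guarded_contrast` and its parameters are illustrative names, not the ones used in object_details.

```python
from typing import Callable, Optional, Sequence


def guarded_contrast(
    compute: Callable[[Sequence[float]], float],
    diameters: Sequence[Optional[float]],
) -> Optional[float]:
    """Return the contrast value, or None without calling `compute`
    when any diameter is missing.

    Skipping the call gives the same "no contrast available" result that a
    sizeless object produced before, but the library never gets the chance
    to log an error for it.
    """
    if any(d is None for d in diameters):
        return None
    return compute([d for d in diameters if d is not None])
```

The caller then treats `None` as a blank contrast field, so the displayed result is unchanged for sizeless objects.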

Verification

  • A/B pickle.dumps RSS: bare quaternion +24.8 MB/200k; all three patched dataclasses +0.0 MB.
  • Unit/smoke: 440 pass (incl. new ImuSample / None-anchor round-trip tests in TestPicklability); ruff + mypy clean; test_ui_modules 211 pass.
  • 50-min real-hardware endurance (real BNO055, Test Mode, IMU child watched directly): anonymous heap flat 113.8 → 113.8 MB (slope −0.000 MB/min), CPU mean 2.35 %, RSS +6.4 MB one-time COW step then plateau. Prior baseline: 134 → 545 MB / 19 % in 28 min.
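
An A/B measurement like the first bullet could be set up along these lines. This is only a sketch of the approach under stated assumptions, not the harness used for the numbers above: `peak_rss_mb` and `rss_growth_mb` are hypothetical helpers, the measurement relies on the Unix-only `resource` module, and it reports growth in peak RSS, which only ever increases.

```python
import pickle
import resource
import sys


def peak_rss_mb() -> float:
    """Peak resident set size of this process in MB.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def rss_growth_mb(obj, n_dumps: int = 200_000) -> float:
    """Pickle `obj` repeatedly, discarding the result each time, and
    return how much peak RSS grew over the run."""
    before = peak_rss_mb()
    for _ in range(n_dumps):
        pickle.dumps(obj)
    return peak_rss_mb() - before
```

Running `rss_growth_mb` once on a bare quaternion and once on a patched dataclass instance, ideally in separate fresh processes so one run does not raise the peak for the other, gives the two sides of the comparison.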

Relationship to #470

#470 ("Vectorize comet propagation") fixed the CPU hog: calc_comets pegging a core whenever locked. That is a real pathology (UI starvation, heat), but a CPU hog alone doesn't typically hard-crash the OS. This PR fixes a memory leak that does: ~16 MB/min on a 2 GB Pi exhausts RAM in well under an hour → swap-death / OOM, which presents as the field "hang." The endurance test that found this leak ran after the comet fix, with the hog already gone, so the two are independent.

Best current understanding: the field freeze was primarily this memory leak, with the comet CPU hog a compounding factor. Both fixes are needed for a healthy long observing session. #470's description has been updated to match.

🤖 Generated with Claude Code

The imu_monitor loop (imu_pi.py) had no sleep: imu.update() throttles the
I2C reads to 30 Hz, but the loop body still ran thousands/sec, calling
shared_state.set_imu() every iteration. set_imu crosses a multiprocessing
Manager proxy, so each call pickles the ImuSample -- and pickling a
numpy.quaternion leaks (~25 MB/200k dumps, numpy-quaternion 2023.0.4). On
real hardware this was ~16 MB/min + ~19% CPU in the IMU child; on a 2 GB Pi
it drove swap-thrash toward OOM. Invisible to the fake-IMU headless harness
(its update() self-throttles with sleep(0.1)).

Two parts:

1. Throttle the loop to the IMU sample rate: sleep only the remainder of the
   sample period (period - work already done this iteration), so the publish
   cadence tracks the 30 Hz sample rate rather than drifting to period + work.
   The >0 guard keeps the fake-IMU fallback (whose update() already sleeps)
   from double-sleeping. 19% -> ~2.4% CPU.

2. Stop pickling the numpy.quaternion. Add _quat_to_floats/_floats_to_quat
   helpers and __getstate__/__setstate__ to ImuSample (quat) and to
   PointingEstimate / SuccessfulSolve (imu_anchor -- the same quaternion also
   pickles via set_solution's deepcopy and solver_queue). The quaternion
   pickles as 4 plain floats and is rebuilt on unpickle; the in-process
   attribute stays a real numpy.quaternion, so consumers are unchanged.
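
The pickling change in part 2 can be illustrated with a self-contained sketch. Several things here are assumptions: `Quat` is a stand-in for numpy.quaternion so the example runs without that library, `ImuSampleSketch` and its `timestamp` field are hypothetical, and the helper bodies are a guess at what `_quat_to_floats`/`_floats_to_quat` do (only their names come from the commit message).

```python
import pickle
from dataclasses import dataclass
from typing import Optional, Tuple


class Quat:
    """Stand-in for numpy.quaternion (assumption: w, x, y, z components)."""

    def __init__(self, w: float, x: float, y: float, z: float):
        self.w, self.x, self.y, self.z = w, x, y, z

    def __eq__(self, other):
        return isinstance(other, Quat) and (
            (self.w, self.x, self.y, self.z)
            == (other.w, other.x, other.y, other.z)
        )


def _quat_to_floats(q: Optional[Quat]) -> Optional[Tuple[float, ...]]:
    # None passes through so an unset anchor still round-trips.
    return None if q is None else (float(q.w), float(q.x), float(q.y), float(q.z))


def _floats_to_quat(v: Optional[Tuple[float, ...]]) -> Optional[Quat]:
    return None if v is None else Quat(*v)


@dataclass
class ImuSampleSketch:
    quat: Optional[Quat]
    timestamp: float = 0.0

    def __getstate__(self):
        # Copy the instance dict and swap the quaternion for plain floats,
        # so pickle never sees the quaternion object itself.
        state = dict(self.__dict__)
        state["quat"] = _quat_to_floats(self.quat)
        return state

    def __setstate__(self, state):
        # Rebuild the quaternion so consumers get the usual attribute type.
        state = dict(state)
        state["quat"] = _floats_to_quat(state["quat"])
        self.__dict__.update(state)
```

A round trip through `pickle.loads(pickle.dumps(sample))` returns an equal object whose `quat` is a `Quat` again, and a sample with `quat=None` also survives the trip, which mirrors the None-anchor case the tests are said to cover.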

Verified: A/B pickle.dumps RSS -- bare quaternion +24.8 MB/200k, all three
patched dataclasses +0.0 MB. 50-min real-hardware endurance (real BNO055,
Test Mode): IMU-child anonymous heap flat (113.8 MB, slope -0.000 MB/min),
CPU mean 2.35%, vs the prior 134->545 MB / 19% in 28 min.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@brickbots brickbots force-pushed the fix/imu-pickle-leak branch from 9a5e2c7 to f1154b6 June 15, 2026 20:39
@brickbots brickbots merged commit 555d42e into main Jun 16, 2026
1 check passed
mrosseel added a commit to mrosseel/PiFinder that referenced this pull request Jun 16, 2026
mrosseel added a commit to mrosseel/PiFinder that referenced this pull request Jun 16, 2026