Skip to content

worker: fix premature addon unload#63575

Open
mohd-akram wants to merge 1 commit into
nodejs:mainfrom
mohd-akram:fix-addon-unload
Open

worker: fix premature addon unload#63575
mohd-akram wants to merge 1 commit into
nodejs:mainfrom
mohd-akram:fix-addon-unload

Conversation

@mohd-akram
Copy link
Copy Markdown
Contributor

Weak callbacks could run after addons were unloaded, leading to a crash.

Fixes #63540

@nodejs-github-bot
Copy link
Copy Markdown
Collaborator

Review requested:

  • @nodejs/gyp

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels May 25, 2026
@mohd-akram mohd-akram changed the title fix: premature addon unload in workers worker: fix premature addon unload May 25, 2026
@mohd-akram mohd-akram force-pushed the fix-addon-unload branch 3 times, most recently from 912017b to 6144e5a Compare May 26, 2026 00:16
Weak callbacks could run after addons were unloaded, leading to a crash.

Signed-off-by: Mohamed Akram <mohd.akram@outlook.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.33%. Comparing base (8c495c8) to head (50023d9).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff            @@
##             main   #63575    +/-   ##
========================================
  Coverage   90.33%   90.33%            
========================================
  Files         730      730            
  Lines      234362   234464   +102     
  Branches    43908    43909     +1     
========================================
+ Hits       211708   211810   +102     
- Misses      14376    14378     +2     
+ Partials     8278     8276     -2     
Files with missing lines Coverage Δ
src/env-inl.h 95.01% <100.00%> (+0.02%) ⬆️
src/env.cc 85.35% <ø> (+0.18%) ⬆️
src/env.h 98.21% <ø> (ø)
src/node_worker.cc 82.17% <100.00%> (+0.05%) ⬆️

... and 31 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@addaleax addaleax added the worker Issues and PRs related to Worker support. label May 26, 2026
Copy link
Copy Markdown
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a valid bug, but it doesn't seem like an issue in the workers implementation, and rather one in node::ObjectWrap; the class should add an explicit cleanup hook via AddEnvironmentCleanupHook() along its weak callback.

node::ObjectWrap in general has been unmaintained and really should have been deprecated a long time ago, but yeah, that's the place to fix this.

@mohd-akram
Copy link
Copy Markdown
Contributor Author

@addaleax But this problem doesn't happen in the main thread. I had first thought about solving it in ObjectWrap as well, but that will not fix the case where addons use WeakCallback directly, and potentially other situations where a callback lingers. Also, as you mentioned, the ObjectWrap code has not been modified for many years and I would be hesitant to make any change there lest it break some other thing. It is exceptionally difficult to debug this type of bug because the code is gone so it doesn't appear in the stack trace. Per the comment that's there in the code, this was disabled for the main thread presumably because of certain issues, and the same ones will ostensibly happen if an add-on is only loaded in the worker thread, so IMO it would be good to align the main and worker thread behaviors. It also makes sense from a pure conceptual perspective that unloading should happen after all code has stopped running, and I'm not sure if there's some benefit for doing it earlier.

@addaleax
Copy link
Copy Markdown
Member

and potentially other situations where a callback lingers

That's part of the problem – this isn't specific to WeakCallbacks per se, and you can run into it with any addon that manually schedules callbacks.

Per the comment that's there in the code, this was disabled for the main thread presumably because of certain issues, and the same ones will ostensibly happen if an add-on is only loaded in the worker thread, so IMO it would be good to align the main and worker thread behaviors.

It's disabled in the main thread because legacy addons that are not aware of Environment cleanup hooks are expected to keep working as they are, that's all. This is also why we call out in our docs that addons need to explicitly opt into supporting worker threads: https://nodejs.org/api/addons.html#worker-support

Also, as you mentioned, the ObjectWrap code has not been modified for many years and I would be hesitant to make any change there lest it break some other thing.

That's not a good reason to keep it buggy tbh.

It also makes sense from a pure conceptual perspective that unloading should happen after all code has stopped running, and I'm not sure if there's some benefit for doing it earlier.

At the very least, there needs to be something that ensures that embedder-created Environment* instances still have their addons closed, which this PR currently breaks. But really, the Environment instance is conceptually what owns the addons, and if one has objects or callbacks that outlive the Enviroment, that's a bug (again, AddEnvironmentCleanupHook() is there for exactly this).

@mohd-akram
Copy link
Copy Markdown
Contributor Author

mohd-akram commented May 27, 2026

At the very least, there needs to be something that ensures that embedder-created Environment* instances still have their addons closed

There's an argument for not dlclosing at all. It is allowed to be a no-op after all. My understanding is that opening and closing libraries several times can cause issues. I could not find the original motivation for adding the close. On the one hand the dlopen happens inside Node.js, but in reality it's opened for the whole process, so it's not strictly tied to the Environment. Even if one were to add an environment cleanup hook, what if you need to do some asynchronous cleanup?

which this PR currently breaks

In the current behavior, without this PR, would that close code run if you created the embedder instance in the main thread?

@addaleax
Copy link
Copy Markdown
Member

It is allowed to be a no-op after all.

It is a no-op under specific circumstances, with legacy addons that have been written for older versions of Node.js that did not have cleanup logic in place.

My understanding is that opening and closing libraries several times can cause issues.

It can expose bugs in libraries, yes, but it's not inherently problematic.

On the one hand the dlopen happens inside Node.js, but in reality it's opened for the whole process, so it's not strictly tied to the Environment

Yeah, but the concept of ownership doesn't stop applying because it interacts with process-global state.

Even if one were to add an environment cleanup hook, what if you need to do some asynchronous cleanup?

Well ... you use the async environment cleanup hook version, right?

In the current behavior, without this PR, would that close code run if you created the embedder instance in the main thread?

Yeah, I have to correct myself here – currently we don't expose the is_main_thread() flag as something that is exposed to embedders. That's not great, but I guess right now embedders also don't get the benefits of cleanup.

@mohd-akram
Copy link
Copy Markdown
Contributor Author

It is a no-op under specific circumstances, with legacy addons that have been written for older versions of Node.js that did not have cleanup logic in place.

I meant on the OS level, dlclose can be a no-op per POSIX, as it is in musl.

Yeah, but the concept of ownership doesn't stop applying because it interacts with process-global state.

Yes, but shouldn't the process be the owner?

That's not great, but I guess right now embedders also don't get the benefits of cleanup.

In that case, would it make sense to now align the worker thread behavior with the main thread? At least until this is figured out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. worker Issues and PRs related to Worker support.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Segmentation fault on worker exit

3 participants