add MI_APPLE_TLS_THREAD_LOCAL option to use THREAD_LOCAL on macOS#1302

Open
shulaoda wants to merge 1 commit into microsoft:dev3 from shulaoda:fix/apple-tls-thread-local

@shulaoda shulaoda commented May 27, 2026

Closes #1301

Summary

Add an opt-in MI_APPLE_TLS_THREAD_LOCAL flag that allows Apple targets to use MI_TLS_MODEL_THREAD_LOCAL + MI_TLS_RECURSE_GUARD instead of the default MI_TLS_MODEL_FIXED_SLOT. Default is 0 (existing behavior unchanged).

Problem

When multiple shared libraries in one process each statically link mimalloc (e.g. Node.js napi addons loaded via dlopen), MI_TLS_MODEL_FIXED_SLOT causes the second library to crash immediately on load (SIGABRT). TCB[108]/[109] are hardcoded per-thread slots shared across all loaded images, so the second mimalloc instance reads the first instance's heap pointer, concludes it is "already initialized" (mi_theap_is_initialized only checks for non-NULL and does not verify ownership), skips its own init, and runs with inconsistent state.
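The failure mode can be illustrated with a small simulation. This is a minimal sketch, not mimalloc code: `shared_slots`, `demo_instance_t`, `slot_is_initialized`, `instance_load` and `instance_owns_slot` are hypothetical names, and a plain array stands in for the per-thread slot table that all loaded images share.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-thread slot table that every image in the
   process sees (the role TCB[108] plays in the description above). */
static void *shared_slots[256];
enum { HEAP_SLOT = 108 };

/* Hypothetical stand-in for one statically linked allocator instance. */
typedef struct demo_instance {
  int  heap;      /* this instance's own heap state */
  bool did_init;  /* true only if this instance ran its own init */
} demo_instance_t;

/* Mirrors the check described above: any non-NULL value counts as
   "initialized"; nothing verifies which instance stored it. */
bool slot_is_initialized(void) {
  return shared_slots[HEAP_SLOT] != NULL;
}

/* Called when an image is loaded. The second caller finds the slot occupied
   and skips its own init, even though the pointer is not its own. */
void instance_load(demo_instance_t *inst) {
  if (slot_is_initialized()) {
    return;
  }
  inst->did_init = true;
  shared_slots[HEAP_SLOT] = &inst->heap;
}

/* Returns true when the shared slot points at this instance's heap. */
bool instance_owns_slot(const demo_instance_t *inst) {
  return shared_slots[HEAP_SLOT] == (const void *)&inst->heap;
}
```

Loading two zero-initialized `demo_instance_t` values in sequence leaves the second with `did_init == false` and a slot that points at the first instance's heap, which is the inconsistent state described above.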

This is a v2 → v3 regression: v2's MI_TLS_SLOT was gated on MI_MALLOC_OVERRIDE (prim.h:350), so non-override builds used __thread (THREAD_LOCAL) with natural per-image isolation. v3 made FIXED_SLOT unconditional.

Currently there is no way to select THREAD_LOCAL on Apple via cflags — the FIXED_SLOT branch in prim.h is an unconditional #define with no #ifndef guard, and -DMI_HAS_TLS_SLOT=0 routes to DYNAMIC_PTHREADS (which has a separate crash on process exit with long-lived background threads).

Change

A single new #elif is inserted between the Windows and FIXED_SLOT branches, preceded by an #ifndef that defaults the flag to 0. The existing branches of the cascade are untouched:

+#ifndef MI_APPLE_TLS_THREAD_LOCAL
+#define MI_APPLE_TLS_THREAD_LOCAL  0
+#endif
+
 #if defined(_WIN32)
   #define MI_TLS_MODEL_DYNAMIC_WIN32        1
+#elif defined(__APPLE__) && MI_APPLE_TLS_THREAD_LOCAL && !defined(__POWERPC__)
+  #define MI_TLS_MODEL_THREAD_LOCAL         1
+  #ifndef MI_TLS_RECURSE_GUARD
+  #define MI_TLS_RECURSE_GUARD              1
+  #endif
 #elif defined(__APPLE__) && MI_HAS_TLS_SLOT && !defined(__POWERPC__)
   // ... FIXED_SLOT (unchanged) ...

  • MI_APPLE_TLS_THREAD_LOCAL=0 (default): the new branch doesn't match → original FIXED_SLOT path, zero behavior change
  • MI_APPLE_TLS_THREAD_LOCAL=1: the new branch matches first → THREAD_LOCAL + RECURSE_GUARD, FIXED_SLOT skipped
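
The branch precedence can be checked by restating the cascade as an ordinary function. This is a sketch for reasoning about ordering only: `demo_target_t` and `demo_select_tls_model` are hypothetical names, the boolean fields stand in for the preprocessor conditions, and `"LATER_BRANCH"` is a placeholder for the rest of the cascade, which is not shown in the diff.

```c
#include <stdbool.h>

/* Hypothetical description of a build target; each field stands in for one
   preprocessor condition used in the cascade. */
typedef struct demo_target {
  bool is_win32;                /* defined(_WIN32) */
  bool is_apple;                /* defined(__APPLE__) */
  bool is_powerpc;              /* defined(__POWERPC__) */
  bool has_tls_slot;            /* MI_HAS_TLS_SLOT */
  bool apple_tls_thread_local;  /* MI_APPLE_TLS_THREAD_LOCAL */
} demo_target_t;

/* Evaluates the conditions in the same order as the #if/#elif chain, so the
   first match wins exactly as it does in the preprocessor. */
const char *demo_select_tls_model(demo_target_t t) {
  if (t.is_win32) {
    return "DYNAMIC_WIN32";
  }
  if (t.is_apple && t.apple_tls_thread_local && !t.is_powerpc) {
    return "THREAD_LOCAL";   /* new opt-in branch */
  }
  if (t.is_apple && t.has_tls_slot && !t.is_powerpc) {
    return "FIXED_SLOT";     /* existing default on Apple */
  }
  return "LATER_BRANCH";     /* placeholder: remaining branches not shown */
}
```

With `is_apple` and `has_tls_slot` set, the function returns `"FIXED_SLOT"` unless `apple_tls_thread_local` is also set, in which case the new branch wins. On Windows the flag has no effect because the first branch matches before it is consulted.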

Usage:

target_compile_definitions(mimalloc PRIVATE MI_APPLE_TLS_THREAD_LOCAL=1)

Why THREAD_LOCAL solves the multi-instance problem

__thread + mi_decl_hidden (already how __mi_theap_default / __mi_theap_cached are declared in init.c) gives per-image per-thread storage via dyld's TLV system. Different images have different variables — even on the same thread, each image reads/writes its own address. TCB[108] is not used at all.
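
The per-thread half of this behavior can be observed in a single translation unit. The sketch below assumes a GCC or Clang toolchain with pthreads; `demo_theap_default`, `demo_worker` and `demo_thread_isolation_holds` are hypothetical names, and the visibility attribute is written out directly rather than through mimalloc's macro. Per-image isolation cannot be shown in one file, since it needs two separately linked shared libraries.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread heap pointer, declared in the style described
   above: hidden visibility keeps the symbol private to the defining image,
   and __thread gives every thread its own copy. */
__attribute__((visibility("hidden"))) __thread void *demo_theap_default = NULL;

/* Runs on the worker thread: reports what the thread saw on entry, then
   overwrites its own copy of the variable. */
static void *demo_worker(void *arg) {
  void *seen_on_entry = demo_theap_default;
  demo_theap_default = arg;
  return seen_on_entry;
}

/* Returns 1 when the worker started with a fresh NULL copy and its write
   did not leak into the calling thread's copy; returns 0 otherwise. */
int demo_thread_isolation_holds(void) {
  static int main_heap;
  static int worker_heap;
  pthread_t thread;
  void *seen_by_worker = &worker_heap;  /* overwritten by pthread_join */

  demo_theap_default = &main_heap;
  if (pthread_create(&thread, NULL, demo_worker, &worker_heap) != 0) {
    return 0;
  }
  if (pthread_join(thread, &seen_by_worker) != 0) {
    return 0;
  }
  return seen_by_worker == NULL && demo_theap_default == (void *)&main_heap;
}
```

Calling `demo_thread_isolation_holds()` from any thread exercises both properties: the worker's initial read and the caller's value after the worker has written.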

MI_TLS_RECURSE_GUARD prevents recursion on first __thread access: macOS dyld allocates TLV storage via calloc on first touch, which could recurse into mimalloc before process init completes. The guard short-circuits the fast path with a plain _mi_process_is_initialized bool until init is done.
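
The guard's effect can be modelled without any platform-specific TLS. This is a simplified sketch under stated assumptions, not the mimalloc implementation: all `demo_*` names are hypothetical, a bump allocator stands in for the slow path, and `demo_tls_slot` imitates a runtime that calls the allocator on first touch. It also assumes that process init touches the thread-local storage before setting the flag.

```c
#include <stdbool.h>
#include <stddef.h>

static bool demo_process_is_initialized = false;  /* plain global, not TLS */
static bool demo_tls_ready = false;
int demo_tls_first_touches = 0;   /* times the simulated runtime allocated */
int demo_fast_path_calls = 0;     /* times the TLS-reading path was taken */

static unsigned char demo_arena[1024];
static size_t demo_arena_used = 0;

/* Slow path: a bump allocator that never touches thread-local state. */
static void *demo_bump_alloc(size_t size) {
  if (size == 0) size = 1;
  if (size > sizeof demo_arena - demo_arena_used) return NULL;
  void *p = demo_arena + demo_arena_used;
  demo_arena_used += size;
  return p;
}

void *demo_malloc(size_t size);

/* Simulated thread-local access: on first touch the "runtime" allocates
   backing storage through the allocator, as described above for dyld. */
static int *demo_tls_slot(void) {
  static int storage;
  if (!demo_tls_ready) {
    demo_tls_first_touches++;
    (void)demo_malloc(sizeof storage);  /* re-enters the allocator */
    demo_tls_ready = true;
  }
  return &storage;
}

void *demo_malloc(size_t size) {
  if (!demo_process_is_initialized) {
    return demo_bump_alloc(size);  /* guard: skip TLS until init is done */
  }
  (void)demo_tls_slot();           /* fast path reads per-thread state */
  demo_fast_path_calls++;
  return demo_bump_alloc(size);
}

/* Assumed init order: touch TLS while the guard is still closed, so the
   runtime's re-entrant allocation takes the slow path; then open the guard. */
void demo_process_init(void) {
  (void)demo_tls_slot();
  demo_process_is_initialized = true;
}
```

In this model, removing the check on `demo_process_is_initialized` would make the first `demo_malloc` call recurse through `demo_tls_slot` without end, which is the situation the guard is meant to prevent.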

MI_HAS_TLS_SLOT is not affected — thread-id still uses mi_prim_tls_slot(0) (Apple's system TSD slot, no conflict).

Reproduction

Standalone reproduction project: https://github.com/shulaoda/mimalloc-v3-repro

npm run switch:fixed-slot     # FIXED_SLOT:    Mode A 💥, Mode B ✅
npm run switch:thread-local   # THREAD_LOCAL:  Mode A ✅, Mode B ✅

Downstream
