Running pytest against a build configured with -DUSE_MPI=true results in a segfault.
Example stack trace:
Current thread's C stack trace (most recent call first):
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at _Py_DumpStack+0x48 [0x1009c3a54]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at faulthandler_dump_c_stack+0x58 [0x1009d8914]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at faulthandler_fatal_error+0x17c [0x1009d8854]
Binary file "/usr/lib/system/libsystem_platform.dylib", at _sigtramp+0x38 [0x18555b744]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/lib/libopen-pal.80.dylib", at opal_argv_join+0x40 [0x1028d6830]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/lib/libmpi.40.dylib", at ompi_rte_init+0xb10 [0x103aeefd0]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/lib/libmpi.40.dylib", at ompi_mpi_instance_init+0x1a8 [0x103af2094]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/lib/libmpi.40.dylib", at ompi_mpi_init+0xd0 [0x103aebb58]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/lib/libmpi.40.dylib", at MPI_Init+0x84 [0x103b59f44]
Binary file "/Users//opt/pyorbit3/build/cp314/src/libcore.dylib", at _Z14ORBIT_MPI_Initv+0xfc [0x102ea8070]
Binary file "/Users//opt/pyorbit3/build/cp314/src/libcore.dylib", at initorbit_mpi+0x18 [0x102ea41dc]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at _PyImport_RunModInitFunc+0x68 [0x1009740e4]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at import_run_extension+0x6c [0x100971180]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at _imp_create_dynamic+0x338 [0x100972fb0]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at _TAIL_CALL_CALL_FUNCTION_EX+0x100 [0x1008ffe34]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at _PyFunction_Vectorcall+0x2f0 [0x100780520]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at _PyObject_VectorcallTstate.832+0x50 [0x10077dd54]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at object_vacall+0xe0 [0x100782c78]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at PyObject_CallMethodObjArgs+0x84 [0x100782ad0]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at import_find_and_load+0x22c [0x1009709d8]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at PyImport_ImportModuleLevelObject+0xc64 [0x1009704b4]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at builtin___import__+0x104 [0x1008f5db8]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at _TAIL_CALL_CALL_FUNCTION_EX+0x100 [0x1008ffe34]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at _PyFunction_Vectorcall+0x2f0 [0x100780520]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at _PyObject_VectorcallTstate.832+0x50 [0x10077dd54]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at object_vacall+0xe0 [0x100782c78]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at PyObject_CallMethodObjArgs+0x84 [0x100782ad0]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at PyImport_ImportModuleLevelObject+0x4b4 [0x10096fd04]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at _TAIL_CALL_IMPORT_NAME+0x3c [0x100909cb0]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at _PyEval_Vector+0x2c8 [0x1008fd584]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at PyEval_EvalCode+0xb0 [0x1008fd1fc]
Binary file "/opt/homebrew/Caskroom/miniforge/base/envs/pyorbit/bin/python3.14", at builtin_exec+0x16c [0x1008f7a2c]
<truncated rest of calls>
I recompiled with -fsanitize=address,undefined and isolated the problem to MPI initialization (ORBIT_MPI_Init):
PyObject* sys_module = PyImport_ImportModule("sys");
PyObject* argv_list = PyObject_GetAttrString(sys_module, "argv");

// Check if argv_list is a list
if (PyList_Check(argv_list)) {
    // Access individual command-line arguments
    len = PyList_Size(argv_list);
    ch = (char**) malloc(sizeof(char*) * len);
    for (Py_ssize_t i = 0; i < len; ++i) {
        PyObject* item = PyList_GetItem(argv_list, i);
        if (item && PyUnicode_Check(item)) {
            ch[i] = const_cast<char*>(PyUnicode_AsUTF8(item));
        }
    }
}

// Release references
Py_XDECREF(argv_list);
Py_XDECREF(sys_module);

res = MPI_Init(&len,&ch);
MPI_Init receives the arguments read from sys.argv, which means you can trigger the crash by appending any argument after the script name:
cd example/SNS_Linac/pyorbit3_linac_model
python pyorbit3_sns_linac_mebt_hebt2.py thiswillsegfault
Oddly, if you run the script a few times in quick succession, MPI will sometimes initialize successfully, presumably because the memory just past the end of the array happens to hold a null pointer on those runs.
The segfault occurs because the char** array is not null-terminated. You can fix it by allocating one extra slot in the array and storing a NULL there, while leaving len (the argc passed to MPI_Init) equal to the number of arguments:
len = PyList_Size(argv_list);
ch = (char**) malloc(sizeof(char*) * (len + 1));
...
ch[len] = NULL;
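To make the termination requirement concrete, here is a minimal sketch that builds an argv-style array with the trailing NULL slot. It uses a std::vector<std::string> as a stand-in for sys.argv so that it needs neither Python nor MPI. The names make_argv and count_until_null are hypothetical helpers for illustration, not PyORBIT3 functions. count_until_null finds the end of the array by walking until it hits NULL, which is presumably what opal_argv_join in the stack trace does as well.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: build a C-style argv from `args`.
// Allocates args.size() + 1 slots so that argv[argc] can hold the NULL.
// The returned pointers alias the strings in `args`, so `args` must outlive
// the array. Release the array itself with free().
char** make_argv(std::vector<std::string>& args, int& argc) {
    argc = static_cast<int>(args.size());
    char** argv = static_cast<char**>(
        std::malloc(sizeof(char*) * (args.size() + 1)));
    if (argv == nullptr) {
        argc = 0;
        return nullptr;
    }
    for (int i = 0; i < argc; ++i) {
        argv[i] = &args[i][0];  // mutable pointer to the string's characters
    }
    argv[argc] = nullptr;  // terminator lives one past the last argument
    return argv;
}

// Hypothetical helper: count entries the way a NULL-terminated consumer
// would, by walking the array until the terminator.
int count_until_null(char* const* argv) {
    assert(argv != nullptr);
    int n = 0;
    while (argv[n] != nullptr) {
        ++n;
    }
    return n;
}
```

Calling make_argv on {"script.py", "thiswillsegfault"} yields argc == 2 with argv[2] set to NULL, so count_until_null(argv) stops at 2 instead of reading past the allocation.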
In any case, according to the Open MPI docs:
Open MPI’s MPI_Init and MPI_Init_thread both accept the C argc and argv arguments to main, but neither modifies, interprets, nor distributes them:
MPICH doesn't seem to require it either:
As of MPI-2, MPI_Init will accept NULL as input parameters. Doing so will impact the values stored in MPI_INFO_ENV.
It seems that forwarding argc and argv through MPI_Init is somewhat vestigial, and that initialization is handled by some other mechanism, such as environment variables likely set by mpirun.
Therefore, the better solution is probably to refactor ORBIT_MPI_Init to just call MPI_Init with NULL arguments and not read sys.argv at all.
The ORBIT_MPI_Init snippet above is from PyORBIT3/src/mpi/orbit_mpi.cc, lines 16 to 36 in ab6bc4b.