Skip to content

Mimalloc 3.1.5 causes Lean4 builds to hang on x86 #1206

@seanmcl

Description

@seanmcl

I apologize in advance for the quality of this bug report. I was hesitant to even submit it due to the lack of actionable data, but someone on the Lean team encouraged me. I'll start with the situation my team recently faced.

  1. We are building a sizable (160k loc) project in Lean. The build takes ~10 minutes on both x86 and aarch64.
  2. Our CI system is sandboxed, so we can't pull dependencies at CI time; we need to copy them locally first, so have a custom build of Lean and custom way to run lean to pick up our local dependencies.
  3. One Lean dependency is mimalloc. We were using 2.1.1. Was working fine.
  4. We updated Lean to 4.26.0 from some earlier version. In the process, we updated mimalloc to 3.1.5, not thinking anything of it at the time. Notably, Lean itself does not do this. It is pegged to 2.2.3
  5. Every 10th or so build on x86 started hanging. It would run a bit, say a few minutes, then make no progress for the 3 hour time limit.
  6. I blamed Lean at first, and got help from the Lean team to periodically dump Lean stack traces using GDB during the CI build. I include a sample dump below.
  7. The Lean team looked at the dump (the full dump had hours of logging) and suggested it may be mimalloc causing the problem on x86. They found some code that branched on the architecture, which notably didn't trigger on aarch64.
  8. I moved mimalloc back to 2.2.3, the one Lean tests with, and all the hangs went away. Hasn't happened again since (probably 50 or more builds since then.)
  9. This never triggered on either a mac or linux local build of the project. It only happened in CI.

That's about all the info I have about the issue. Again, I'm sorry for the lack of a deep dive. I thought you may like to know.

Best,

Sean

08:18:54  [START LEAN THREAD DUMP]
08:18:54   [New LWP 21362]
08:18:54   [New LWP 21394]
08:18:54   [New LWP 21542]
08:18:54   [New LWP 21543]
08:18:54   [New LWP 21544]
08:18:54   [New LWP 21545]
08:18:54   [New LWP 21546]
08:18:54   [New LWP 21547]
08:18:54   [New LWP 21548]
08:18:54   [New LWP 21549]
08:18:54   [New LWP 21550]
08:18:54   [New LWP 21551]
08:18:54   [Thread debugging using libthread_db enabled]
08:18:54   Using host libthread_db library "/lib64/libthread_db.so.1".
08:18:57   0x00007fdcca533377 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
08:18:57   warning: target file /proc/21348/cmdline contained unexpected null characters
08:18:57   process 21348
08:18:57   cmdline = 'lean'
08:18:57   exe = 'lean'
08:18:57
08:18:57   Thread 13 (Thread 0x7fdc6c7f8700 (LWP 21551)):
08:18:57   #0  0x00007fdcca533377 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
08:18:57   #1  0x00007fdcc9e4cb0c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
08:18:57   #2  0x00007fdcc88b63b8 in std::_Function_handler<void (), lean::task_manager::spawn_worker()::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from libleanshared.so
08:18:57   #3  0x00007fdcc88a8d18 in lean::lthread::imp::_main(void*) () from libleanshared.so
08:18:57   #4  0x00007fdcca52d44b in start_thread () from /lib64/libpthread.so.0
08:18:57   #5  0x00007fdcbe8a052f in clone () from /lib64/libc.so.6
08:18:57
08:18:57   Thread 12 (Thread 0x7fdc6cff9700 (LWP 21550)):
08:18:57   #0  0x00007fdcca533377 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
08:18:57   #1  0x00007fdcc9e4cb0c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
08:18:57   #2  0x00007fdcc88b63b8 in std::_Function_handler<void (), lean::task_manager::spawn_worker()::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from libleanshared.so
08:18:57   #3  0x00007fdcc88a8d18 in lean::lthread::imp::_main(void*) () from libleanshared.so
08:18:57   #4  0x00007fdcca52d44b in start_thread () from /lib64/libpthread.so.0
08:18:57   #5  0x00007fdcbe8a052f in clone () from /lib64/libc.so.6
08:18:57
08:18:57   Thread 11 (Thread 0x7fdc6d7fa700 (LWP 21549)):
08:18:57   #0  0x00007fdcca533377 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
08:18:57   #1  0x00007fdcc9e4cb0c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
08:18:57   #2  0x00007fdcc88b63b8 in std::_Function_handler<void (), lean::task_manager::spawn_worker()::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from libleanshared.so
08:18:57   #3  0x00007fdcc88a8d18 in lean::lthread::imp::_main(void*) () from libleanshared.so
08:18:57   #4  0x00007fdcca52d44b in start_thread () from /lib64/libpthread.so.0
08:18:57   #5  0x00007fdcbe8a052f in clone () from /lib64/libc.so.6
08:18:57
08:18:57   Thread 10 (Thread 0x7fdc6dffb700 (LWP 21548)):
08:18:57   #0  mi_bchunk_try_find_and_clear (pidx=<synthetic pointer>, chunk=0x7fdc7e350540) at bitmap.c:638
08:18:57   #1  mi_bchunk_try_find_and_clear_1 (pidx=<synthetic pointer>, n=1, chunk=0x7fdc7e350540) at bitmap.c:677
08:18:57   #2  mi_bbitmap_try_find_and_clear_generic (on_find=<optimized out>, pidx=0x7fdc6dffa738, n=1, tseq=<optimized out>, bbitmap=0x7fdc7e350300) at bitmap.c:1682
08:18:57   #3  mi_bbitmap_try_find_and_clear (bbitmap=0x7fdc7e350300, tseq=<optimized out>, pidx=pidx@entry=0x7fdc6dffa738) at bitmap.c:1711
08:18:57   #4  0x00007fdccabd1fe8 in mi_bbitmap_try_find_and_clearN (pidx=0x7fdc6dffa738, tseq=0, n=1, bbitmap=<optimized out>) at bitmap.h:321
08:18:57   #5  mi_arena_try_alloc_at (arena=0x7fdc7e350000, slice_count=slice_count@entry=1, commit=commit@entry=true, tseq=tseq@entry=3, memid=memid@entry=0x7fdc6dffa900) at arena.c:174
08:18:57   #6  0x00007fdccabd2eaf in mi_arenas_try_find_free (subproc=subproc@entry=0x7fdccadeec40 <subproc_main>, slice_count=slice_count@entry=1, alignment=alignment@entry=65536, commit=commit@entry=true, allow_large=allow_large@entry=true, req_arena=req_arena@entry=0x0, tseq=3, numa_node=0, memid=0x7fdc6dffa900) at arena.c:398
08:18:57   #7  0x00007fdccabd38e8 in mi_arenas_try_alloc (subproc=0x7fdccadeec40 <subproc_main>, slice_count=slice_count@entry=1, alignment=alignment@entry=65536, commit=commit@entry=true, allow_large=allow_large@entry=true, req_arena=req_arena@entry=0x0, tseq=3, numa_node=0, memid=0x7fdc6dffa900) at arena.c:426
08:18:57   #8  0x00007fdccabd3fd3 in mi_arenas_page_alloc_fresh (slice_count=slice_count@entry=1, block_size=block_size@entry=16, block_alignment=block_alignment@entry=1, req_arena=req_arena@entry=0x0, numa_node=0, commit=<optimized out>, tld=0x7fdc7f1d6200) at arena.c:603
08:18:57   #9  0x00007fdccabd4199 in mi_arenas_page_regular_alloc (heap=<optimized out>, slice_count=1, block_size=16) at arena.c:743
08:18:57   #10 0x00007fdccabdcd35 in mi_page_fresh_alloc (heap=heap@entry=0x7fdc7f1d8500, pq=pq@entry=0x7fdc7f1d8a30, block_size=<optimized out>, page_alignment=page_alignment@entry=0) at page.c:305
08:18:57   #11 0x00007fdccabdd201 in mi_page_fresh (pq=0x7fdc7f1d8a30, heap=0x7fdc7f1d8500) at page.c:333
08:18:57   #12 mi_page_queue_find_free_ex (heap=0x7fdc7f1d8500, pq=0x7fdc7f1d8a30, first_try=<optimized out>) at page.c:810
08:18:57   #13 0x00007fdccabdd471 in _mi_malloc_generic (heap=0x7fdc7f1d8500, size=16, zero=<optimized out>, huge_alignment=0) at page.c:968
08:18:57   #14 0x00007fdcc127a83c in l_Lean_PersistentArray_mapMAux___at___00Lean_PersistentArray_mapM___at___00Lean_Elab_InfoState_substituteLazy_spec__6_spec__6 () from libleanshared.so
08:18:57   #15 0x00007fdcc127aba0 in l_Lean_PersistentArray_mapM___at___00Lean_Elab_InfoState_substituteLazy_spec__6 () from libleanshared.so
08:18:57   #16 0x00007fdcc127b49b in ::l_Lean_Elab_InfoState_substituteLazy___lam(void) () from libleanshared.so
08:18:57   #17 0x00007fdcc127b503 in l_Lean_Elab_InfoState_substituteLazy___lam__0___boxed () from libleanshared.so
08:18:57   #18 0x00007fdcc88b7ee4 in lean_apply_1 () from libleanshared.so
08:18:57   #19 0x00007fdcc88b7f9d in lean_apply_1 () from libleanshared.so
08:18:57   #20 0x00007fdcc88b64b3 in std::_Function_handler<void (), lean::task_manager::spawn_worker()::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from libleanshared.so
08:18:57   #21 0x00007fdcc88a8d18 in lean::lthread::imp::_main(void*) () from libleanshared.so
08:18:57   #22 0x00007fdcca52d44b in start_thread () from /lib64/libpthread.so.0
08:18:57   #23 0x00007fdcbe8a052f in clone () from /lib64/libc.so.6
08:18:57
08:18:57   Thread 9 (Thread 0x7fdc6e7fc700 (LWP 21547)):
08:18:57   #0  0x00007fdcca533377 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
08:18:57   #1  0x00007fdcc9e4cb0c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
08:18:57   #2  0x00007fdcc88b63b8 in std::_Function_handler<void (), lean::task_manager::spawn_worker()::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from libleanshared.so
08:18:57   #3  0x00007fdcc88a8d18 in lean::lthread::imp::_main(void*) () from libleanshared.so
08:18:57   #4  0x00007fdcca52d44b in start_thread () from /lib64/libpthread.so.0
08:18:57   #5  0x00007fdcbe8a052f in clone () from /lib64/libc.so.6
08:18:57
08:18:57   Thread 8 (Thread 0x7fdc6effd700 (LWP 21546)):
08:18:57   #0  0x00007fdcca533377 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
08:18:57   #1  0x00007fdcc9e4cb0c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
08:18:57   #2  0x00007fdcc88b63b8 in std::_Function_handler<void (), lean::task_manager::spawn_worker()::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from libleanshared.so
08:18:57   #3  0x00007fdcc88a8d18 in lean::lthread::imp::_main(void*) () from libleanshared.so
08:18:57   #4  0x00007fdcca52d44b in start_thread () from /lib64/libpthread.so.0
08:18:57   #5  0x00007fdcbe8a052f in clone () from /lib64/libc.so.6
08:18:57
08:18:57   Thread 7 (Thread 0x7fdc6f7fe700 (LWP 21545)):
08:18:57   #0  0x00007fdcca533377 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
08:18:57   #1  0x00007fdcc9e4cb0c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
08:18:57   #2  0x00007fdcc88b63b8 in std::_Function_handler<void (), lean::task_manager::spawn_worker()::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from libleanshared.so
08:18:57   #3  0x00007fdcc88a8d18 in lean::lthread::imp::_main(void*) () from libleanshared.so
08:18:57   #4  0x00007fdcca52d44b in start_thread () from /lib64/libpthread.so.0
08:18:57   #5  0x00007fdcbe8a052f in clone () from /lib64/libc.so.6
08:18:57
08:18:57   Thread 6 (Thread 0x7fdc6ffff700 (LWP 21544)):
08:18:57   #0  0x00007fdcca533377 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
08:18:57   #1  0x00007fdcc9e4cb0c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
08:18:57   #2  0x00007fdcc88b63b8 in std::_Function_handler<void (), lean::task_manager::spawn_worker()::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from libleanshared.so
08:18:57   #3  0x00007fdcc88a8d18 in lean::lthread::imp::_main(void*) () from libleanshared.so
08:18:57   #4  0x00007fdcca52d44b in start_thread () from /lib64/libpthread.so.0
08:18:57   #5  0x00007fdcbe8a052f in clone () from /lib64/libc.so.6
08:18:57
08:18:57   Thread 5 (Thread 0x7fdc7cb4c700 (LWP 21543)):
08:18:57   #0  0x00007fdcca533377 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
08:18:57   #1  0x00007fdcc9e4cb0c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
08:18:57   #2  0x00007fdcc88b63b8 in std::_Function_handler<void (), lean::task_manager::spawn_worker()::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from libleanshared.so
08:18:57   #3  0x00007fdcc88a8d18 in lean::lthread::imp::_main(void*) () from libleanshared.so
08:18:57   #4  0x00007fdcca52d44b in start_thread () from /lib64/libpthread.so.0
08:18:57   #5  0x00007fdcbe8a052f in clone () from /lib64/libc.so.6
08:18:57
08:18:57   Thread 4 (Thread 0x7fdc7d34d700 (LWP 21542)):
08:18:57   #0  0x00007fdcca533377 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
08:18:57   #1  0x00007fdcc9e4cb0c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
08:18:57   #2  0x00007fdcc88b63b8 in std::_Function_handler<void (), lean::task_manager::spawn_worker()::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from libleanshared.so
08:18:57   #3  0x00007fdcc88a8d18 in lean::lthread::imp::_main(void*) () from libleanshared.so
08:18:57   #4  0x00007fdcca52d44b in start_thread () from /lib64/libpthread.so.0
08:18:57   #5  0x00007fdcbe8a052f in clone () from /lib64/libc.so.6
08:18:57
08:18:57   Thread 3 (Thread 0x7fdc7db4e700 (LWP 21394)):
08:18:57   #0  0x00007fdcca533377 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
08:18:57   #1  0x00007fdcc9e4cb0c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
08:18:57   #2  0x00007fdcc88b63b8 in std::_Function_handler<void (), lean::task_manager::spawn_worker()::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from libleanshared.so
08:18:57   #3  0x00007fdcc88a8d18 in lean::lthread::imp::_main(void*) () from libleanshared.so
08:18:57   #4  0x00007fdcca52d44b in start_thread () from /lib64/libpthread.so.0
08:18:57   #5  0x00007fdcbe8a052f in clone () from /lib64/libc.so.6
08:18:57
08:18:57   Thread 2 (Thread 0x7fdc7e34f700 (LWP 21362)):
08:18:57   #0  0x00007fdcbe8a084c in epoll_wait () from /lib64/libc.so.6
08:18:57   #1  0x00007fdccae1319d in uv__io_poll (loop=loop@entry=0x7fdccb01bac0 <default_loop_struct>, timeout=<optimized out>) at epoll.c:236
08:18:57   #2  0x00007fdccae0380c in uv_run (loop=0x7fdccb01bac0 <default_loop_struct>, mode=UV_RUN_ONCE) at core.c:406
08:18:57   #3  0x00007fdcc88d8d53 in lean::event_loop_run_loop(lean::event_loop_t*) () from libleanshared.so
08:18:57   #4  0x00007fdcc88a8d18 in lean::lthread::imp::_main(void*) () from libleanshared.so
08:18:57   #5  0x00007fdcca52d44b in start_thread () from /lib64/libpthread.so.0
08:18:57   #6  0x00007fdcbe8a052f in clone () from /lib64/libc.so.6
08:18:57
08:18:57   Thread 1 (Thread 0x7fdccb22f800 (LWP 21348)):
08:18:57   #0  0x00007fdcca533377 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
08:18:57   #1  0x00007fdcc9e4cb0c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
08:18:57   #2  0x00007fdcc88b310e in lean_task_get () from libleanshared.so
08:18:57   #3  0x00007fdcc1127909 in l_Lean_Language_SnapshotTask_get___redArg () from libleanshared.so
08:18:57   #4  0x00007fdcc112be5d in ::l___private_Init_Data_Array_Basic_0__Array_foldlMUnsafe_fold___at___00Lean_Language_SnapshotTree_foldM___at___00Lean_Language_SnapshotTree_runAndReport_spec__0_spec(void) () from libleanshared.so
08:18:57   #5  0x00007fdcc112bcab in ::l_Lean_Language_SnapshotTree_foldM___at___00Lean_Language_SnapshotTree_runAndReport_spec(void) () from libleanshared.so
08:18:57   #6  0x00007fdcc112be72 in ::l___private_Init_Data_Array_Basic_0__Array_foldlMUnsafe_fold___at___00Lean_Language_SnapshotTree_foldM___at___00Lean_Language_SnapshotTree_runAndReport_spec__0_spec(void) () from libleanshared.so
08:18:57   #7  0x00007fdcc112bcab in ::l_Lean_Language_SnapshotTree_foldM___at___00Lean_Language_SnapshotTree_runAndReport_spec(void) () from libleanshared.so
08:18:57   #8  0x00007fdcc112be72 in ::l___private_Init_Data_Array_Basic_0__Array_foldlMUnsafe_fold___at___00Lean_Language_SnapshotTree_foldM___at___00Lean_Language_SnapshotTree_runAndReport_spec__0_spec(void) () from libleanshared.so
08:18:57   #9  0x00007fdcc112bcab in ::l_Lean_Language_SnapshotTree_foldM___at___00Lean_Language_SnapshotTree_runAndReport_spec(void) () from libleanshared.so
08:18:57   #10 0x00007fdcc112be72 in ::l___private_Init_Data_Array_Basic_0__Array_foldlMUnsafe_fold___at___00Lean_Language_SnapshotTree_foldM___at___00Lean_Language_SnapshotTree_runAndReport_spec__0_spec(void) () from libleanshared.so
08:18:57   #11 0x00007fdcc112bcab in ::l_Lean_Language_SnapshotTree_foldM___at___00Lean_Language_SnapshotTree_runAndReport_spec(void) () from libleanshared.so
08:18:57   #12 0x00007fdcc112be72 in ::l___private_Init_Data_Array_Basic_0__Array_foldlMUnsafe_fold___at___00Lean_Language_SnapshotTree_foldM___at___00Lean_Language_SnapshotTree_runAndReport_spec__0_spec(void) () from libleanshared.so
08:18:57   #13 0x00007fdcc112bcab in ::l_Lean_Language_SnapshotTree_foldM___at___00Lean_Language_SnapshotTree_runAndReport_spec(void) () from libleanshared.so
08:18:57   #14 0x00007fdcc112be72 in ::l___private_Init_Data_Array_Basic_0__Array_foldlMUnsafe_fold___at___00Lean_Language_SnapshotTree_foldM___at___00Lean_Language_SnapshotTree_runAndReport_spec__0_spec(void) () from libleanshared.so
08:18:57   #15 0x00007fdcc112bcab in ::l_Lean_Language_SnapshotTree_foldM___at___00Lean_Language_SnapshotTree_runAndReport_spec(void) () from libleanshared.so
08:18:57   #16 0x00007fdcc112be72 in ::l___private_Init_Data_Array_Basic_0__Array_foldlMUnsafe_fold___at___00Lean_Language_SnapshotTree_foldM___at___00Lean_Language_SnapshotTree_runAndReport_spec__0_spec(void) () from libleanshared.so
08:18:57   #17 0x00007fdcc112bcab in ::l_Lean_Language_SnapshotTree_foldM___at___00Lean_Language_SnapshotTree_runAndReport_spec(void) () from libleanshared.so
08:18:57   #18 0x00007fdcc112bf35 in l_Lean_Language_SnapshotTree_runAndReport () from libleanshared.so
08:18:57   #19 0x00007fdcc237bfca in l_Lean_Elab_runFrontend () from libleanshared.so
08:18:57   #20 0x00007fdcc0f79a9e in lean_shell_main () from libleanshared.so
08:18:57   #21 0x00007fdcc0857221 in lean::run_shell_main(int, char**, lean::buffer<lean::string_ref, 16ul>, int, bool, bool, bool, bool, bool, bool, lean::options const&, unsigned int, lean::optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, lean::optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, lean::optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, lean::optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, lean::optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, lean::optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, bool, lean::array_ref<lean::name> const&, bool, bool) () from libleanshared.so
08:18:57   #22 0x00007fdcc085a056 in lean_main () from libleanshared.so
08:18:57   #23 0x00007fdcbe7d213a in __libc_start_main () from /lib64/libc.so.6
08:18:57   #24 0x000000000040078a in _start ()
08:18:57  [END LEAN THREAD DUMP]

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions