Thu, 20 Nov 2008
One of Those Codes
--3177-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--3177-- si_code=80; Faulting address: 0x0; sp: 0x402A9FD40
valgrind: the 'impossible' happened:
   Killed by fatal signal
==3177==    at 0x3801FDEA: unlinkBlock (m_mallocfree.c:190)
==3177==    by 0x38020CAE: vgPlain_arena_malloc (m_mallocfree.c:1055)
==3177==    by 0x38035516: vgPlain_cli_malloc (replacemalloc_core.c:101)
==3177==    by 0x380022F5: vgMemCheck_malloc (mc_malloc_wrappers.c:182)
==3177==    by 0x38035BA7: do_client_request (scheduler.c:1158)
==3177==    by 0x380372B1: vgPlain_scheduler (scheduler.c:869)
==3177==    by 0x38051B59: run_a_thread_NORETURN (syswrap-linux.c:87)
This code has already killed one debugger (Intel DB) and sent strace into an infinite loop of segfaults. <Sigh />.
Posted by Anonymous at Thu Nov 20 19:10:06 2008
We've a Bull test cluster based on Red Hat, rather than SLES10, which our other cluster (stokes.ichec.ie) uses. This means glibc 2.5 rather than glibc 2.4, which I think is the issue. That, and using a commercial compiler like Intel's.
Problems are seen using the Intel compiler (versions 9, 10 and 11) with
"glibc memory corruption detected". This is on a Fortran construct:
Which is perfectly valid Fortran. It doesn't matter if LINE is automatic or not, dynamically allocated or not. Switching to gfortran makes it work, but there is still a bug report to be filed, either with glibc or Intel, once I've dug deeper and found the culprit.
Ditto, an Intel debugger issue. This was a misbuild of the code somehow; again, further work is required.
valgrind: now, the code is an MPI code, but it fails before any MPI is executed. The cause still needs to be traced.
strace: well, the code has a segfault catcher in it, that prints out a traceback. Something in the code and strace go into an infinite loop of segfault catching.
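For readers wondering what such a catcher looks like, here is a minimal sketch in C. It is not the handler from the code in question, which is not shown here. It assumes glibc's backtrace() and backtrace_symbols_fd() from <execinfo.h>, and the names segv_traceback and install_segv_traceback are made up for illustration. The SA_RESETHAND flag restores the default action as soon as the handler is entered, so a second fault kills the process instead of re-entering the catcher, which is one way to avoid the kind of loop described above.

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Hypothetical handler: print a traceback to stderr, then let the
 * signal kill the process. backtrace() is not strictly
 * async-signal-safe (it may allocate on first use), so treat this
 * as a debugging aid only. */
static void segv_traceback(int sig)
{
    static const char msg[] = "caught fatal signal, traceback follows:\n";
    void *frames[MAX_FRAMES];
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;

    int n = backtrace(frames, MAX_FRAMES);
    /* Writes straight to the file descriptor, without calling malloc. */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);

    /* SA_RESETHAND already restored the default action on entry, so
     * re-raising terminates the process once the handler returns. */
    raise(sig);
}

/* Hypothetical installer: returns 0 on success, -1 on failure. */
int install_segv_traceback(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_traceback;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;   /* one-shot: no repeated catching */
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling install_segv_traceback() once at start-up is enough. A handler installed without SA_RESETHAND that simply returns would re-run the faulting instruction and fault again, over and over.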
So, a fair few bug reports to submit, once I've got the code running operationally. It's needed to predict storms, you see, and some may occur this winter ...
Posted by Alastair McKinstry at Sat Nov 22 12:50:37 2008