Saturday, June 04, 2016

Why do I get a Linux SIGSEGV / segfault / Segmentation fault ?

To respond to this question you will need to get a coredump and use the GNU debugger (gdb).
The application generating the segmentation fault must be configured to produce a coredump or should be manually run with gdb in case a segmentation fault can be manually replicated.
Let us pick Apache to go through the debugging steps with a real world example. In order to produce a coredump apache2 needs to be configured:
$ vi /etc/apache2/apache2.conf
...
CoreDumpDirectory /tmp/apache-coredumps
...
Also the OS must not impose limits on the size of the core file in case we don't know how big it would be:
$ sudo bash
# ulimit -c unlimited
In addition the core dump directory must exist and must be owned by the apache user (www-data in this case)
sudo mkdir /tmp/apache-coredumps
sudo chown www-data:www-data /tmp/apache-coredumps
Make sure to stop and start apache separately
$ sudo apachectl stop
$ sudo apachectl start
Look into running processes:
$ ps -ef|grep apache2
Confirm that the processes are running with "Max core file size" unlimited:
$ cat /proc/${pid}/limits
...
Max core file size        unlimited            unlimited            bytes
...
Here is a quick way to do it with a one liner. It lists the max core file size for the parent apache2 process and all its children:
$ ps -ef | grep 'sbin/apache2' | grep -v grep | awk '{print $2}' | while read -r pid; do cat /proc/$pid/limits | grep core ; done
To test that your configuration works just force a coredump. This can be achieved by sending a SIGABRT signal to the process:
$ kill -SIGABRT ${pid}
Analyze the coredump file with gdb. In this case it confirms that the SIGABRT signal was used:
$ gdb core  /tmp/apache-coredumps/core
...
Core was generated by `/usr/sbin/apache2 -k start'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f1bc74073bd in ?? ()
...
Leave apache running and when it fails with a segmentation fault you can confirm the reason:
$ gdb core  /tmp/apache-coredumps/core
...
Core was generated by `/usr/sbin/apache2 -k start'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fc4d7ef6a11 in ?? ()
...
Explore then deeper to find out what the problem really is. Note that we run apache now through gdb to get deeper information about the coredump:
$ gdb /usr/sbin/apache2 /tmp/apache-coredumps/core
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fc4d7ef6a11 in apr_brigade_cleanup (data=0x7fc4d8760e68) at /build/buildd/apr-util-1.5.3/buckets/apr_brigade.c:44
44 /build/buildd/apr-util-1.5.3/buckets/apr_brigade.c: No such file or directory.
Using the bt command from the gdb prompt gives us more:
Core was generated by `/usr/sbin/apache2 -k start'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fc4d7ef6a11 in apr_brigade_cleanup (data=0x7fc4d8760e68) at /build/buildd/apr-util-1.5.3/buckets/apr_brigade.c:44
44 /build/buildd/apr-util-1.5.3/buckets/apr_brigade.c: No such file or directory.
(gdb) bt
#0  0x00007fc4d7ef6a11 in apr_brigade_cleanup (data=0x7fc4d8760e68) at /build/buildd/apr-util-1.5.3/buckets/apr_brigade.c:44
#1  0x00007fc4d7cd79ce in run_cleanups (cref=) at /build/buildd/apr-1.5.0/memory/unix/apr_pools.c:2352
#2  apr_pool_destroy (pool=0x7fc4d875f028) at /build/buildd/apr-1.5.0/memory/unix/apr_pools.c:814
#3  0x00007fc4d357de00 in ssl_io_filter_output (f=0x7fc4d876a8e0, bb=0x7fc4d8767ab8) at ssl_engine_io.c:1659
#4  0x00007fc4d357abaa in ssl_io_filter_coalesce (f=0x7fc4d876a8b8, bb=0x7fc4d8767ab8) at ssl_engine_io.c:1558
#5  0x00007fc4d87f1d2d in ap_process_request_after_handler (r=r@entry=0x7fc4d875f0a0) at http_request.c:256
#6  0x00007fc4d87f262a in ap_process_async_request (r=r@entry=0x7fc4d875f0a0) at http_request.c:353
#7  0x00007fc4d87ef500 in ap_process_http_async_connection (c=0x7fc4d876a330) at http_core.c:143
#8  ap_process_http_connection (c=0x7fc4d876a330) at http_core.c:228
#9  0x00007fc4d87e6220 in ap_run_process_connection (c=0x7fc4d876a330) at connection.c:41
#10 0x00007fc4d47fe81b in process_socket (my_thread_num=19, my_child_num=, cs=0x7fc4d876a2b8, sock=, p=, thd=)
    at event.c:970
#11 worker_thread (thd=, dummy=) at event.c:1815
#12 0x00007fc4d7aa7184 in start_thread (arg=0x7fc4c3fef700) at pthread_create.c:312
#13 0x00007fc4d77d437d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
I was miss led by the messages above until I started changing apache configurations and re-looking into generated coredumps. I realized Apache would fail at any line of any source code complaining about "No such file or directory" in a clear consequence of its incapacity to access resources that did exist:
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/apache2 -k start'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fb44328d992 in ap_save_brigade (f=f@entry=0x7fb4432e5148, saveto=saveto@entry=0x7fb4432e59c8, b=b@entry=0x7fb435ffa568, 
    p=0x7fb443229028) at util_filter.c:648
648 util_filter.c: No such file or directory.
Disabling modules I was able to get to the bottom of it. An apparent bug in mod-proxy when a miss configuration causes recursive pulling of unavailable resources from the proxied target. Once the issue is found comment or eliminate the configuration from apache:
$ sudo vi /etc/apache2/apache2.conf
...
# CoreDumpDirectory /tmp/apache-coredumps
...
$ sudo apachectl graceful

No comments:

Followers