gdb 101: tracing a Mesa segfault in Docker, part 1


While trying to run Quake 1 in a Docker container that renders to the host’s Wayland compositor, I ran into a segfault in the Mesa graphics driver. This post is my attempt to hunt it down.

The code is here. Quake is intentionally stylized as Qvake for the purposes of edginess.

ezQuake

The Dockerfile doesn’t contain much in the way of Wayland plumbing; it’s mostly a fetcher and builder of ezQuake.

The only display-related piece is ENV DISPLAY :0. This env var tells X11 applications in the container to use display :0. It helps when the docker run arguments (shown in the next section) bind your host’s display to display 0 in the container.

"Welcome to the home of ezQuake, a modern QuakeWorld client focused on competitive online play."

To build this project yourself, you need the files pak0.pak and pak1.pak in the pak directory of the cloned repository. They contain the game’s assets, and I could easily get them from my Steam copy of Quake.

Wayland apparatus

launch_qvake.tcl is the launcher script I’ve been using to iterate on the container. I picked TCL for no real reason: I was bored of Bash, wanted to try something new, etc.

There’s a slew of Wayland stuff that I picked up from about a week of trawling the web for “how to run graphical applications in Docker”:

set docker_exec_args "-e XDG_RUNTIME_DIR=/tmp \
        --privileged \
        -e WAYLAND_DISPLAY=$::env(WAYLAND_DISPLAY) \
        -v $::env(XDG_RUNTIME_DIR)/$::env(WAYLAND_DISPLAY):/tmp/$::env(WAYLAND_DISPLAY) \
        --device /dev/video0 \
        --device /dev/snd \
        --device /dev/shm \
        --device /dev/dri/card0 \
        --device /dev/fb0 \
        --device /dev/dri/renderD128"
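
For reference, here is my understanding of why the XDG_RUNTIME_DIR and -v settings work together: a Wayland client (through libwayland) connects to the socket at $XDG_RUNTIME_DIR/$WAYLAND_DISPLAY, and uses the name wayland-0 when WAYLAND_DISPLAY is unset. Setting XDG_RUNTIME_DIR=/tmp and bind-mounting the host socket to /tmp/$WAYLAND_DISPLAY puts the socket exactly where the client will look. The sketch below, with hypothetical helper names, checks this from inside the container; it only handles the common case of a relative socket name.

```shell
#!/bin/sh
# Sketch: compute where a Wayland client in the container will look for the
# compositor socket: $XDG_RUNTIME_DIR/$WAYLAND_DISPLAY, defaulting the name
# to "wayland-0". Only relative socket names are handled here.
wayland_socket_path() {
    if [ -z "$XDG_RUNTIME_DIR" ]; then
        echo "XDG_RUNTIME_DIR is not set" >&2
        return 1
    fi
    printf '%s/%s\n' "$XDG_RUNTIME_DIR" "${WAYLAND_DISPLAY:-wayland-0}"
}

# Succeeds only if the bind-mounted path exists and is a Unix socket.
check_wayland_socket() {
    _sock=$(wayland_socket_path) || return 1
    if [ -S "$_sock" ]; then
        echo "ok: $_sock is a socket"
    else
        echo "missing: $_sock (check the -v bind mount)" >&2
        return 1
    fi
}
```

Running check_wayland_socket in the debug shell should print the socket path; a "missing" message points at the bind mount rather than at Mesa.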

Mesa segfault

Launching the container (the entrypoint is ezquake-linux-x86_64) fails with an unspecified error. To dig into it, I added a --debug option to launch_qvake.tcl that installs valgrind, gdb, and strace in a qvake:debug version of the container and drops you into an interactive bash prompt inside it.
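
For readers who don’t use Tcl, a rough shell equivalent of the --debug flow might look like this. It is a hypothetical sketch, not the actual launch_qvake.tcl code, and it assumes the Dockerfile declares ARG DEBUG_PACKAGES and gates the debug-only installs on it the same way the dbgsym install does later in this post.

```shell
#!/bin/sh
# Hypothetical stand-in for `launch_qvake.tcl --debug`; the real Tcl script
# may differ. Assumes the Dockerfile declares ARG DEBUG_PACKAGES.
build_debug_image() {
    # Any non-empty value makes `test "${DEBUG_PACKAGES}"` succeed in the
    # Dockerfile's RUN lines, so the debug-only packages get installed.
    docker build --build-arg DEBUG_PACKAGES=1 -t qvake:debug "${1:-.}"
}

run_debug_shell() {
    # "$@" carries the Wayland/device arguments from the launcher script.
    docker run -it --rm "$@" qvake:debug /bin/bash
}
```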

Running gdb shows us that something is fucked in the graphical drivers:

Thread 1 "ezquake-linux-x" received signal SIGSEGV, Segmentation fault.
0x00007fffe4ef2058 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so

The ?? means gdb has no symbol for that address: the library ships without debug information, so we need to install its debug symbols.
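
One way to confirm whether gdb can find symbols for a stripped library is to look up its build ID. On Ubuntu, gdb’s default debug-file-directory is /usr/lib/debug, and dbgsym packages install their files as /usr/lib/debug/.build-id/XX/REST.debug, where XX is the first two hex digits of the build ID. This is a sketch with hypothetical helper names; it assumes binutils’ readelf is available in the container.

```shell
#!/bin/sh
# Print the GNU build ID of an ELF file (e.g. i965_dri.so) using readelf -n.
build_id_of() {
    readelf -n "$1" | awk '/Build ID:/ { print $3 }'
}

# Map a build ID to the separate debug file path gdb checks by default:
# /usr/lib/debug/.build-id/<first 2 hex chars>/<remaining chars>.debug
debug_path_for_build_id() {
    _id=$1
    _prefix=$(printf '%s' "$_id" | cut -c1-2)
    _rest=$(printf '%s' "$_id" | cut -c3-)
    printf '/usr/lib/debug/.build-id/%s/%s.debug\n' "$_prefix" "$_rest"
}

# Usage inside the debug container:
#   lib=/usr/lib/x86_64-linux-gnu/dri/i965_dri.so
#   dbg=$(debug_path_for_build_id "$(build_id_of "$lib")")
#   if [ -f "$dbg" ]; then echo "found: $dbg"; else echo "no debug file at $dbg"; fi
```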

We run some apt incantations (mixed in with a lot of Google searches) to find out which package this file belongs to:

root@704e3f0309d9:/ezquake# apt list --installed | grep libgl1-mesa-dri
libgl1-mesa-dri/bionic,now 17.2.4-0ubuntu2 amd64 [installed,automatic]
root@704e3f0309d9:/ezquake# dpkg-query -L libgl1-mesa-dri | grep i965_dri\.so
/usr/lib/x86_64-linux-gnu/dri/i965_dri.so
root@704e3f0309d9:/ezquake# apt search libgl1-mesa-dri
Sorting... Done
Full Text Search... Done
libgl1-mesa-dri/bionic,now 17.2.4-0ubuntu2 amd64 [installed,automatic]
  free implementation of the OpenGL API -- DRI modules
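
dpkg -S answers the "which package owns this file?" question in a single step. Its output looks like pkg:arch: /path for multi-arch packages (or pkg: /path otherwise), so the package name is everything before the first colon. A small sketch with hypothetical helper names:

```shell
#!/bin/sh
# Extract the package name from one line of `dpkg -S` output.
package_from_dpkg_search_line() {
    printf '%s\n' "$1" | cut -d: -f1
}

# Print the package that owns the given file, or fail if none does.
owning_package() {
    _line=$(dpkg -S "$1" 2>/dev/null | head -n 1)
    [ -n "$_line" ] || return 1
    package_from_dpkg_search_line "$_line"
}
```

In the debug container, owning_package /usr/lib/x86_64-linux-gnu/dri/i965_dri.so should print libgl1-mesa-dri, matching the dpkg-query -L result above.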

The regular Ubuntu Bionic repositories don’t carry debug symbol packages; Ubuntu publishes those in a separate ddebs archive. One search later, I find libgl1-mesa-dri-dbgsym-17.2.4-0ubuntu2 and download it into extra_deb. I’ll leverage my existing debug build arg and only install the dbgsym package if DEBUG_PACKAGES is set:

RUN if test "${DEBUG_PACKAGES}"; then dpkg -i /blobs/libgl1-mesa-dri-dbgsym_17.2.4-0ubuntu2_amd64.ddeb || true; apt -yf install; fi
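
The dbgsym package has to match the installed library’s version exactly, since gdb pairs the library with its debug file by build ID. A hypothetical helper that builds the expected .ddeb file name from a package name, version, and architecture (Debian-style file names drop any epoch, the N: prefix, from the version):

```shell
#!/bin/sh
# Build the dbgsym .ddeb file name for a binary package.
# $1 = package, $2 = version (epoch stripped if present), $3 = architecture.
dbgsym_ddeb_name() {
    _ver=${2#*:}
    printf '%s-dbgsym_%s_%s.ddeb\n' "$1" "$_ver" "$3"
}

# Usage, reading the installed version and architecture from dpkg:
#   pkg=libgl1-mesa-dri
#   ver=$(dpkg-query -W -f='${Version}' "$pkg")
#   arch=$(dpkg --print-architecture)
#   dbgsym_ddeb_name "$pkg" "$ver" "$arch"
```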

The deb I downloaded from the above link worked:

root@24117324bfa1:/ezquake# apt list --installed | grep libgl1-mesa-dri

libgl1-mesa-dri/bionic,now 17.2.4-0ubuntu2 amd64 [installed,automatic]
libgl1-mesa-dri-dbgsym/now 17.2.4-0ubuntu2 amd64 [installed,local]

Retry with debug symbols

The gdb output now contains more information:

Starting program: /ezquake/ezquake-linux-x86_64

Thread 1 "ezquake-linux-x" received signal SIGSEGV, Segmentation fault.
intel_miptree_render_aux_usage (brw=brw@entry=0x7ffff7f9a040, mt=mt@entry=0x0, srgb_enabled=false, blend_enabled=false)
    at ../../../../../../src/mesa/drivers/dri/i965/intel_mipmap_tree.c:2575
2575    ../../../../../../src/mesa/drivers/dri/i965/intel_mipmap_tree.c: No such file or directory.

Now, I’m no gdb wizard, so I pulled up an old guide in the hope that it would help me. The backtrace command, which prints the call stack with the innermost frame (#0) first, seems promising, so let’s run it:

(gdb) bt
#0  intel_miptree_render_aux_usage (brw=brw@entry=0x7ffff7f9a040,
    mt=mt@entry=0x0, srgb_enabled=false, blend_enabled=false)
    at ../../../../../../src/mesa/drivers/dri/i965/intel_mipmap_tree.c:2575
#1  0x00007fffe4ee386e in brw_update_renderbuffer_surface (
    brw=0x7ffff7f9a040, rb=0x55555ae58ea0, flags=0, unit=<optimized out>,
    surf_index=<optimized out>)
    at ../../../../../../src/mesa/drivers/dri/i965/brw_wm_surface_state.c:223
#2  0x00007fffe4ee57b4 in brw_update_renderbuffer_surfaces (
    brw=0x7ffff7f9a040, fb=0x55555ae1de10, render_target_start=0,
    surf_offset=0x7ffff7fc0854)
    at ../../../../../../src/mesa/drivers/dri/i965/brw_wm_surface_state.c:1060
#3  0x00007fffe4ee5804 in update_renderbuffer_surfaces (brw=0x7ffff7f9a040)
    at ../../../../../../src/mesa/drivers/dri/i965/brw_wm_surface_state.c:1085
#4  0x00007fffe4edc1c8 in check_and_emit_atom (atom=0x7ffff7fc1170,
    state=<synthetic pointer>, brw=0x7ffff7f9a040)
    at ../../../../../../src/mesa/drivers/dri/i965/brw_state_upload.c:433
#5  brw_upload_pipeline_state (pipeline=BRW_RENDER_PIPELINE,
    brw=0x7ffff7f9a040)
    at ../../../../../../src/mesa/drivers/dri/i965/brw_state_upload.c:547
#6  brw_upload_render_state (brw=brw@entry=0x7ffff7f9a040)
    at ../../../../../../src/mesa/drivers/dri/i965/brw_state_upload.c:569
#7  0x00007fffe4ecf170 in brw_try_draw_prims (indirect=0x0, stream=0,
    xfb_obj=0x0, max_index=<optimized out>, min_index=<optimized out>,
    index_bounds_valid=<optimized out>, ib=<optimized out>, nr_prims=1,
    prims=<optimized out>, arrays=0x55555ae7b1c8, ctx=0x7ffff7f9a040)
    at ../../../../../../src/mesa/drivers/dri/i965/brw_draw.c:777
#8  brw_draw_prims (ctx=0x7ffff7f9a040, prims=<optimized out>, nr_prims=1,
    ib=<optimized out>, index_bounds_valid=<optimized out>,
    min_index=<optimized out>, max_index=<optimized out>, gl_xfb_obj=0x0,
    stream=0, indirect=0x0)
    at ../../../../../../src/mesa/drivers/dri/i965/brw_draw.c:869
#9  0x00007fffe4ca8b26 in vbo_exec_vtx_flush (exec=exec@entry=0x55555ae78ba8,
    keepUnmapped=<optimized out>) at ../../../src/mesa/vbo/vbo_exec_draw.c:435
#10 0x00007fffe4c8d01c in vbo_exec_FlushVertices_internal (
    exec=0x55555ae78ba8, unmap=<optimized out>)
    at ../../../src/mesa/vbo/vbo_exec_api.c:637
#11 0x00007fffe4ca5c03 in vbo_exec_FlushVertices (ctx=0x7ffff7f9a040,
    flags=flags@entry=1) at ../../../src/mesa/vbo/vbo_exec_api.c:1294
#12 0x00007fffe4b74212 in draw_buffer (ctx=0x7ffff7f9a040, fb=0x55555ae1de10,
    buffer=<optimized out>, caller=0x7fffe512b0d2 "glDrawBuffer")
    at ../../../src/mesa/main/buffers.c:273
#13 0x000055555558b654 in FS_LoadFile (
    path=path@entry=0x7fffffffe080 "gfx/qplaque.lmp",
    usehunk=usehunk@entry=2, file_length=file_length@entry=0x0) at fs.c:388
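
gdb’s pager pauses long output until you press Return. A minimal sketch for capturing the whole trace non-interactively: -batch runs the -ex commands in order, then exits, and it also turns off pagination; bt full is the regular backtrace plus each frame’s local variables. The default program path is the one from the gdb session above; the helper name and output file name are hypothetical.

```shell
#!/bin/sh
# Run the program under gdb, let it crash, and write the full backtrace
# (including locals) to a file. -batch disables gdb's pager.
capture_backtrace() {
    _prog=${1:-/ezquake/ezquake-linux-x86_64}
    _out=${2:-backtrace.txt}
    gdb -batch -ex run -ex 'bt full' "$_prog" > "$_out" 2>&1
}
```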

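There’s a lot of output there. The top frame is the interesting part: intel_miptree_render_aux_usage was called with mt=0x0, a NULL miptree pointer, by brw_update_renderbuffer_surface. I’ll try out gdbgui to see if it makes this error easier to visualize.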

gdbgui

Again, I leverage my docker build arg to install gdbgui conditionally for the debug container:

RUN if test "${DEBUG_PACKAGES}"; then wget https://gdbgui.com/downloads/linux/gdbgui_0.10.1.0 -O /gdbgui && chmod +x /gdbgui; fi

In launch_qvake.tcl, I start gdbgui in the background on launch and publish its port 5000 on my host’s port 5000:

[...]
-p 5000:5000 \
$container_debug_name \
sh -c \"/gdbgui -r /ezquake/ezquake-linux-x86_64 &\
/bin/bash \""

Upon running ./launch_qvake.tcl --debug, I’m greeted with:

root@d43ed21808bb:/ezquake# Warning: authentication is recommended when serving on a publicly accessible IP address. See gdbgui --help.
View gdbgui at http://172.17.0.2:5000
exit gdbgui by pressing CTRL+C

root@d43ed21808bb:/ezquake#

Navigating to localhost:5000 on my host’s browser gives me gdbgui, and clicking the play button gets me to the segfault I found above:

[Screenshot: gdbgui]

There are a lot of File not found: *.c messages, which I assume means that I’m missing source packages in my container.

To be continued in a future post.