Fixing an insidious bug in the new Unix directed graph shell, dgsh,
allowed me to demonstrate in practice 10 of the 66
principles, techniques, and tools I describe in the book Effective Debugging.
Almost all steps are documented in the corresponding issue and commits.
Here’s a detailed retrospective.
In the following description, the title of the corresponding
book section follows each step in parentheses.
- Four participants handled the bug through a GitHub issue.
(Handle All Problems through an Issue-Tracking System)
- We tried to reproduce the problem on diverse systems and under various settings.
(Diversify Your Build and Execution Environment)
This allowed us to find that the problem occurred after sourcing
some shell initialization files.
- Then, over a series of iterations, I cut down the 65-line shell script
that triggered the problem and the 8-line interactive script that
demonstrated it into a single two-line script.
This made it easy to run the script with a single command.
(Enable the Efficient Reproduction of the Problem)
Removing the second command of the first line (changing true || false
into true) allowed me to obtain very compact traces of the
correctly running and the failing executions.
(Minimize the Differences between a Working Example and the Failing Code)
- I then used the Unix strace command to record the system calls of
the working and the failing script.
(Trace the Code’s Execution)
- With the two traces at hand I used the Unix grep command to find and
display the execve calls of the working and the failing invocation.
(Analyze Debug Data with Unix Command-Line Tools)
This showed me that the problem was associated with a wrong
executable program search path being used.
(Find the Difference between a Known Good System and a Failing One)
However, I still did not know why the wrong path was used.
- When I ran dgsh with debug output enabled, its output on the failing
system included a key line showing that a variable that was supposed
to be temporarily set to false was never set back to its previous value.
(Use the Software’s Debugging Facilities)
- At that point I was ready to fix the bug.
However, before fixing it, I constructed and added a test case
that exercised the bug, and ran the program's tests to ensure that the
new test failed.
(Find the Fault by Constructing a Test Case)
I then corrected the code
and ran all tests again to verify that the bug was fixed and that
I had not introduced another fault into the system.
- After committing the changes and the test, I removed the various log files
and test scripts in order to get a clean output from git status.
(Houseclean Before and After Debugging)
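The reduction step above, cutting a 65-line script down to two lines over successive iterations, can also be automated. The following is a minimal Python sketch of a greedy one-line-at-a-time reducer, a simplified relative of delta debugging; it is not the procedure used for dgsh. The function reduce_lines, the still_fails predicate, and the sample script lines are hypothetical. In practice still_fails would write the candidate lines to a file, run it with dgsh, and report whether the failure persists.

```python
from typing import Callable, List


def reduce_lines(lines: List[str],
                 still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop single lines while the failure still reproduces.

    A simplified relative of delta debugging: it stops when removing
    any one of the remaining lines makes the failure disappear.
    """
    if not still_fails(lines):
        raise ValueError("the starting script must reproduce the failure")
    changed = True
    while changed:
        changed = False
        for i in range(len(lines)):
            candidate = lines[:i] + lines[i + 1:]
            if still_fails(candidate):
                lines = candidate   # keep the smaller failing script
                changed = True
                break               # rescan from the start
    return lines


# Hypothetical stand-in for "run the script and check for the bug":
# here the failure needs both an init-file source and the || list.
def still_fails(lines: List[str]) -> bool:
    return "source init.sh" in lines and "true || false" in lines


script = ["echo start", "source init.sh", "x=1", "true || false", "echo end"]
print(reduce_lines(script, still_fails))
# ['source init.sh', 'true || false']
```

Each accepted removal triggers a rescan, so the result is minimal only in the sense that no single remaining line can be dropped; full delta debugging also tries removing larger chunks at once.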
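The comparison of execve calls described above was done with grep. A rough Python equivalent is sketched below, assuming traces in the format strace writes with its -f and -o options, where each line may start with a process ID. It keeps each completed execve call's program path and result and diffs the working and failing lists, so a changed executable search path shows up as differing paths. The function names and the sample trace lines, including all paths, are hypothetical, and calls that strace splits into unfinished and resumed lines are simply ignored by this parser.

```python
import difflib
import re
from typing import List

# Matches completed execve lines, optionally prefixed by a PID, e.g.
#   1234 execve("/bin/ls", ["ls"], 0x7ffd /* 20 vars */) = 0
#   execve("/opt/bin/ls", ["ls"], 0x7ffd /* 20 vars */) = -1 ENOENT (No such file or directory)
EXECVE_RE = re.compile(
    r'^(?:\d+\s+)?execve\("([^"]*)".*\)\s+=\s+(-?\d+)(?:\s+(\w+))?')


def execve_calls(trace: str) -> List[str]:
    """Return 'path result' for each completed execve call (like grep execve)."""
    calls = []
    for line in trace.splitlines():
        m = EXECVE_RE.match(line)
        if m:
            path, ret, errno = m.groups()
            calls.append(f"{path} {'ok' if ret == '0' else (errno or ret)}")
    return calls


def diff_execve(good: str, bad: str) -> List[str]:
    """Unified diff of the execve calls in a working and a failing trace."""
    return list(difflib.unified_diff(
        execve_calls(good), execve_calls(bad),
        fromfile="working", tofile="failing", lineterm=""))


# Hypothetical trace excerpts; PIDs differ but are ignored.
good = '''1200 execve("/usr/bin/dgsh", ["dgsh", "t.sh"], 0x7ffc /* 30 vars */) = 0
1201 execve("/usr/bin/sort", ["sort"], 0x55 /* 31 vars */) = 0'''
bad = '''1300 execve("/usr/bin/dgsh", ["dgsh", "t.sh"], 0x7ffc /* 30 vars */) = 0
1301 execve("/opt/old/bin/sort", ["sort"], 0x55 /* 31 vars */) = -1 ENOENT (No such file or directory)'''

for line in diff_execve(good, bad):
    print(line)
```

Stripping the PIDs before diffing matters: they always differ between runs and would otherwise bury the one difference that points at the wrong search path.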
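The fix followed a test-first order: add a test that exercises the bug, confirm that it fails, then correct the code and rerun the tests. The Python sketch below illustrates that order on the kind of fault described above, a variable temporarily set to false but never restored. It is not dgsh's code: the Evaluator class, its flag attribute, and the early-return path that skips the restore are all hypothetical. The fixed version restores the saved value in a finally clause, so every exit path puts it back.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

Thunk = Callable[[], bool]


class Evaluator:
    """Hypothetical evaluator; 'flag' stands in for the variable in the bug."""

    def __init__(self) -> None:
        self.flag = True

    def run_or_buggy(self, left: Thunk, right: Thunk) -> bool:
        saved = self.flag
        self.flag = False            # temporarily cleared
        if left():
            return True              # early return: flag is never restored
        result = right()
        self.flag = saved
        return result

    @contextmanager
    def flag_set_to(self, value: bool) -> Iterator[None]:
        saved = self.flag
        self.flag = value
        try:
            yield
        finally:
            self.flag = saved        # runs on every exit path

    def run_or_fixed(self, left: Thunk, right: Thunk) -> bool:
        with self.flag_set_to(False):
            return left() or right()


def flag_restored(run: Callable[[Evaluator, Thunk, Thunk], bool]) -> bool:
    """The test case: after evaluating 'true || false', flag must be True again."""
    ev = Evaluator()
    run(ev, lambda: True, lambda: False)
    return ev.flag


print(flag_restored(Evaluator.run_or_buggy))   # False: the test exposes the bug
print(flag_restored(Evaluator.run_or_fixed))   # True: the fix passes
```

Because the restore lives in finally, it also runs if left or right raises an exception, another path the faulty version would miss.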