Debugging 101
Definition? Debugging involves inspecting a program’s internal state.
printf Debugging and Logging
In printf debugging, one adds print statements and keeps iterating until enough information has been extracted.
Using logging has several advantages over printf debugging: varying logging destinations (e.g. standard output, files, sockets, remote servers, etc.); severity levels (e.g. INFO, DEBUG, WARN, ERROR, &c) that support filtering of output; and color-coding for readability.
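A sketch of severity levels and filtering with Python's standard logging module (the logger name and format string are arbitrary choices for the demo):

```python
import logging

# Route log records to stderr with a simple format.
logging.basicConfig(format="%(levelname)s:%(name)s:%(message)s")

# Filter: anything below WARNING is dropped for this logger.
log = logging.getLogger("demo")
log.setLevel(logging.WARNING)

log.debug("cache miss for key=%s", "user:42")   # filtered out
log.info("request handled in %d ms", 12)        # filtered out
log.warning("retrying flaky connection")        # emitted
log.error("giving up after 3 retries")          # emitted
```

Flipping a single `setLevel` call between DEBUG and WARNING is what makes logging cheaper to live with than scattered print statements.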
Logging can also be disabled in various contexts. For example, Chromium compiles away DLOG statements to no-ops in non-debug builds.
Most programs write logs to some standardized location (sometimes through a shell-provided program like logger). On UNIX systems, logs are usually written to /var/log. There may be specialized terminal utilities for displaying the logs, e.g. journalctl on Linux and log show on macOS.
Debuggers
Optimized code is frequently tougher to debug because symbols and variables are often optimized away. Making one-off adjustments during debugging has varying levels of support, e.g. Haskell's debugger allows on-the-fly modification, while log-points in Microsoft's C++ extension for VS Code pause execution whenever I try them.
Debuggers are programs that let you interact with the execution of a program, allowing actions like:
- Halt execution of the program when it reaches a certain line.
- Step through the program one instruction at a time.
- Inspect values of variables after the program has crashed.
- Conditionally halt the execution when a given condition is met.
Various debuggers exist, e.g. GDB, LLDB, extensions in VS Code, Haskell's built-in debugger, and so forth. Many programming languages come with some form of debugger, e.g. pdb for Python. gdb and lldb are optimized for C-like language debugging, but can probe pretty much any process. There are quality-of-life improvements to the default debuggers, e.g. ipdb and pwndbg.
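A pdb session can even be driven non-interactively, which is handy for illustrating the actions above; the commands string below mimics what you would type at the (Pdb) prompt (the function and its argument are made up for the demo):

```python
import io
import pdb

def average(xs):
    return sum(xs) / len(xs)

# Feed pdb the commands we would normally type interactively:
# set a breakpoint on `average`, continue to it, print its
# argument, then continue to completion.
session = io.StringIO("b average\nc\np xs\nc\n")
out = io.StringIO()
pdb.Pdb(stdin=session, stdout=out).run("average([1, 2, 3])", globals())

# The transcript in `out` contains the value printed by `p xs`.
print(out.getvalue())
```

Interactively, the same session is just `python -m pdb script.py` followed by those four commands.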
Specialized Tools
Whenever programs need to perform actions that only the kernel can, they use system calls. The strace (Linux) and dtrace (macOS and BSD) commands let you trace the system calls that a program makes.
System calls include:
- Process control: create process, terminate process, load, execute, get/set process attributes, wait, allocate/free memory.
- File management: create/delete file, open/close file, read/write/reposition, get/set file attributes.
- Device management: request/release device, read/write/reposition, get/set device attributes, logically attach/detach devices.
- Information maintenance: get/set total system information; get/set process, file, or device metadata.
- Communication: create/delete communication connection, send/receive messages, transfer status information, attach/detach remote devices.
- Protection: get/set file permissions.
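Several of these categories map onto thin wrappers in Python's os module, which makes for an easy tracing target; running a snippet like this under strace shows the underlying syscalls (open/read/write/close and friends; the file name is arbitrary):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# File management: create, write, close.
fd = os.open(path, os.O_CREAT | os.O_WRONLY)
os.write(fd, b"hello")
os.close(fd)

# File management: open, read, close.
fd = os.open(path, os.O_RDONLY)
data = os.read(fd, 5)
os.close(fd)

# Information maintenance: query a process attribute.
pid = os.getpid()
```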
To look at network traffic, tcpdump and Wireshark let you read the contents of packets and filter them based on different criteria. Fiddler allows you to proxy any HTTP requests.
Static Analysis
Static analysis programs take source code as input and analyze it using coding rules to reason about its correctness. For example, pyflakes can identify bugs like shadowing and use of undefined variables, while bandit finds common security issues in Python code, e.g. parsing untrusted XML. These tools can be integrated into editors and IDEs such that their outputs are displayed inline (code linting), e.g. Vim has ale and syntastic. There are crowd-sourced lists like static analysis tools and awesome linters.
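As a concrete instance of the shadowing class of bug, the snippet below is perfectly legal Python, so only a linter will complain about it (the function is illustrative):

```python
def total(items):
    sum = 0            # shadows the builtin sum(); a linter flags this
    for x in items:
        sum += x
    # From here on, calling sum([1, 2]) inside this function would
    # raise TypeError, because `sum` is now an int, not the builtin.
    return sum
```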
Linters and static analyzers are pretty helpful when picking up a language. It’s like having an ever-present tutor.
Pretty cool that ale can display warnings and errors in files being edited in Vim before the files have been saved back to the filesystem. This is possible because Vim has APIs to access the contents of its text buffers.
Code formatters auto-format source code such that it's consistent with common stylistic patterns. Giving up stylistic control and using a standardized format makes your code easier for other people to read.
Code formatters can be pretty opinionated because there can be more than one valid way of achieving something. For example, Python's Black:

> The uncompromising code formatter
>
> "Any color you like."
>
> By using Black, you agree to cede control over minutiae of hand-formatting. In return, Black gives you speed, determinism, and freedom from pycodestyle nagging about formatting. You will save time and mental energy for more important matters.
>
> Black makes code review faster by producing the smallest diffs possible. Blackened code looks the same regardless of the project you're reading. Formatting becomes transparent after a while and you can focus on the content instead.
Literature Review
In delta debugging, we want to reduce an object while preserving a certain property. More formally, let \(\mathbb{X}\) be a universe of all objects of interest, \(\phi: \mathbb{X} \to \{ \text{TRUE}, \text{FALSE} \} \) be a test function that determines whether an object \(X \in \mathbb{X}\) exhibits a given property or not, and \(|X|\) be the size of \(X\). The goal is to find an object \(X'\) such that \(|X'|\) is as small as possible, and \(\phi(X') = \text{TRUE}\).
Wang et al. propose a probabilistic delta debugging algorithm (ProbDD) to improve on Zeller and Hildebrandt's widely-used ddmin algorithm. The ddmin algorithm follows a predefined sequence of steps to remove items from a sequence, while ProbDD learns a probabilistic model to select which items to remove in the next iteration. The worst-case asymptotic number of tests while using ProbDD is \(O(n)\), while that of ddmin is \(O(n^2)\). ProbDD also tends to produce smaller final artifacts than those produced by ddmin.
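A minimal sketch of the deletion core of ddmin, under the formalization above (\(\phi\) is the test predicate; this simplification only tries deleting chunks at increasing granularity, whereas full ddmin also tests the chunks themselves):

```python
def ddmin(items, phi):
    """Shrink `items` while phi(items) stays True.

    Simplified delta debugging: repeatedly try deleting chunks
    ("complements" in ddmin terminology), refining granularity
    when no deletion preserves the property.
    """
    n = 2  # current number of chunks
    while len(items) >= 2:
        chunk = len(items) // n
        reduced = False
        for i in range(0, len(items), chunk):
            candidate = items[:i] + items[i + chunk:]
            if candidate and phi(candidate):
                items = candidate
                n = max(n - 1, 2)        # coarsen after a success
                reduced = True
                break
        if not reduced:
            if n >= len(items):          # already single-item chunks
                break
            n = min(n * 2, len(items))   # refine granularity
    return items

# e.g. the "failure" needs both 3 and 7 to be present:
minimal = ddmin(list(range(10)), lambda c: 3 in c and 7 in c)
```

The result is 1-minimal in the ddmin sense: removing any single remaining item breaks the property.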
Delta debugging reminds me of git-bisect, which uses binary search to find the commit that introduced a bug. Wang et al. do not mention git-bisect. Assuming that at each commit, the code is in a state that provides a useful answer¹, git-bisect takes at most \(O(\lg N)\) steps. ProbDD looks useful where git history is absent or not useful (e.g. finding the fault within a commit that changed a lot of things).
¹ For example, if we're looking for a feature bug, then the code must be buildable at each commit. Otherwise, we won't know whether that commit broke the feature.
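The bisection itself is plain binary search over the commit sequence; here `is_bad` is a stand-in for "check out this commit, build, and run the failing test":

```python
def first_bad(commits, is_bad):
    """Find the first bad commit in O(lg N) probes, assuming
    commits[0] is good, commits[-1] is bad, and there is a single
    good -> bad transition (what git-bisect automates)."""
    lo, hi = 0, len(commits) - 1   # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

The single-transition assumption is exactly footnote 1: if some middle commit doesn't build, the probe gives no useful answer and the invariant breaks down.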
Statistical Fault Localization (SFL) techniques use execution profiles and success/failure information from execution data to predict which program elements are likely to be faulty.
SFL is one of the techniques for Automated Software Fault Localization (AFL). The goal of AFL is to compute suspiciousness scores for program statements and other elements, and present those to the developers for deeper investigation. SFL is not used widely in industry because most techniques do not consistently locate faults with enough precision.
Küçük et al. note that most SFL techniques measure correlation, and are thus susceptible to confounding bias.
Let \(X\) be some independent variable, and \(Y\) some dependent variable. To estimate the effect of \(X\) on \(Y\), the statistician must suppress the effects of extraneous variables that influence both \(X\) and \(Y\). We say that \(X\) and \(Y\) are confounded by some other variable \(Z\) whenever \(Z\) causally influences both \(X\) and \(Y\).
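A small simulation makes the hazard concrete: below, \(Z\) drives both \(X\) and \(Y\), and \(X\) has no direct effect on \(Y\), yet their raw correlation is large; conditioning on \(Z\) (here, crudely, stratifying into narrow bins) makes it mostly vanish. All parameters are illustrative.

```python
import random

def pearson(pairs):
    # Plain Pearson correlation coefficient, computed by hand.
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / (vx * vy) ** 0.5

random.seed(0)
rows = []
for _ in range(10_000):
    z = random.random()              # confounder Z
    x = z + random.gauss(0, 0.1)     # X depends only on Z
    y = z + random.gauss(0, 0.1)     # Y depends only on Z, not on X
    rows.append((x, y, z))

naive = pearson([(x, y) for x, y, _ in rows])   # large but spurious

# Stratify on Z: within a narrow Z-bin, X and Y are nearly independent.
bins = {}
for x, y, z in rows:
    bins.setdefault(int(z * 10), []).append((x, y))
stratified = sum(pearson(b) for b in bins.values()) / len(bins)
```

An SFL technique that only looks at the analogue of `naive` will flag statements that merely co-occur with failure, not ones that cause it.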
propose UniVal, an SFL technique that uses causal inference techniques and ML to integrate information about both predicate outcomes and variable values. UniVal’s key insight is to transform the program under analysis so that branch and loop predicate outcomes become variable values, so that one causally sound value-based SFL technique can be applied to both variable assignments and predicates.
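A toy sketch of that key insight (this is my hand-written illustration, not UniVal's actual instrumentation): the branch predicate is first assigned to an ordinary variable, so a value-based analysis sees its outcome alongside every other value.

```python
trace = []  # stand-in for an execution profile

def absolute(x):
    # Before the transformation this would read `if x < 0:`, and the
    # predicate's outcome would be invisible to value-based SFL.
    p = (x < 0)
    trace.append(("p", p))       # predicate outcome recorded as a value
    if p:
        x = -x
    trace.append(("result", x))  # ordinary variable value
    return x

absolute(-5)
```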
References
- Probabilistic Delta Debugging. Guancheng Wang; Ruobing Shen; Junjie Chen; Yingfei Xiong; Lu Zhang. Proceedings of the 29th European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Aug 2021, pp. 881-892. Peking University; Tianjin University. Cited 1 time as of Jan 26, 2022.
- Simplifying and Isolating Failure-Inducing Input. Andreas Zeller; Ralf Hildebrandt. IEEE Transactions on Software Engineering, vol. 28, no. 2, pp. 183-200, Feb. 2002. Universität des Saarlandes; DeTeLine - Deutsche Telekom Kommunikationsnetze. Cited 1100 times as of Jan 26, 2022.
- Improving Fault Localization by Integrating Value and Predicate-Based Causal Inference Techniques. Yiğit Küçük; Tim A. D. Henderson; Andy Podgurski. Proceedings of the 43rd International Conference on Software Engineering, 2021, pp. 649-660. Case Western Reserve University; Google. Cited 4 times as of Feb 16, 2022.
Terminals have varying levels of color support: plain; ANSI escape codes (16 color codes with bold/italic and background); 256-color palette; 24-bit truecolor ("888" colors, a.k.a. 16 million, e.g. printf "\x1b[38;2;255;100;0mTRUECOLOR\x1b[0m\n").
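The same escape sequence, wrapped in a small Python helper (the function name is mine):

```python
def truecolor(text, r, g, b):
    """Wrap `text` in a 24-bit ("truecolor") ANSI foreground escape,
    resetting the style afterwards with \\x1b[0m."""
    return f"\x1b[38;2;{r};{g};{b}m{text}\x1b[0m"

# Same orange as the printf example above:
print(truecolor("TRUECOLOR", 255, 100, 0))
```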