Debugging 101
Definition? Debugging involves inspecting a program’s internal state.
printf Debugging and Logging
In printf debugging, one adds print statements and keeps iterating until enough information has been extracted.
Using logging has several advantages over printf debugging: varying logging destinations (e.g. standard output, files, sockets, remote servers, etc.); severity levels (e.g. INFO, DEBUG, WARN, ERROR, &c) that support filtering of output; and color-coding for readability.
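A sketch of severity levels and filtering with Python's standard logging module (the logger name and format string are arbitrary choices for the demo):

```python
import logging

# Route log records to stderr with a simple format.
logging.basicConfig(format="%(levelname)s:%(name)s:%(message)s")

# Filter: anything below WARNING is dropped for this logger.
log = logging.getLogger("demo")
log.setLevel(logging.WARNING)

log.debug("cache miss for key=%s", "user:42")   # filtered out
log.info("request handled in %d ms", 12)        # filtered out
log.warning("retrying flaky connection")        # emitted
log.error("giving up after 3 retries")          # emitted
```

Flipping a single `setLevel` call between DEBUG and WARNING is what makes logging cheaper to live with than scattered print statements.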
Logging can also be disabled in various contexts. For example, Chromium compiles away DLOG statements to no-ops in non-debug builds.
Most programs write logs to some standardized location (sometimes through a shell-provided program like logger). On UNIX systems, logs are usually written to /var/log. There may be specialized terminal utilities for displaying the logs, e.g. journalctl on Linux and log show on macOS.
Debuggers
Optimized code is frequently tougher to debug because symbols and variables are often optimized away. Making one-off adjustments during debugging has varying levels of support, e.g. Haskell's debugger allows on-the-fly modification, while log-points in Microsoft's C++ extension for VS Code pause execution whenever I try them.
Debuggers are programs that let you interact with the execution of a program, allowing actions like:
- Halt execution of the program when it reaches a certain line.
- Step through the program one instruction at a time.
- Inspect values of variables after the program has crashed.
- Conditionally halt the execution when a given condition is met.
Various debuggers exist, e.g. GDB, LLDB, extensions in VS Code, Haskell's built-in debugger, and so forth. Many programming languages come with some form of debugger, e.g. pdb for Python. gdb and lldb are optimized for C-like language debugging, but can probe pretty much any process. There are quality-of-life improvements to the default debuggers, e.g. ipdb and pwndbg.
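A pdb session can even be driven non-interactively, which is handy for illustrating the actions above; the commands string below mimics what you would type at the (Pdb) prompt (the function and its argument are made up for the demo):

```python
import io
import pdb

def average(xs):
    return sum(xs) / len(xs)

# Feed pdb the commands we would normally type interactively:
# set a breakpoint on `average`, continue to it, print its
# argument, then continue to completion.
session = io.StringIO("b average\nc\np xs\nc\n")
out = io.StringIO()
pdb.Pdb(stdin=session, stdout=out).run("average([1, 2, 3])", globals())

# The transcript in `out` contains the value printed by `p xs`.
print(out.getvalue())
```

Interactively, the same session is just `python -m pdb script.py` followed by those four commands.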
Specialized Tools
Whenever programs need to perform actions that only the kernel can, they use system calls. The strace (Linux) and dtrace (macOS and BSD) commands let you trace the system calls that a program makes.
System calls include:
- Process control: create process, terminate process, load, execute, get/set process attributes, wait, allocate/free memory.
- File management: create/delete file, open/close file, read/write/reposition, get/set file attributes.
- Device management: request/release device, read/write/reposition, get/set device attributes, logically attach/detach devices.
- Information maintenance: get/set total system information; get/set process, file, or device metadata.
- Communication: create/delete communication connection, send/receive messages, transfer status information, attach/detach remote devices.
- Protection: get/set file permissions.
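Several of these categories map onto thin wrappers in Python's os module, which makes for an easy tracing target; running a snippet like this under strace shows the underlying syscalls (open/read/write/close and friends; the file name is arbitrary):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# File management: create, write, close.
fd = os.open(path, os.O_CREAT | os.O_WRONLY)
os.write(fd, b"hello")
os.close(fd)

# File management: open, read, close.
fd = os.open(path, os.O_RDONLY)
data = os.read(fd, 5)
os.close(fd)

# Information maintenance: query a process attribute.
pid = os.getpid()
```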
To look at network traffic, tcpdump and Wireshark let you read the contents of packets and filter them based on different criteria. Fiddler allows you to proxy any HTTP requests.
Static Analysis
Static analysis programs take source code as input and analyze it using coding rules to reason about its correctness. For example, pyflakes can identify bugs like shadowing and use of undefined variables, while bandit finds common security issues in Python code, e.g. parsing untrusted XML. These tools can be integrated into editors and IDEs such that their outputs are displayed inline (code linting), e.g. Vim has ale and syntastic. There are crowd-sourced lists like static analysis tools and awesome linters.
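As a concrete instance of the shadowing class of bug, the snippet below is perfectly legal Python, so only a linter will complain about it (the function is illustrative):

```python
def total(items):
    sum = 0            # shadows the builtin sum(); a linter flags this
    for x in items:
        sum += x
    # From here on, calling sum([1, 2]) inside this function would
    # raise TypeError, because `sum` is now an int, not the builtin.
    return sum
```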
Linters and static analyzers are pretty helpful when picking up a language. It’s like having an ever-present tutor.
Pretty cool that ale can display warnings and errors in files being edited in Vim before the files have been saved back to the filesystem. This is possible because Vim has APIs to access the contents of its text buffers.
Code formatters auto-format source code such that it's consistent with common stylistic patterns. Giving up stylistic control and using a standardized format makes your code easier for other people to read.
Code formatters can be pretty opinionated because there can be more than one valid way of achieving something. For example, Python's Black:

> The uncompromising code formatter
>
> "Any color you like."
>
> By using Black, you agree to cede control over minutiae of hand-formatting. In return, Black gives you speed, determinism, and freedom from pycodestyle nagging about formatting. You will save time and mental energy for more important matters.
>
> Black makes code review faster by producing the smallest diffs possible. Blackened code looks the same regardless of the project you're reading. Formatting becomes transparent after a while and you can focus on the content instead.
Literature Review
In delta debugging, we want to reduce an object while preserving a certain property. More formally, let \(\mathbb{X}\) be a universe of all objects of interest, \(\phi: \mathbb{X} \to \{ \text{TRUE}, \text{FALSE} \} \) be a test function that determines whether an object \(X \in \mathbb{X}\) exhibits a given property or not, and \(|X|\) be the size of \(X\). The goal is to find an object \(X'\) such that \(|X'|\) is as small as possible, and \(\phi(X') = \text{TRUE}\).
Wang et al. propose a probabilistic delta debugging algorithm (ProbDD) to improve on Zeller and Hildebrandt's widely-used ddmin algorithm. The ddmin algorithm follows a predefined sequence of steps to remove items from a sequence, while ProbDD learns a probabilistic model to select which items to remove in the next iteration. The worst-case asymptotic number of tests while using ProbDD is \(O(n)\), while that of ddmin is \(O(n^2)\). ProbDD also tends to produce smaller final artifacts than those produced by ddmin.
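A minimal sketch of the deletion core of ddmin, under the formalization above (\(\phi\) is the test predicate; this simplification only tries deleting chunks at increasing granularity, whereas full ddmin also tests the chunks themselves):

```python
def ddmin(items, phi):
    """Shrink `items` while phi(items) stays True.

    Simplified delta debugging: repeatedly try deleting chunks
    ("complements" in ddmin terminology), refining granularity
    when no deletion preserves the property.
    """
    n = 2  # current number of chunks
    while len(items) >= 2:
        chunk = len(items) // n
        reduced = False
        for i in range(0, len(items), chunk):
            candidate = items[:i] + items[i + chunk:]
            if candidate and phi(candidate):
                items = candidate
                n = max(n - 1, 2)        # coarsen after a success
                reduced = True
                break
        if not reduced:
            if n >= len(items):          # already single-item chunks
                break
            n = min(n * 2, len(items))   # refine granularity
    return items

# e.g. the "failure" needs both 3 and 7 to be present:
minimal = ddmin(list(range(10)), lambda c: 3 in c and 7 in c)
```

The result is 1-minimal in the ddmin sense: removing any single remaining item breaks the property.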
Delta debugging reminds me of git-bisect, which uses binary search to find the commit that introduced a bug. Wang et al. do not mention git-bisect. Assuming that at each commit, the code is in a state that provides a useful answer¹, git-bisect takes at most \(O(\lg N)\) steps. ProbDD looks useful where git history is absent or not useful (e.g. finding the fault within a commit that changed a lot of things).
¹ For example, if we're looking for a feature bug, then the code must be buildable at each commit. Otherwise, we won't know whether that commit broke the feature.
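The bisection itself is plain binary search over the commit sequence; here `is_bad` is a stand-in for "check out this commit, build, and run the failing test":

```python
def first_bad(commits, is_bad):
    """Find the first bad commit in O(lg N) probes, assuming
    commits[0] is good, commits[-1] is bad, and there is a single
    good -> bad transition (what git-bisect automates)."""
    lo, hi = 0, len(commits) - 1   # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

The single-transition assumption is exactly footnote 1: if some middle commit doesn't build, the probe gives no useful answer and the invariant breaks down.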
Statistical Fault Localization (SFL) techniques use execution profiles and success/failure information from execution data to predict which program elements are likely to be faulty.
SFL is one of the techniques for Automated Software Fault Localization (AFL). The goal of AFL is to compute suspiciousness scores for program statements and other elements, and present those to the developers for deeper investigation. SFL is not used widely in industry because most techniques do not consistently locate faults with enough precision.
Küçük et al. note that most SFL techniques measure correlation, and are thus susceptible to confounding bias.
Let \(X\) be some independent variable, and \(Y\) some dependent variable. To estimate the effect of \(X\) on \(Y\), the statistician must suppress the effects of extraneous variables that influence both \(X\) and \(Y\). We say that \(X\) and \(Y\) are confounded by some other variable \(Z\) whenever \(Z\) causally influences both \(X\) and \(Y\).
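A small simulation makes the hazard concrete: below, \(Z\) drives both \(X\) and \(Y\), and \(X\) has no direct effect on \(Y\), yet their raw correlation is large; conditioning on \(Z\) (here, crudely, stratifying into narrow bins) makes it mostly vanish. All parameters are illustrative.

```python
import random

def pearson(pairs):
    # Plain Pearson correlation coefficient, computed by hand.
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / (vx * vy) ** 0.5

random.seed(0)
rows = []
for _ in range(10_000):
    z = random.random()              # confounder Z
    x = z + random.gauss(0, 0.1)     # X depends only on Z
    y = z + random.gauss(0, 0.1)     # Y depends only on Z, not on X
    rows.append((x, y, z))

naive = pearson([(x, y) for x, y, _ in rows])   # large but spurious

# Stratify on Z: within a narrow Z-bin, X and Y are nearly independent.
bins = {}
for x, y, z in rows:
    bins.setdefault(int(z * 10), []).append((x, y))
stratified = sum(pearson(b) for b in bins.values()) / len(bins)
```

An SFL technique that only looks at the analogue of `naive` will flag statements that merely co-occur with failure, not ones that cause it.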
propose UniVal, an SFL technique that uses causal inference techniques and ML to integrate information about both predicate outcomes and variable values. UniVal’s key insight is to transform the program under analysis so that branch and loop predicate outcomes become variable values, so that one causally sound value-based SFL technique can be applied to both variable assignments and predicates.
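A toy sketch of that key insight (this is my hand-written illustration, not UniVal's actual instrumentation): the branch predicate is first assigned to an ordinary variable, so a value-based analysis sees its outcome alongside every other value.

```python
trace = []  # stand-in for an execution profile

def absolute(x):
    # Before the transformation this would read `if x < 0:`, and the
    # predicate's outcome would be invisible to value-based SFL.
    p = (x < 0)
    trace.append(("p", p))       # predicate outcome recorded as a value
    if p:
        x = -x
    trace.append(("result", x))  # ordinary variable value
    return x

absolute(-5)
```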
References
- Probabilistic Delta Debugging. Guancheng Wang; Ruobing Shen; Junjie Chen; Yingfei Xiong; Lu Zhang. Proceedings of the 29th European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Aug 2021, pp. 881-892. Peking University; Tianjin University. Cited 1 time as of Jan 26, 2022.
- Simplifying and Isolating Failure-Inducing Input. Andreas Zeller; Ralf Hildebrandt. IEEE Transactions on Software Engineering, vol. 28, no. 2, pp. 183-200, Feb. 2002. Universität des Saarlandes; DeTeLine - Deutsche Telekom Kommunikationsnetze. Cited 1100 times as of Jan 26, 2022.
- Improving Fault Localization by Integrating Value and Predicate-Based Causal Inference Techniques. Yiğit Küçük; Tim A. D. Henderson; Andy Podgurski. Proceedings of the 43rd International Conference on Software Engineering, 2021, pp. 649-660. Case Western Reserve University; Google. Cited 4 times as of Feb 16, 2022.
Terminals have varying levels of color support: plain; ANSI escape codes (16 color codes with bold/italic and background); 256-color palette; 24-bit truecolor ("888" colors, a.k.a. 16 million, e.g. printf "\x1b[38;2;255;100;0mTRUECOLOR\x1b[0m\n").
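The same escape sequence, wrapped in a small Python helper (the function name is mine):

```python
def truecolor(text, r, g, b):
    """Wrap `text` in a 24-bit ("truecolor") ANSI foreground escape,
    resetting the style afterwards with \\x1b[0m."""
    return f"\x1b[38;2;{r};{g};{b}m{text}\x1b[0m"

# Same orange as the printf example above:
print(truecolor("TRUECOLOR", 255, 100, 0))
```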