Software Engineering Journal Reviews

Dated Jan 15, 2022; last modified on Thu, 17 Feb 2022

Formal Software Design

Alloy is an open-source language and analyzer for software modeling. An Alloy model is a collection of constraints that describe a set of structures, e.g. all possible security configurations of a web application. Alloy’s tool, the Alloy Analyzer is a solver that takes the constraints of a model and finds structures that satisfy them.

The Alloy Analyzer leverages a SAT solver, and this precludes Alloy from analyzing optimization problems. propose AlloyMax, an extension of Alloy that can analyze problems with optimal solutions, soft constraints and priorities. AlloyMax adds new language constructs for specifying optimization problems, and uses an analysis engine that leverages a Maximum Satisfiability (MaxSAT) solver. also provide a translation mechanism from a first-order relational logic into a weighted conjunctive normal form (WCNF) that the MaxSAT solver expects.

The Boolean Satisfiability Problem (SAT) asks whether the variables of a given Boolean formula can be satisfied (i.e. consistently replaced by the values TRUE or FALSE in such a way that the formula evaluates to TRUE). SAT was the first problem that was proven to be NP-complete. Nonetheless, there are heuristic SAT-algorithms that solve problems involving tens of thousands of variables and formulas having millions of symbols. This is typically good enough for practical SAT problems.

MaxSAT is an extension of the SAT problem that asks for the maximum number of clauses which can be satisfied by any assignment. It also has efficient approximation algorithms, but is NP-hard to solve exactly. Worse still, it is APX-complete, i.e. no polynomial-time approximation scheme unless \(P = NP\).

To understand what is first-order relational logic, and how it differs from other kinds of logic, A relational logic primer — Formal Software Design with Alloy 6 seems like a good start.

Should have paid more attention in ELE 206: Introduction to Logic Design .

Dynamic Program Analysis

Dynamic program analysis is a technique for monitoring, understanding, and potentially intervening in program behavior during its execution . Types of dynamic program analysis include code coverage, memory error detection, fault localization, invariant inference, concurrency errors, program slicing, and performance analysis. Most techniques involve code instrumentation, which incurs runtime overhead.

propose module recontextualization, a dynamic program analysis approach for dynamic languages . Dynamic languages feature a module-import mechanism that loads the code at runtime as a string. The approach uses lightweight transformations on the module’s string representation to insert developer-provided, analysis-specific code into the module before it is loaded. This allows the developer to capture all interactions around the module in unmodified production language runtime environments. Analysis via module recontextualization is coarse (module-level vs. instruction-level), but has low runtime overhead (\(\approx 4\%\)). The authors further provide an implementation for JavaScript and Racket ecosystems .

Software Debugging

In delta debugging, we want to reduce an object while preserving a certain property. More formally, let \(\mathbb{X}\) be a universe of all objects of interest, \(\phi: \mathbb{X} \to \{ \text{TRUE}, \text{FALSE} \} \) be a test function that determines whether an object \(X \in \mathbb{X}\) exhibits a given property or not, and \(|X|\) be the size of \(X\). The goal is to find an object \(X'\) such that \(|X'|\) is as small as possible, and \(\phi(X') = \text{TRUE}\).

propose a probabilistic delta debugging algorithm (ProbDD) to improve on ’s widely-used ddmin algorithm. The ddmin algorithm follows a predefined sequence of steps to remove items from a sequence, while ProbDD learns a probabilistic model to select which items to remove in the next iteration. The worst-case asymptotic number of tests while using ProDD is \(O(n)\), while that of ddmin is \(O(n^2)\). ProbDD also tends to produce smaller final artifacts than those produced by ddmin.

In the space of probabilistic algorithms, ProbDD is probabilistic in running time, but not in the answer produced. provide a proof of correctness.

Delta debugging reminds me of git-bisect , which uses binary search to find the commit that introduced a bug. do not mention git-bisect. Assuming that at each each commit, the code is in a state that provides a useful answer1, git-bisect takes at most \(O(lg N)\) steps. ProbDD looks useful where git history is absent or not useful (e.g. finding the fault within a commit that changed a lot of things).

1 For example, if we’re looking for a feature bug, then the code must be buildable at each commit. Otherwise, we won’t know whether that commit broke the feature.

Read . There are mentions of binary search, so maybe that’s why did not concern themselves with it.

Statistical Fault Localization (SFL) techniques use execution profiles and success/failure information from execution data to predict which program elements are likely to be faulty.

SFL is one of the techniques for Automated Software Fault Localization (AFL). The goal of AFL is to compute suspiciousness scores for program statements and other elements, and present those to the developers for deeper investigation. SFL is not used widely in industry because most techniques do not consistently locate faults with enough precision.

note that most SFL techniques measure correlation, and are thus susceptible to confounding bias.

Let \(X\) be some independent variable, and \(Y\) some dependent variable. To estimate the effect of \(X\) on \(Y\), the statistician must suppress the effects of extraneous variables that influence both \(X\) and \(Y\). We say that \(X\) and \(Y\) are confounded by some other variable \(Z\) whenever \(Z\) causally influences both \(X\) and \(Y\).

propose UniVal, an SFL technique that uses causal inference techniques and ML to integrate information about both predicate outcomes and variable values. UniVal’s key insight is to transform the program under analysis so that branch and loop predicate outcomes become variable values, so that one causally sound value-based SFL technique can be applied to both variable assignments and predicates.

Probabilistic Programming

propose loop perforation, a general technique to trade accuracy for performance by transforming loops to execute a subset of their iterations. They were able to make programs twice as fast, while changing the result by less than \(10\%\).

for (i = 0; i < N; i++) { ... } // Original loop

d = 1 / ( 1 - r); // r = perforation rate, in the range [0, 1)
for (i = 0; i < N; i += d) { ... } // Loop after an interleaving perforation

proceed in two phases. The criticality testing phase finds the set of tunable loops \(P\) in an application \(A\), given a set of representative inputs \(T\), a set of perforation rates \(R\), and an accuracy bound \(b\). The second phase finds a subset \(S \subseteq P\) in \(A\), given \(T\) and \(b\).

For a loop \(l_i\) to be in \(P\), it must speed up the program, incur an accuracy loss that is less than \(b\), and not cause memory errors in \(A\). The memory errors are found running \(A\) under Valgrind.

Applications that are amenable to loop perforation tend to have these computation patterns:

  • Iterating over a search space of items. Perforation skips some of the items.
  • Using a search metric to drive a search for the most desirable item. Perforation produces a less accurate but efficient metric.
  • Performing Monte-Carlo simulation. Perforation evaluates fewer samples.
  • Repeatedly improving on an approximate result to obtain a more accurate result. Perforation uses fewer improvement steps.
  • Traversing a data structure and updating elements with computed values. Perforation skips some of the elements.

Version Control Systems (VCS)

VCS track line-level changes, which lose historical context after transformations like moves/renames across the file system, and groups of lines being moved within files. Furthermore, developers are usually interested in a slice of the history, and not the history for the whole file. propose CodeShovel, a tool for building method histories on demand that outperforms tools from research (e.g. FinerGit which needs pre-processing), and practice (e.g. IntelliJ’s git history, git log -L). Field study with 16 industrial devs confirmed CodeShovel’s correctness, low runtime overhead, and versatility.

have a public website that demonstrates CodeShovel, given the URL of a git repo. Tried it out to see if it would catch this commit from a past project of mine: Refactor: Move non-flask related actions from to · dchege711/cos333_tiger_leagues@0410b56 . I’m impressed!

CodeShovel&rsquo;s analysis of the get_user function currently in JSON version. Credits:

CodeShovel’s analysis of the get_user function currently in JSON version. Credits:

I didn’t know about , which has two forms: -L<start>,<end>:<file> and -L:<funcname>:<file>, where start, end and funcname can be regex. If end is a regex, then the search starts at the line given by start.

For example, git log -L '/int main/',/^}/:main.c shows how the function main() in the file main.c evolved over time. The challenge for the dev is constructing appropriate markers for start and end; it helps if the file is well-formatted (e.g. indentation).

’s CodeShovel starts out with a repo, starting SHA, and current location of the method. It uses a language-specific parser to generate an Abstract Syntax Tree (AST) of all methods in the files analyzed. For a given commit, CodeShovel stores the current file path (\(\textit{path}\)), line number (\(\textit{num}\)), method signature (\(\textit{sig}\)), and method body (\(\textit{body}\)) for the specified method. It then works backwards to the preceding commit that modified the \(\textit{path}\) containing the method. The method matching algorithm is as follows:

// Inputs
// sig: method signature
// body: method body text
// path: path to file containing method
// files: list of all files changed in the commit

// Phase 1: Find unchanged method within the same file
FOREACH meth in files[path]
  IF sim(meth["sig"], sig) == 1.0 && sim(meth["body"], body) == 1.0
    return NO_CHANGE

// Phase 2: Find modified method within the same file
FOREACH meth in files[path]
  IF sim(meth["body"], body) >= 0.75
    return meth // Method found in file

// Phase 3: Find method within renamed or moved file
FOREACH file in files
  FOREACH meth in file
    IF sim(meth["sig"], sig) == 1.0 && sim(meth["body"], body) >= 0.5
      return meth // Method found in moved file

// Phase 4: Find method modified from different file
methods = all methods in all files
// Sort methods by decreasing body similarity
methods = sort(methods, sim(entry["body"], body))
// Find highest matching method
FOREACH meth in methods
  IF isShort(meth) && sim(meth["body"], body) >= 0.95 // < 20 chars = short
    return meth
  ELSE IF sim(meth["body"], body) >= 0.82
    return meth

// No match, last commit introduced the method
return null

… where the thresholds were data-informed.

The hierarchy of change kinds in CodeShovel. Credits: Grund2021.

The hierarchy of change kinds in CodeShovel. Credits: Grund2021.

To add support for a new language, CodeShovel defines two interfaces, Parser and Method. The rest of the code is language agnostic. codeshovel/src/main/java/com/felixgrund/codeshovel/parser/impl at master · ataraxie/codeshovel currently has implementations for Java, Python and TypeScript.

Workplace Dynamics

investigate how the mass shift to working from home (WFH) during the COVID-19 pandemic affected dev teams (n = 608). 30% thought the ability of the team to reach milestones changed: reduced productivity (general, child care), communication & collaboration challenges. 499 said their team culture changed: emphasized social interaction, more meetings, missing interactions, and increased empathy. On how teams have supported members during WFH: peer support, virtual social meetings, instant message apps, understanding personal WFH situations, and personal check-ins. On team communication and collaboration, there was a marked decrease in feeling socially connected (65%!), ability to brainstorm, communication ease, and knowledge flow. On the other hand, there was a marked increase in frequency of meetings, and team member notifications.

surveyed only Microsoft devs. While they did survey devs from different orgs, they’re all from Microsoft. It’d be interesting to see how the survey responses change (or not) in other companies, especially smaller ones.


Dynamic Programming Language
A high-level programming language, which at runtime executes many common programming behaviors that static programming languages perform during compilation, e.g. adding new code, extending objects and definitions, modifying the type system, etc. Examples: JavaScript, Python, Ruby, PHP, Perl, Common Lisp, R, Objective-C, etc..


  1. AlloyMax: bringing maximum satisfaction to relational specifications. Zhang, Changjian; Ryan Wagner; Pedro Orvalho; David Garlan; Vasco Manquinho; Ruben Martins; Eunsuk Kang. European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Proceedings of the 29th ACM Joint Meeting, Aug 2021, pp. 155-167. Carnegie Mellon University; Universidade de Lisboa. . . Cited 0 times as of Jan 24, 2022.
  2. Boolean satisfiability problem - Wikipedia. . Accessed Jan 22, 2022.
  3. About Alloy. . Accessed Jan 22, 2022.
  4. Efficient Module-level Dynamic Analysis for Dynamic Languages with Module Recontextualization. Nikos Vasilakis; Grigoris Ntousakis; Veit Heller; Martin C. Rinard. European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Proceedings of the 29th ACM Joint Meeting, Aug 2021, pp. 1202–1213. Massachusetts Institute of Technology; Technical University of Crete; Unaffiliated. . . Cited 2 times as of Jan 24, 2022.
  5. Dynamic program analysis - Wikipedia. . Accessed Jan 24, 2022.
  6. Dynamic programming language - Wikipedia. . Accessed Jan 24, 2022.
  7. Probabilistic Delta debugging. Guancheng Wang; Ruobing Shen; Junjie Chen; Yingfei Xiong; Lu Zhang. European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Proceedings of the 29th, Aug 2021, pp. 881-892. Peking University; Tianjin University. . . . Cited 1 times as of Jan 26, 2022.
  8. Simplifying and Isolating Failure-Inducing Input. Andreas Zeller; Ralf Hildebrandt. IEEE Transactions on Software Engineering, vol. 28, no. 2, pp. 183-200, Feb. 2002. Univeristät des Saarlandes; DeTeLine - Deutsche Telekom Kommunikationsnetze. . . Cited 1100 times as of Jan 26, 2022.
  9. Managing Performance vs. Accuracy Trade-offs with Loop Perforation. Stelios Sidiroglou-Douskos; Sasa Misailovic; Henry Hoffmann; Martin Rinard. ACM SIGSOFT Symposium, 19th; European Conference on Foundations of Software Engineering, 13th, September 2011, Pages 124–134. Massachusetts Institute of Technology. . . Cited 585 times as of Jan 30, 2022.
  10. CodeShovel: Constructing Method-Level Source Code Histories. Grund, Felix; Shaiful Chowdhury; Nick C. Bradley; Braxton Hall; Reid Holmes. International Conference on Software Engineering, 43rd, 2021, pp. 1510-1522. University of British Columbia. . . . Cited 4 times as of Feb 6, 2022.
  11. Git - git-log Documentation. . Accessed Feb 6, 2022.
  12. 'How Was Your Weekend?' Software Development Teams Working From Home During COVID-19. Miller, Courtney; Paige Rodeghero; Margaret-Anne Storey; Denae Ford; Thomas Zimmermann. International Conference on Software Engineering, 43rd, 2021, pp. 624-636. New College of Florida; Clemson University; University of Victoria; Microsoft Research. Cited 23 times as of Feb 16, 2022.
  13. Improving Fault Localization by Integrating Value and Predicate-Based Causal Inference Techniques. Küçük, Yiğit; Tim AD Henderson; Andy Podgurski. International Conference on Software Engineering, 43rd, 2021, pp. 649-660. Case Western Reserve University; Google. Cited 4 times as of Feb 16, 2022.
  14. Confounding - Wikipedia. . Accessed Feb 16, 2022.