Of Code Smells and Hygiene

Dated Jul 5, 2020; last modified on Wed, 30 Nov 2022

#code-hygiene

On Abstractions

If you find yourself adding more parameters and if-statements to an existing abstraction, is the abstraction still apt? Why not remove the old abstraction and re-extract a new [more apt] one? Devs frequently succumb to the sunk cost fallacy, thinking that there must have been a reason the code was written in a certain way.

This is straw-manning, though. It’s less of a sunk cost fallacy and more of a Chesterton’s Fence situation: do not remove a fence until you know why it was put up in the first place. Unless we know why someone made a decision, we can’t safely change it or conclude that they were wrong.

Software tends to have an advantage in figuring out why the fence existed in the first place: version control. With VC, one can track when a change was made, who made it, and why, assuming a meaningful commit history. Also, if the code base is strongly typed and has tests, evaluating the effect of a change is more tractable. That said, evaluating a change is hard, and hence the need for progressive rollouts of software instead of an all-out deployment.

Quantitative Metrics

Qualitative code quality metrics are good guides in code review. For example:

  • How easy is it to achieve a task, e.g., disabling feature X?
    • Shouldn’t need to pull in a bunch of components at high levels of abstraction.
  • How tightly are the components coupled?
    • Do the tests need workarounds like mocks and access to private members?
  • How readable is the code?
    • For a given function, how much branching and state mutation occurs?
  • Are there files in which other people are not confident making changes?

However, qualitative metrics don’t scale well because they need active human input.

Quantitative metrics can be collected passively, and ring alarm bells if necessary. However, we lose accuracy because the quantitative models simplify the more complex/rich reality.

To avoid the metrics being goals in and of themselves, paying attention to warnings rather than “good” values seems prudent. For example, a low readability score is more important/informative than a high one, as the low score provides a starting point for focusing improvement efforts.

Wikipedia’s Software metric page links to a couple of metrics, and is a launchpad for more targeted investigations.

Control-flow graphs come up in metrics discussions. A control-flow graph (CFG) is a graphical representation of all paths that might be traversed through a program during its execution. CFGs are essential to many compiler optimizations and static analysis tools.

Source Lines of Code

Source Lines of Code is used to measure the size of a program by counting the number of lines. Useful comparisons tend to involve the order of magnitude of LOC, e.g., comparing a 10K LOC project to a 100K LOC project is far more useful than comparing a 20K LOC project to a 21K LOC project.

The most common definition of physical SLOC (LOC) is a count of lines in the text of the program’s source code, excluding comment lines. Logical SLOC (LLOC) tries to measure the number of executable statements and can vary by language. For example,

for (i = 0; i < 100; i++) printf("Hello"); /* How many LOC is this? */

… has 1 physical LOC, 2 LLOC (the for statement and the printf statement), and 1 comment line.

LOC, especially physical LOC, can be automatically determined, is intuitive, and is ubiquitous given its early debut. However:

  • There is no standard definition of what a line of code is, e.g., are data declarations included?
  • As a measure of productivity, it’s limited, e.g., the coding phase accounts for ~30% of the overall effort; skilled devs may achieve high functionality with lower LOC.
  • Different languages (or even GUI tools) have different verbosities. Furthermore, some projects use multiple languages, and this nuance is lost when LOC are combined.

ABC Score

The ABC score is represented by a 3-D vector <Assignments (A), Branches (B), Conditionals (C)>. Unlike the Source Lines of Code (SLOC) measure, the ABC score is independent of the coding style, and therefore a more robust measure of code size.
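
As a hedged sketch, here is a small C function with a hand-counted ABC vector. The function is made up, and the tallies are my own reading of the ABC rules: assignments; branches, which roughly correspond to function/method calls; and conditionals.

#include <stdio.h>

int clamp_and_report(int x, int lo, int hi) {
    int y = x;                    /* A += 1 (assignment) */
    if (y < lo) {                 /* C += 1 (<) */
        y = lo;                   /* A += 1 */
    } else if (y > hi) {          /* C += 2 (else, >) */
        y = hi;                   /* A += 1 */
    }
    printf("clamped to %d\n", y); /* B += 1 (function call) */
    return y;
}
/* ABC vector = <3, 1, 3>; magnitude = sqrt(3^2 + 1^2 + 3^2) ≈ 4.4 */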

LoC is quite a popular measure, partly because of how easy it is to count. For instance, I can count the LoC that I’ve introduced to a repository with some git-fu and stream processing. Doing the same for the ABC score is not as straightforward.

Function Point

A function point expresses the amount of business functionality an information system (as a product) provides to a user. Compared to LOC, function points remove the incentive to inflate line counts when used as a productivity metric, are agnostic to the programming language, and are more predictable early on since they can be derived from requirements.

Counting function points seems tedious. I’ve read The Simple Function Point method, which was meant to be more palatable, and I still don’t have a good idea of how to proceed.

Coverage

Coverage measures what percentage of code has been executed by a test suite. Function coverage checks if each function in the program has been called. Statement coverage checks if each statement in the program has been executed. Edge coverage checks if each edge in the control-flow graph has been executed. Condition coverage checks if each Boolean sub-expression evaluated to both true and false.
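
A hypothetical C sketch of how the flavors differ; the function and the single test input are made up for illustration.

#include <stdio.h>

int discount(int price, int is_member) {
    int d = 0;
    if (is_member && price > 100) {
        d = 10;
    }
    return price - d;
}

int main(void) {
    /* This one call executes every function and statement (100% function and
     * statement coverage), but the false edge of the if is never taken
     * (edge coverage < 100%), and neither is_member nor price > 100 ever
     * evaluates to false (condition coverage < 100%). */
    printf("%d\n", discount(200, 1));
    return 0;
}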

More discussion about test coverage is in Testing Your Code.

Cyclomatic Complexity (McCabe’s Complexity)

Cyclomatic complexity is a measure of the number of linearly independent paths through a program’s source code. Two paths in the CFG are linearly independent if the symmetric difference (union minus intersection) of their edge sets is non-empty. Applications include:

  • Limiting the cyclomatic complexity of a module to, say, 10, or providing a written explanation of why the limit was exceeded.
  • Measuring how well a program is structured by iteratively condensing all subgraphs that have a single entry and a single exit point. Once the graph is irreducible, the cyclomatic complexity is precisely 1 for structured programs, and greater than 1 for non-structured programs.
  • Given that branch coverage \(\le\) cyclomatic complexity \(\le\) number of paths, generating test cases for the linearly independent paths (basis path testing) guarantees complete branch coverage, without covering all possible paths in the CFG. Covering all possible paths is usually too costly.
  • When controlling for program size (typically measured in LoC), the jury is still out on whether lower cyclomatic complexity reduces the number of defects in code.
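
For a concrete feel of the count, a minimal sketch of my own (not from the cited article): for a single structured function, \(M = E - N + 2P\) over the CFG reduces to the number of decision points plus one.

int sign(int x) {
    if (x > 0) {        /* decision point 1 */
        return 1;
    } else if (x < 0) { /* decision point 2 */
        return -1;
    }
    return 0;
}
/* Two decision points => M = 2 + 1 = 3, i.e., three linearly independent
 * paths (x > 0, x < 0, x == 0). Basis path testing exercises all three,
 * which also yields full branch coverage. */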

Connascence

Two components are connascent if a change in one requires the other to be modified in order to maintain the overall correctness of the system. Static connascences include: name of an entity, type of an entity, conventions (e.g., -1 to mean “invalid”), order of values (e.g., positional parameters), and choice of algorithm (e.g., message authentication codes). Dynamic connascences include: order of execution of multiple components, timing of the execution of multiple components, several values that must change together, and multiple components that must reference the same entity.
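
A hypothetical C sketch of connascence of convention; the functions are made up, and both must keep agreeing that -1 means “not found”.

#include <string.h>

/* Returns the index of c in s, or -1 ("invalid") if absent. */
int index_of(const char *s, char c) {
    const char *p = strchr(s, c);
    return p ? (int)(p - s) : -1;
}

/* Connascent with index_of by convention: this comparison stays correct only
 * as long as index_of keeps using -1 as its "not found" sentinel. */
int has_char(const char *s, char c) {
    return index_of(s, c) != -1;
}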

Cohesion

Cohesion refers to the degree to which elements inside a module belong together. How much do the methods of a class have in common? Do methods carry out a small number of related activities? Are related methods in the same source file/folder?

Coincidental cohesion ranks as the worst: parts of a module are grouped arbitrarily, and the only relationship between the parts is that they have been grouped together (e.g., a Utilities class).

Granted, I’ve not written tons of utility files, but it’s easy to stash a function in a utilities file because there is no better place to put it. Maybe a util file is the dev equivalent of “every toggle/setting in a program is a sign of a product manager that failed to make a decision.”
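
A hypothetical utils.c illustrating coincidental cohesion; the functions are made up and share nothing except the file they live in.

/* utils.c: a grab-bag file. Each function is fine on its own, but the
 * grouping gives the module no single reason to change. */
#include <ctype.h>
#include <stddef.h>

int max_of(int a, int b) { return a > b ? a : b; }   /* math-ish */

void to_upper(char *s) {                             /* string-ish */
    for (; *s; s++) *s = (char)toupper((unsigned char)*s);
}

size_t round_up_to_page(size_t n) {                  /* memory-ish */
    return (n + 4095) & ~(size_t)4095;
}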

Coupling

Coupling is a measure of how closely connected two routines or modules are. Low coupling often correlates with high cohesion and vice-versa. Tightly coupled systems exhibit connascence, a higher cost of assembling modules, and low reusability and testability, because dependent modules must also be included.

Comment Density

Comment density, the percentage of comment lines within the code, is a measure of the code readability, self-descriptiveness, and understandability.
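
A common way to compute it (a sketch; definitions vary, e.g., some divide by source lines only, others by total lines): \(\text{comment density} = \frac{\text{comment lines}}{\text{comment lines} + \text{source lines}}\).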

I was introduced to this metric in a code review at my first industry job. At the time, I didn’t realize that too many comments can be a symptom of unnecessary complexity. That incident has stayed with me since.

Halstead Complexity Measures

Halstead’s goal was more than just complexity metrics: to identify measurable properties of software, and the relations between them.

Given a C program like:

main() {
    int a, b, c, avg;
    scanf("%d %d %d", &a, &b, &c);
    avg = (a + b + c) / 3;
    printf("avg = %d", avg);
}
  • The distinct operators are main, (), {}, int, ,, ;, scanf, &, =, +, /, and printf; \(\eta_1 = 12\).
  • The distinct operands are a, b, c, avg, "%d %d %d", 3, and "avg = %d"; \(\eta_2 = 7\).
  • The total number of operators, \(N_1\), is \(27\).
  • The total number of operands, \(N_2\), is \(15\).

With \(\eta_1, \eta_2, N_1, N_2\), several measures can be computed, e.g.:

  • Estimated program length: \(\hat{N} = \eta_1 \cdot lg\ \eta_1 + \eta_2 \cdot lg\ \eta_2 = 12\ lg\ 12 + 7\ lg\ 7 = 62.67\)
  • Difficulty to write/understand: \(D = \frac{\eta_1}{2} \times \frac{N_2}{\eta_2} = \frac{12}{2} \times \frac{15}{7} = 12.85\).
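
For completeness, a few more of the standard Halstead measures for the same program (my arithmetic, using \(\eta_1 = 12\), \(\eta_2 = 7\), \(N_1 = 27\), \(N_2 = 15\)):

  • Program vocabulary and length: \(\eta = \eta_1 + \eta_2 = 19\), \(N = N_1 + N_2 = 42\).
  • Volume: \(V = N \cdot lg\ \eta = 42\ lg\ 19 \approx 178.4\).
  • Effort: \(E = D \times V \approx 12.86 \times 178.4 \approx 2294\).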

Given a project, comparing \(D\) scores while controlling for \(\hat{N}\) seems like a good way of knowing where to focus improvement efforts. \(D = 12.85\) doesn’t mean much by itself.

Weighted Micro Function Points

Weighted Micro Function Points (WMFP) is an algorithm that parses source code, breaks it down into micro functions, and derives complexity & volume metrics which are interpolated into a final effort score.

Code Quality as an Industry

Promises include:

  • Providing standard, objective, reproducible, and comparable time and cost estimates.
  • Differential comparisons between different code versions, e.g. linear WMFP algorithm.
  • Supporting multiple languages, or diving deep into a specific one.
  • Implementations of various effort and cost estimates, e.g. WMFP.
  • Computation of code metrics, e.g. LLOC.
  • Early warning signs, e.g. insufficient comments, complexity, etc.
  • On-device computation without sending code to a server.
  • Speed of analyzing the code base, e.g. integration into IDE.
  • Setting a baseline upon which to propose improvements, instead of proposing thousands of fixes.
  • Integration with CI/CD to gate incoming code per a quality criterion.
  • Code visualization, e.g., dependency graphs, trend charts, dependency matrices, etc.

Some IDEs have more bells and whistles. For example, Visual Studio has C++ code analysis support, and, for .NET code, computes metrics such as maintainability index, cyclomatic complexity, depth of inheritance, class coupling, SLOC, etc.

The C++ code analysis support seems enticing enough to prompt a jump from VS Code to Visual Studio. In VS Code, I have Clang-Tidy diagnostics, but Visual Studio promises Clang-Tidy, C++ Core Guidelines checks, SAL annotations, and more.

References

  1. The Wrong Abstraction. Sandi Metz. www.sandimetz.com . news.ycombinator.com . Jan 20, 2016.
  2. Chesterton’s Fence: A Lesson in Second Order Thinking. fs.blog . Accessed Nov 19, 2022.
  3. Control-flow graph. en.wikipedia.org . Accessed Nov 19, 2022.
  4. ABC Software Metric. en.wikipedia.org . Accessed Nov 19, 2022.
  5. Code coverage. en.wikipedia.org . Accessed Nov 19, 2022.
  6. Cohesion (computer science). en.wikipedia.org . Accessed Nov 19, 2022.
  7. DI - Metric Thresholds. archive.ph . www.lsec.dnd.ca . Accessed Nov 19, 2022.
  8. Connascence. en.wikipedia.org . Accessed Nov 19, 2022.
  9. Coupling (computer programming). en.wikipedia.org . Accessed Nov 20, 2022.
  10. Cyclomatic complexity. en.wikipedia.org . Accessed Nov 20, 2022.
  11. Source lines of code. en.wikipedia.org . Accessed Nov 20, 2022.
  12. Function point. en.wikipedia.org . Accessed Nov 20, 2022.
  13. Halstead complexity measures. en.wikipedia.org . Accessed Nov 20, 2022.
  14. Weighted Micro Function Points - Wikipedia. en.wikipedia.org . Accessed Nov 20, 2022.
  15. Documentation - ProjectCodeMeter Software Sizing for Outsourcing Work Hours Assessment and Development Cost Estimation. www.projectcodemeter.com . Accessed Nov 20, 2022.
  16. Improve your .NET code quality with NDepend. www.ndepend.com . Accessed Nov 20, 2022.
  17. Code quality metrics: How to evaluate and improve your code. Jacob Schmitt. circleci.com . Accessed Nov 20, 2022.
  18. SonarQube Documentation | SonarQube Docs. docs.sonarqube.org . Accessed Nov 20, 2022.
  19. C/C++ code analyzers | Microsoft Learn. learn.microsoft.com . learn.microsoft.com . Accessed Nov 20, 2022.