C++ Meta-Programming

Dated Nov 28, 2022; last modified on Mon, 28 Nov 2022

Clang, LLVM, GCC, and MSVC

LLVM is an umbrella project with several sub-projects, e.g., LLVM Core and Clang. The LLVM Core libraries provide an optimizer and a code generator for different CPUs. Clang is an “LLVM native” C/C++/Objective-C compiler that aims to deliver fast compilation, useful error and warning messages, and a platform for building source-level tools. The Clang Static Analyzer and clang-tidy are examples of such tools.

So if I were to create a programming language, I could define a transformation into LLVM intermediate representation (LLVM IR) and then rely on LLVM Core to optimize it? Sweet!
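To make that idea concrete, here is a minimal sketch of such a transformation for a hypothetical toy language made of integer literals, + and *. The names Expr, Num, Bin, Emit and CompileToIr are invented for illustration. The sketch writes textual LLVM IR by hand rather than going through LLVM's C++ API, on the assumption that the resulting text would then be handed to an LLVM tool (such as opt or clang) for optimization and code generation.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical AST for a toy language: integer literals combined with + and *.
struct Expr {
  char op = 'n';  // 'n' = literal, '+' = addition, '*' = multiplication
  int value = 0;  // only meaningful when op == 'n'
  std::unique_ptr<Expr> lhs;
  std::unique_ptr<Expr> rhs;
};

std::unique_ptr<Expr> Num(int value) {
  auto e = std::make_unique<Expr>();
  e->value = value;
  return e;
}

std::unique_ptr<Expr> Bin(char op, std::unique_ptr<Expr> lhs,
                          std::unique_ptr<Expr> rhs) {
  assert((op == '+' || op == '*') && lhs && rhs);
  auto e = std::make_unique<Expr>();
  e->op = op;
  e->lhs = std::move(lhs);
  e->rhs = std::move(rhs);
  return e;
}

// Appends one IR instruction per binary node to `body` and returns the
// operand (a literal or a named temporary such as %t0) that holds the result.
std::string Emit(const Expr& e, std::ostringstream& body, int& next_id) {
  if (e.op == 'n') return std::to_string(e.value);
  const std::string lhs = Emit(*e.lhs, body, next_id);
  const std::string rhs = Emit(*e.rhs, body, next_id);
  const std::string name = "%t" + std::to_string(next_id++);
  body << "  " << name << " = " << (e.op == '+' ? "add" : "mul") << " i32 "
       << lhs << ", " << rhs << "\n";
  return name;
}

// Wraps the expression in an IR function that returns its value.
std::string CompileToIr(const Expr& e) {
  std::ostringstream body;
  int next_id = 0;
  const std::string result = Emit(e, body, next_id);
  return "define i32 @main() {\nentry:\n" + body.str() + "  ret i32 " +
         result + "\n}\n";
}
```

For (2 + 3) * 4, built as Bin('*', Bin('+', Num(2), Num(3)), Num(4)), the emitted function adds 2 and 3 into %t0, multiplies %t0 by 4 into %t1, and returns %t1. Folding that down to a constant is then the optimizer's job, not the front end's.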

Microsoft Visual C++ (MSVC) is Microsoft’s proprietary compiler for C, C++ and C++/CX. It is bundled with Visual Studio.

GNU’s Compiler Collection (GCC) includes front ends for C, C++, Objective-C, Fortran, Ada, Go, and D.

LLVM, MSVC and GCC also have implementations (libc++, MSVC STL, and libstdc++, respectively) of the C++ standard library.
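Because the compiler and the standard library are separate choices (Clang on Linux, for instance, typically builds against libstdc++ by default), it can be handy to check which pair a build actually uses. A small sketch using the vendors' predefined macros follows; the function names are hypothetical. Clang is tested first because it also defines __GNUC__, and the library macros only become visible after some standard header has been included.

```cpp
#include <string>  // any standard header makes the library macros visible

// Reports the compiler front end. __clang__ is checked first because Clang
// also defines __GNUC__ (and clang-cl defines _MSC_VER).
constexpr const char* CompilerName() {
#if defined(__clang__)
  return "Clang";
#elif defined(__GNUC__)
  return "GCC";
#elif defined(_MSC_VER)
  return "MSVC";
#else
  return "unknown";
#endif
}

// Reports the C++ standard library implementation in use.
constexpr const char* StandardLibraryName() {
#if defined(_LIBCPP_VERSION)
  return "libc++";
#elif defined(__GLIBCXX__)
  return "libstdc++";
#elif defined(_MSVC_STL_VERSION)
  return "MSVC STL";
#else
  return "unknown";
#endif
}
```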

I found myself using Clang, probably because I work on a Chromium-based browser, which already uses Clang. I expect that a lot of research will target the open-source Clang and GCC compilers rather than proprietary ones such as MSVC.

When Chrome/Chromium moved to Clang, the discussion around the move bore the sentiment that Google was more invested in LLVM/Clang than in GNU/GCC. There are politics when it comes to C++ toolchains.

Improving Code Using Clang Tools

#code-hygiene

Capabilities I anticipate from Clang tools:

  • Remove branches that are never executed in practice (reduces complexity).
  • Increase const correctness to allow clients to pass around const references/pointers.
  • Increase cohesiveness within a module, and reduce coupling with other modules.
  • Flag/Fix violations of rules of thumb from static analyzers.
  • Remove unused includes from source files.

The Chromium project has examples of “real-world” improvements via Clang tools, e.g.:

  • Adding std::move where heuristics suggest it is safe, e.g., the value is a local variable or parameter, has no qualifiers, is neither a reference nor a pointer, is not a constructor, and is not captured by a lambda.
  • Updating conventions, e.g., int mySuperVariable to int my_super_variable and const int maxThings to const int kMaxThings.
  • Updating API usage, e.g., ::base::ListValue::GetSize to GetList().size, std::string("") to std::string().
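As an illustration of the first kind of change, here is a hedged before/after sketch. Playlist, SetTracksBefore and SetTracksAfter are hypothetical names, not Chromium code; the parameter fits the heuristics listed above (a by-value parameter, no qualifiers, not a reference or pointer, not captured by a lambda) and is not used again after the assignment.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical type used only for illustration.
struct Playlist {
  std::vector<std::string> tracks;
};

// Before: `tracks` is already a private copy owned by this function, yet the
// assignment copies every string a second time.
void SetTracksBefore(Playlist& playlist, std::vector<std::string> tracks) {
  playlist.tracks = tracks;
}

// After: std::move on the last use lets the assignment take over the
// parameter's storage instead of copying it.
void SetTracksAfter(Playlist& playlist, std::vector<std::string> tracks) {
  playlist.tracks = std::move(tracks);
}
```

Both versions leave the playlist with the same contents; only the amount of copying differs, which is why the change can be applied mechanically once the heuristics hold.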

The vibe that I’m getting is that one can only go so far with find + replace. Some changes require treating the source files as C++ source code instead of simply text. For such changes, trying to craft a regex (or multiple passes) will become too tedious, buggy, or even outright infeasible.
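To see where the text-only approach runs out, consider the naming-convention change mentioned above. The string transformation itself is the easy part, as this sketch suggests (the function names are hypothetical, and only simple lowerCamelCase input is handled). What it cannot decide is whether a given token is a local variable, a constant, a member, or just text inside a string literal or comment; that is the information a tool working on parsed C++ has and a regex lacks.

```cpp
#include <cctype>
#include <string>

// Converts simple lowerCamelCase to snake_case (variable naming style).
std::string ToSnakeCase(const std::string& name) {
  std::string out;
  for (char c : name) {
    const unsigned char uc = static_cast<unsigned char>(c);
    if (std::isupper(uc)) {
      if (!out.empty()) out += '_';  // word boundary
      out += static_cast<char>(std::tolower(uc));
    } else {
      out += c;
    }
  }
  return out;
}

// Converts simple lowerCamelCase to kConstantStyle (constant naming style).
std::string ToConstantName(const std::string& name) {
  if (name.empty()) return name;
  std::string out = "k";
  out += static_cast<char>(std::toupper(static_cast<unsigned char>(name[0])));
  out += name.substr(1);
  return out;
}
```

Here ToSnakeCase("mySuperVariable") gives "my_super_variable" and ToConstantName("maxThings") gives "kMaxThings". Choosing which of the two to apply to a given declaration requires knowing its type and qualifiers.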

Clang Static Analyzer

The Clang Static Analyzer uses a collection of algorithms and techniques to analyze source code and find bugs that are traditionally found with run-time debugging techniques such as testing. It is slower than compilation and may report false positives.

False Positives

False positives may occur due to analysis imprecision, e.g., false paths or insufficient knowledge about the program. A sample false-path analysis:

#include <stdio.h>

int f(int y) {
  int x;

  if (y) x = 1;

  printf("%d\n", y);

  if (y) return x;

  return y;
}

$ clang -warn-uninit-values /tmp/test.c
t.c:13:12: warning: use of uninitialized variable
  return x;
         ^

There are two feasible paths: neither branch taken (y == 0), and both branches taken (y != 0), but the analyzer issues a bogus warning on an infeasible path (not taking the first branch, but taking the second).
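The reasoning above can be sketched by enumerating the four ways through f() and checking which ones can really execute. This is a hand-written model of that one function, not how Clang implements its analysis; Path, IsFeasible, ReadsUninitializedX and CountUninitializedReads are hypothetical names.

```cpp
// One way through f(): which of the two `if (y)` branches are taken.
struct Path {
  bool takes_first_if;
  bool takes_second_if;
};

// Both conditions test the same y, which is not modified in between, so the
// two decisions must agree on any path that can really execute.
constexpr bool IsFeasible(Path p) {
  return p.takes_first_if == p.takes_second_if;
}

// x is read without having been written when the assignment is skipped but
// `return x` is still reached.
constexpr bool ReadsUninitializedX(Path p) {
  return !p.takes_first_if && p.takes_second_if;
}

// Counts the paths that read x uninitialized, optionally discarding the
// paths that cannot execute.
constexpr int CountUninitializedReads(bool prune_infeasible) {
  int count = 0;
  for (int first = 0; first < 2; ++first) {
    for (int second = 0; second < 2; ++second) {
      const Path p{first != 0, second != 0};
      if (prune_infeasible && !IsFeasible(p)) continue;
      if (ReadsUninitializedX(p)) ++count;
    }
  }
  return count;
}

// Ignoring feasibility yields the bogus report; pruning removes it.
static_assert(CountUninitializedReads(false) == 1, "one false-path report");
static_assert(CountUninitializedReads(true) == 0, "no report after pruning");
```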

The analyzer has gotten smarter since then: clang -Wuninitialized /tmp/test.c no longer issues that bogus warning.

Static Analyzer Algorithms

More precise analysis can reduce false positives.

Flow-Sensitive Analyses reason about the flow of values without considering path-specific information:

if (x == 0) ++x;  // x == ?
else x = 2;       // x == 2
y = x;            // x == ?, y == ?

… but they are linear-time algorithms.

Path-Sensitive Analyses reason about individual paths and guards on branches:

if (x == 0) ++x;  // x == 1
else x = 2;       // x == 2
y = x;            // (x == 1, y == 1) or (x == 2, y == 2)

… and can therefore avoid false positives based on infeasible paths. However, they have worst-case exponential running time, although there are tricks to reduce the cost in practice.
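To contrast the two styles on the if/else snippet used above, here is a small sketch that is hand-unrolled for that one snippet. It is an illustrative model under simplifying assumptions (a value is either a known constant or unknown), not the analyzer's actual algorithm, and all names are hypothetical.

```cpp
#include <optional>
#include <vector>

// A value is either a known constant or unknown (std::nullopt).
using AbstractInt = std::optional<int>;

struct State {
  AbstractInt x;
  AbstractInt y;
};

AbstractInt Increment(AbstractInt v) {
  if (v) return *v + 1;
  return std::nullopt;  // unknown + 1 is still unknown
}

// Merging two incoming values keeps a constant only if both sides agree.
AbstractInt Join(AbstractInt a, AbstractInt b) {
  if (a && b && *a == *b) return a;
  return std::nullopt;
}

// Flow-sensitive: one state per program point. Branch guards are ignored and
// states are merged where control flow joins.
State AnalyzeFlowSensitive() {
  State in;                                  // x and y unknown on entry
  State then_branch = in;
  then_branch.x = Increment(then_branch.x);  // ++x on an unknown x
  State else_branch = in;
  else_branch.x = 2;                         // x = 2
  State merged{Join(then_branch.x, else_branch.x),
               Join(then_branch.y, else_branch.y)};
  merged.y = merged.x;                       // y = x
  return merged;
}

// Path-sensitive: one state per path, with the guard recorded as a fact.
std::vector<State> AnalyzePathSensitive() {
  State then_path;
  then_path.x = 0;                           // guard: x == 0 holds here
  then_path.x = Increment(then_path.x);      // ++x, so x == 1
  then_path.y = then_path.x;                 // y = x
  State else_path;                           // guard: x != 0, value unknown
  else_path.x = 2;                           // x = 2
  else_path.y = else_path.x;                 // y = x
  return {then_path, else_path};
}
```

The flow-sensitive version ends with both x and y unknown, while the path-sensitive version ends with two states, (1, 1) and (2, 2). The price is one state per path, which is where the exponential worst case comes from once branches are chained.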

At this point, the takeaway can be, “Figure out how to run Clang’s static analyzer on your codebase, read the report, and then fix the legitimate issues.” Further reading might help illuminate the root cause of a false positive, but that can be deferred until you encounter the false positive.

References

  1. The LLVM Compiler Infrastructure Project. llvm.org. Accessed Nov 28, 2022.
  2. Microsoft Visual C++. en.wikipedia.org. Accessed Nov 29, 2022.
  3. GCC, the GNU Compiler Collection - GNU Project. gcc.gnu.org. Accessed Nov 29, 2022.
  4. tools/clang/ - Chromium Code Search. source.chromium.org. Accessed Nov 29, 2022.
  5. C++ Standard Library. en.wikipedia.org. Accessed Nov 29, 2022.
  6. Chrome now uses clang for production builds on Linux | Hacker News. news.ycombinator.com. Nov 17, 2014. Accessed Nov 29, 2022.
  7. Clang Static Analyzer. clang-analyzer.llvm.org. Accessed Nov 29, 2022.
  8. Clang Static Analyzer — Clang 14.0.0 documentation. releases.llvm.org. Accessed Nov 29, 2022.
  9. Finding Software Bugs with the Clang Static Analyzer. Ted Kremenek. Apple. llvm.org. 2008. Accessed Nov 30, 2022.