Clang, LLVM, GCC, and MSVC
LLVM is an umbrella project with several sub-projects, e.g., LLVM Core and Clang. The LLVM Core libraries provide an optimizer and code generator for different CPUs. Clang is an “LLVM native” C/C++/Objective-C compiler that aims to deliver fast compilation, useful error and warning messages, and a platform for building source-level tools. The Clang Static Analyzer and clang-tidy are examples of such tools.
Microsoft Visual C++ (MSVC) is Microsoft’s proprietary compiler for C, C++ and C++/CX. It is bundled with Visual Studio.
The GNU Compiler Collection (GCC) includes front ends for C, C++, Objective-C, Fortran, Ada, Go, and D.
LLVM, MSVC and GCC also have implementations (libc++, MSVC STL, and libstdc++, respectively) of the C++ standard library.
I ended up with Clang, probably because I work on a Chromium-based browser, which already uses Clang. I expect that a lot of research will be on the open-source Clang and GCC compilers, as opposed to proprietary ones such as MSVC.
When Chrome/Chromium moved to Clang, there was a sentiment that Google was more invested in LLVM/Clang than in GNU/GCC. There are politics when it comes to C++ toolchains.
Improving Code Using Clang Tools
Anticipated capabilities from clang tools:
- Remove branches that are never executed in practice (reduces complexity).
- Increase const correctness to allow clients to pass around const references/pointers.
- Increase cohesiveness within a module, and reduce coupling with other modules.
- Flag/Fix violations of rules of thumb from static analyzers.
- Remove unused includes from source files.
The Chromium project has examples of “real-world” improvements via Clang tools, e.g.:
- Adding std::move after running some heuristics, e.g., local variable or param, no qualifiers, not a reference nor pointer, not a constructor, is not captured by a lambda, etc.
- Updating conventions, e.g., int mySuperVariable to int my_super_variable and const int maxThings to const int kMaxThings.
- Updating API usage, e.g., ::base::ListValue::GetSize to GetList().size, std::string("") to std::string().
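To make two of these rewrites concrete, here is a minimal before/after sketch. The Registry class and the function names are hypothetical and are not taken from Chromium; they only illustrate the shape of code that the std::move heuristics and the std::string("") cleanup would target.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class for illustration only (not Chromium code).
class Registry {
 public:
  // Before: `name` is a by-value parameter that is never used again,
  // yet push_back copies it.
  void AddBefore(std::string name) { names_.push_back(name); }

  // After: the same call with std::move, which lets the vector take
  // ownership of the string's buffer instead of copying it.
  void AddAfter(std::string name) { names_.push_back(std::move(name)); }

  const std::vector<std::string>& names() const { return names_; }

 private:
  std::vector<std::string> names_;
};

// Before: constructs an empty string from a string literal.
std::string EmptyBefore() { return std::string(""); }

// After: default construction yields the same empty string.
std::string EmptyAfter() { return std::string(); }
```

Both versions of each pair produce the same observable result; the rewrite only changes how the value gets there, which is why it is safe to automate once the heuristics have ruled out the risky cases.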
The vibe that I’m getting is that one can only go so far with find + replace. Some changes require treating the source files as C++ source code instead of simply text. For such changes, trying to craft a regex (or multiple passes) will become too tedious, buggy, or even outright infeasible.
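A small experiment shows where text-based renaming breaks down. The sketch below applies the maxThings to kMaxThings rename with std::regex_replace to a made-up three-line snippet (kSource and both function names are assumptions for illustration). The naive pattern also rewrites a longer identifier, and even a word-boundary pattern still rewrites the string literal, because a regex cannot tell an identifier from the contents of a string.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Made-up source text to rename; not from any real codebase.
const std::string kSource =
    "const int maxThings = 3;\n"
    "int maxThingsSeen = 0;\n"
    "Log(\"maxThings exceeded\");\n";

// Plain substring replacement: also turns maxThingsSeen into
// kMaxThingsSeen and edits the string literal.
std::string NaiveRename(const std::string& src) {
  return std::regex_replace(src, std::regex("maxThings"), "kMaxThings");
}

// Word boundaries protect maxThingsSeen, but the string literal is
// still rewritten, since the regex sees text rather than C++ tokens.
std::string WordBoundaryRename(const std::string& src) {
  return std::regex_replace(src, std::regex(R"(\bmaxThings\b)"),
                            "kMaxThings");
}
```

A tool that works on the parsed program can restrict the rename to references to one specific declaration, which is the kind of distinction the regex approach keeps missing.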
Clang Static Analyzer
Uses a collection of algorithms and techniques to analyze source code in order to find bugs that are traditionally found using run-time debugging techniques such as testing. Slower than compilation. May have false positives.
False Positives
False positives may occur due to analysis imprecision, e.g., false paths or insufficient knowledge about the program. A sample false-path analysis:
#include <stdio.h>

int f(int y) {
  int x;
  if (y) x = 1;
  printf("%d\n", y);
  if (y) return x;
  return y;
}
$ clang -warn-uninit-values /tmp/test.c
t.c:13:12: warning: use of uninitialized variable
return x;
^
There are two feasible paths: neither branch taken (y == 0), and both branches taken (y != 0), but the analyzer issues a bogus warning on an infeasible path (not taking the first branch, but taking the second).
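When a warning like this turns out to be a false positive, one common response is to restructure the code so that the infeasible path no longer exists. The two variants below are a sketch of that idea, with hypothetical function names. Both return the same values as f; whether a particular analyzer version stops warning on them is an assumption, not something verified here.

```cpp
#include <cassert>
#include <cstdio>

// Variant 1: give x a defined value on every path.
int f_initialized(int y) {
  int x = 0;
  if (y) x = 1;
  std::printf("%d\n", y);
  if (y) return x;
  return y;
}

// Variant 2: declare x only inside the branch that uses it, so the
// two correlated `if (y)` checks collapse into one.
int f_single_branch(int y) {
  std::printf("%d\n", y);
  if (y) {
    int x = 1;
    return x;
  }
  return y;
}
```

The second variant is usually the nicer fix, since it removes the correlation between branches that the analysis could not track in the first place.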
Static Analyzer Algorithms
More precise analysis can reduce false positives.
Flow-Sensitive Analyses reason about the flow of values without considering path-specific information:
if (x == 0) ++x; // x == ?
else x = 2; // x == 2
y = x; // x == ?, y == ?
… but they are linear-time algorithms.
Path-Sensitive Analyses reason about individual paths and guards on branches:
if (x == 0) ++x; // x == 1
else x = 2; // x == 2
y = x; // (x == 1, y == 1) or (x == 2, y == 2)
… and can therefore avoid false positives based on infeasible paths. However, they have worst-case exponential running time, although there are tricks to reduce complexity in practice.
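The difference between the two analyses can be modeled in a few lines. This is a toy model of the x/y snippets above, not how the Clang Static Analyzer is implemented: AbsInt, Join, and both functions are hypothetical names, and the abstract domain is just "a known constant or unknown".

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Abstract value: a known constant, or std::nullopt for "unknown"
// (the "?" in the comments above).
using AbsInt = std::optional<int>;

// Join at a control-flow merge: keep a value only if both sides agree.
AbsInt Join(AbsInt a, AbsInt b) {
  if (a && b && *a == *b) return a;
  return std::nullopt;
}

// Flow-sensitive model: one abstract value per program point, and the
// guard (x == 0) is ignored. Returns what is known about x at `y = x;`.
AbsInt FlowSensitiveX() {
  AbsInt x = std::nullopt;       // x is unknown on entry
  AbsInt then_x = std::nullopt;  // ++x on an unknown value stays unknown
  if (x) then_x = *x + 1;
  AbsInt else_x = 2;             // x = 2
  return Join(then_x, else_x);   // the branches disagree, so: unknown
}

// Path-sensitive model: one (x, y) state per path, using the guard.
std::vector<std::pair<int, int>> PathSensitiveStates() {
  std::vector<std::pair<int, int>> states;
  int x1 = 0;  // path 1: the guard x == 0 held
  ++x1;
  states.emplace_back(x1, x1);  // y = x
  int x2 = 2;  // path 2: the guard failed, then x = 2
  states.emplace_back(x2, x2);  // y = x
  return states;
}
```

The flow-sensitive model does a fixed amount of work per program point, while the path-sensitive one keeps a separate state per path, and the number of paths can double with every branch. That is where the exponential worst case comes from.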
At this point, the takeaway can be, “Figure out how to run Clang’s static analyzer on your codebase, read the report, and then fix the legitimate issues.” Further reading might help illuminate the root cause of a false positive, but that can be deferred until you encounter the false positive.
So if I were to create a programming language, I could define a transformation into LLVM intermediate representation (LLVM IR), and that would let LLVM Core optimize it? Sweet!
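As a rough sketch of what such a transformation could look like at its smallest, the function below plays the role of a toy front end: it emits textual LLVM IR for a two-argument 32-bit integer function that applies a single binary instruction. EmitBinaryFunction is a hypothetical helper, and a real front end would more likely build IR through LLVM's C++ API than by concatenating strings.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy front end: returns textual LLVM IR for a function
// `name` that computes `a <op> b` on i32 values. `op` is assumed to be
// an LLVM integer instruction name such as "add", "sub", or "mul".
std::string EmitBinaryFunction(const std::string& name,
                               const std::string& op) {
  return "define i32 @" + name + "(i32 %a, i32 %b) {\n"
         "entry:\n"
         "  %result = " + op + " i32 %a, %b\n"
         "  ret i32 %result\n"
         "}\n";
}
```

Writing the output of EmitBinaryFunction("add2", "add") to a .ll file and passing that file to clang should hand the function to the LLVM optimizer and code generator, though the exact invocation depends on the installed toolchain.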