GCC vs. MSVC: Comparing compile time and RAM usage
We were recently faced with unreproducible crashes of our continuous integration (CI) tests on the Windows build machines (running Windows 10). Apart from being unreliable, the Windows builds were also much slower than our Mac/Linux builds, which use clang and gcc. I had never questioned this and simply blamed a supposedly slower Microsoft compiler. But is that really true?
The Windows-VM had only 4 GB of RAM and was used to compile OpenMS using 2 threads, which sometimes caused "out-of-memory" errors, hence a failed CI.
Throwing more CPU/RAM resources at the problem would certainly make it go away. On the other hand, finding the underlying cause would reap much broader benefits, such as faster compile cycles for developers on Windows.
To investigate the underlying cause, we would need to measure the RAM usage of our C++ compile processes, of which there are hundreds for the OpenMS library -- which is 800k lines of code after all.
Time & RAM of GCC on Linux/Unix
On Linux/Unix, measuring peak RAM usage and the execution time of a process is easy: use /usr/bin/time (not to be confused with the bash built-in "time"), which can measure
- CPU time (user and system), but also
- maximum resident set size (aka peak RAM) and other very useful data for a specific program.
Here are the flags I was interested in:
%C  command line and arguments
%E  elapsed real time (wall clock) in [hour:]min:sec
%e  elapsed real time (wall clock) in seconds
%M  maximum resident set size in KB
Thus, each invocation of
/usr/bin/time -o my.log -a -f "%C\t%E\t%e\t%M" -- gcc -c ...
adds a single line to
my.log which contains the information required to gauge peak RAM and wall time of a compiler invocation for a single translation unit in C++.
Since OpenMS has a build system based on CMake, we wrap the compiler call into a call to
/usr/bin/time using the CMAKE_CXX_COMPILER_LAUNCHER flag:
cmake -DCMAKE_CXX_COMPILER_LAUNCHER="/usr/bin/time;-o;openms_gcc.log;-a;-f;%C\t%E\t%e\t%M;--" -G Ninja -DCMAKE_BUILD_TYPE=Release .
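Once the build finishes, the log can be mined for the worst offenders with a few lines of Python. This is a minimal sketch: the column layout follows the %C\t%E\t%e\t%M format string above, but the sample log lines below are made up for illustration, not actual OpenMS measurements.

```python
import csv

# Each log line: command \t H:MM:SS \t seconds \t peak-RSS-in-KB,
# as produced by /usr/bin/time -f "%C\t%E\t%e\t%M".
sample = (
    "gcc -c Foo.cpp\t0:05.12\t5.12\t812000\n"
    "gcc -c Bar_test.cpp\t1:02.40\t62.40\t5400000\n"
)

rows = []
for cmd, wall, secs, kb in csv.reader(sample.splitlines(), delimiter="\t"):
    rows.append((cmd, float(secs), int(kb) / 1024))  # KB -> MB

# Print translation units sorted by peak RAM, worst first
for cmd, secs, mb in sorted(rows, key=lambda r: r[2], reverse=True):
    print(f"{mb:8.0f} MB  {secs:7.1f} s  {cmd}")
```

The same script works unchanged on the WinTime log described below, since WinTime mimics the /usr/bin/time output.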
Time & RAM of MSVC on Windows10
There is nothing like
/usr/bin/time on Windows! It is easy to write a small wrapper which measures the execution time of a program (or use an existing tool, e.g. fpclock).
Obtaining maximum RAM usage is much trickier: you can stare at the Task Manager or use Process Explorer, but neither will give you the peak RAM usage of a process which executes rather fast (such as a compiler invocation).
Since I could not find anything which was up for the job, I wrote my own clone of
/usr/bin/time for Windows OS.
In honor of its POSIX ancestor, I named it
WinTime. It is freely available on Github: https://github.com/cbielow/wintime .
The invocation is analogous to what we did on Linux:
cmake -DCMAKE_CXX_COMPILER_LAUNCHER="WinTime64.exe;-o;c:\openms_msvc.log;-a;--" -G Ninja -DCMAKE_BUILD_TYPE=Release .
Comparing g++ and MSVC
With the data at hand, we can explore and compare:
Looking at wall time for all compilation units of OpenMS in Release mode (with optimizations on) shows partly comparable results between g++ and cl.exe. However, the MSVC compiler exhibits excessive RAM and time usage for a significant proportion of them (mind the log-scaled axes). This led to frequent failures during build time on our continuous integration infrastructure. Looking at the most resource-hungry candidates, it turns out they are the .cpp files of unit tests (highlighted in blue).
The subfigure for peak RAM usage on the right shows a very similar trend.
Fixing the outliers
Inspecting the .cpp file of the most extreme outlier revealed lots of test macros, such as
END_SECTION (64 LOC, 87 occurrences), which are part of our testing framework.
The working hypothesis was that MSVC reacts badly (in terms of RAM and time) to big compilation units (it might actually be something else, such as the number of try/catch blocks, but let's ignore that thought for now).
Thus I moved code from the macro (which is essentially inline) to external functions (see #6618).
New compile times
Re-running the compile process on the patched OpenMS sources saw major improvements:
The outliers are mostly gone (ConstRefVector now peaks at 625 MB RAM, a 90% reduction).
Overall, Windows RAM usage and compile time reduced significantly, whereas g++ was mostly unimpressed by the changes (fraction of original RAM and time):
Finally, let's compare the median RAM and compile time across all compilation units after our changes:

|                       | g++ | MSVC (cl.exe) |
|-----------------------|-----|---------------|
| Time [s] (normalized) | 1.6 | 4.1           |
The Microsoft compiler (19.33 from VS2022) uses significantly less RAM, but takes longer to compile (compared to g++ 10.2). The operating systems are obviously a factor here, but maybe even more so the hardware this was run on. For Linux, we used an AMD Epyc 7702P (64 Core); for Windows an AMD Ryzen 9 5950X (16 Core). Comparing the single threaded performance of both CPUs, we might even apply a correction factor of 0.6 to the Epyc CPU (2104 points vs. 3469 points for the Ryzen on PassMark single threaded performance). Thus, the compile time difference between compilers when normalizing for different CPUs gets even larger (see last row above).
Takeaways for our codebase
Note of caution: OpenMS uses very few templates, which may be a game changer. Also, we compile using C++17.
- the MSVC compiler is a lot slower (2.5x on average) than GCC (even more when given large compilation units -- up to 10x)
- MSVC uses less RAM (about 35%) than g++ (unless, again, you give it large compilation units, which will make it go through the roof)
- compile time is a good estimator of RAM usage (Pearson r = 0.9), for both MSVC and GCC
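A correlation like the one in the last point is easy to compute from the two log columns. The sketch below uses made-up placeholder pairs, not the actual OpenMS measurements, and a hand-rolled Pearson r to stay dependency-free:

```python
import math

# Hypothetical (compile time in s, peak RAM in MB) per translation unit
times = [3.1, 5.4, 7.9, 12.0, 48.0, 120.0]
rams  = [400, 520, 700, 900, 2600, 5600]

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(f"Pearson r = {pearson_r(times, rams):.2f}")
```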
There is certainly more to explore here, but we will leave that for another day.
Check out WinTime on Github to make your own measurements.