Saturday, November 9, 2019

Clang precompiled headers and improving C++ compile times, take #2

It's been almost half a year since I mentioned how precompiled headers do (not) improve C++ compile times. Quite a long time, filled with doing other things, life, occassionally working on getting my patch production-ready and, last but definitely not least, abandoning that patch and starting from scratch again.
It turns out, the problems I mentioned last time had already been more or less solved in Clang. But only for C++ modules, not for precompiled headers. *sigh* I had really mixed feelings when I finally realized that. First of all, not knowing Clang internals that well, it took me quite a long time to get to this point figuring it all out, probably longer than it could have. Second, I've been using C++ modules when building Clang itself and while it's usable, I don't consider it ready (for example, sometimes it actually makes the build slower), not to mention that it's non-trivial to setup, not standardized yet and other compilers (AFAIK) do not yet support C++ modules. And finally, WTH has nobody else yet noticed and done the work for precompiled headers too? After all the trouble with finding out how the relevant Clang parts work, the necessary patches mostly border on being trivial. Which, on the other hand, is at least the good news.
And so I'm switching for LibreOffice building to my patched build of Clang. For the motivation, maybe let's start with an updated picture from the last time:
This is again column2.cxx, a larger C++ file from Calc. The first row is again compilation without any PCH involved. The second row is unpatched Clang with --enable-pch=full, showing again that way too large PCHs do not really pay off (here it does, because the source file is large, but for small ones such as bcaslots.cxx shown last time it makes things slower). In case you notice the orange 'DebugType' in the third row that looks like it should be in the second row too, it should be there, but that's one of these patches of mine that the openSUSE package does not have.
The third row is with one patch that does the PerformPendingInstantiations phase also already while building the PCH. The patch is pretty much a one-liner when not counting handling fallout from some Clang tests failing because of stuff getting slightly reordered because of this. Even by now I still don't understand why PCH generation had to delay this until every single compilation using the PCH. The commit introducing this had a commit message that didn't make much sense to me, the test it added works perfectly fine now. Presumably it's been fixed by the C++ modules work. Well, who cares, it apparently works.
The last row adds also Clang options -fmodules-codegen -fmodules-debuginfo. They do pretty much what I was trying to achieve with my original patch, they just approach the problem from a different side (and they also do not have the technical problems that made me abandon my approach ...). They normally work only for C++ modules, so that needed another patch, plus a patch fixing some problems. Since this makes Clang emit all kinds of stuff from the PCH into one specific object file in the hopes that all the compilations using the PCH will need that too but will be able to reuse the shared code instead, LibreOffice now also needs to link with --gc-sections, which throws away all the possibly problematic parts where Clang guessed wrong. But hey, it works. Even with ccache and Icecream (if you have the latest Icecream, that is, and don't mind that it "implements" PCHs for remote compilations by simply throwing the PCH away ... it still pays off).
So, that it's for a single compilation. How much does it help with building in practice? Time for more pretty colorful pictures:

This is a debug LO build on my 4-core (8 HT) Ryzen laptop, Library_sm is relatively small (36 source files), Library_scfilt is larger (154 source files). Plain 'Clang' means unpatched Clang(v9), 'Clang+' is with the PerformPendingInstantiations patch (i.e. the third row above), 'Clang++' is both patches (i.e. the fourth row above). The setting is either --enable-pch=base for including only system and base LO headers in the PCH, or --enable-pch=full for including everything that makes sense. It clearly shows that using large PCHs with GCC or unpatched Clang just doesn't make sense.
Note that GCC(v9) and MSVC(v2017) are there more as a reference than a fair comparison. MSVC runs on a different OS and the build may be possibly slightly handicaped by some things taking longer in Cygwin/Windows. GCC comes from its openSUSE package, which AFAICT is built without LTO (unlike the Clang package, where it makes a noticeable difference).
And in case the graphs don't seem impressive enough, here's one for Library_sc, which with its 598 source files is too big for me to bother measuring it in all cases. This is the difference PCHs can make. That's 11:37 to 4:34, almost down to one third:
As for building entire LO from scratch, it can be like in the picture below (or even better). The effect there is smaller, because the build consists of other things than just building libraries, and some of the code built doesn't use PCHs. And it's even smaller than it could be, because I used --enable-pch=base, as that's what I've been using up to now (although now I'll switch to a higher level). That's about 1h42m without PCHs to 1h14m with unpatched Clang (27% percent saved), and 1h06m with patches (and the 8minutes difference is still 11% of the unpatched time). Not bad, given that this is the entire LO build. Those 6 minutes for ccache are there to show the maximum possible improvement (or rather nowhere near possible, since the compiler normally still has to do the work of actually compiling the code somehow).
In case you'd want to use this too, that's not up to me now. The patches are now sitting and waiting in the LLVM Phabricator. Hopefully somebody there still cares about PCHs too.

No comments:

Post a Comment