So, current LibreOffice git version now has support for drawing based on Skia also for the Mac. Both Raster and Metal (the Mac GPU framework). Below is the obligatory screenshot, and here is the hey-it-can-be-faster video.
Sunday, April 18, 2021
In other words, how much faster will a compiler be after it's been built with various optimizations?
Given the recent Clang12 release, I've decided to update my local build of Clang11 that I've been using for building LibreOffice. I switched to using my own Clang build instead of openSUSE packages somewhen in the past because it was faster. I've meanwhile forgot how much faster :), and openSUSE packages now build with LTO, so I've built Clang12 in several different ways to test the effect and this is it:
The file compiled is LO Calc's document.cxx, a fairly large source file, in a debug LO build. The compilation of the file is always the same, the only thing that differs is the compiler used and whether LO's PCH support is enabled. And the items are:
- Base - A release build of Clang12, with (more or less) the default options.
- CPU - As above, with -march=native -mtune=native added.
- LTO - As above, with link-time optimization used. Building Clang this way takes longer.
- LTO+PGO - As above, also with profile-guided optimization used. Building Clang this way takes even longer, as it needs two extra Clang builds to collect the PGO data.
- Base PCH - As Base, and the file is built with PCH used.
- LTO+PGO PCH - As LTO+PGO, again with PCH used.
Or, if you want this as numbers, then with Base being 100%, CPU is 85%, LTO is 78%, LTO+PGO is 59%, Base PCH is 37% and LTO+PGO PCH is 25%. Not bad.
Mind you, this is just for one randomly selected file. YMMV. For the build from the video from the last time, the original time of 4m39s with Clang11 LTO PCH goes down to 3m31s for Clang12 LTO+PGO PCH, which is 76%, which is consistent with the LTO->LTO+PGO change above.
Tuesday, April 6, 2021
With Clang12 almost released, I guess it's high time to write a conclusion to the Clang11 changes that improve compilation times with PCHs. I originally planned to do this after the Clang11 release, but with the process to get the changes reviewed and merged having been so tedious I was glad it was finally over and I couldn't at the time muster the little extra effort to also write this down (I spent way more time repeatedly writing 'ping' and waiting for a possible reaction than writing the code, which was really demotivating). But although the new options are described in the Clang11 release notes, I think it'd be useful to write it down in more detail.
First of all, I've already written why C++ developers might care, but a thousand pictures can be worth more than a thousand words saying how this can save you even 60% of the build time:
In case you'd like to see a similar change in your LibreOffice compilation times, it should be as simple as this:
- Get Clang11 or newer and use it to build LibreOffice.
- Use --enable-pch (preferably --enable-pch=full) to enable PCH use.
You may want to check that your ccache and icecream are not too ancient if you use them, but by now any reasonably recent version should do. And if you need to do changes that would repeatedly trigger larger rebuilds (such as changing a header), the trick to temporarily disable PCH builds is to do 'make BLOCK_PCH=1'. And since PCH builds sometimes cause build errors in non-PCH builds because of missing #include's of headers, it's a good idea to touch all your changed files and do 'make BLOCK_PCH=1' before pushing your change to Gerrit. This is all you should need to know, the LO build system will take care of everything else.
As for the rest of the world, this boils down to the two PCH-related sections in the Clang11 release notes.
The first one is using -fpch-instantiate-templates. It needs to be set when building the PCH, but it will work even if you just add it to the CXXFLAGS used for building everything. Recent enough CMake version should handle this option automatically, I have no idea about other build systems. It should be safe to enable the option for any building with PCH. It's not enabled by default in Clang only because of really corner cases with PCHs that are not self-contained. In other words, as long as your PCH works with an empty .cpp file, it's safe, and if your PCH is not self-contained, you'd be better off fixing that anyway.
The second part using -fpch-codegen -fpch-debuginfo is more complicated, as it requires build system support, and I'm not aware of any build system besides LibreOffice's providing it. This discussion in a CMake ticket provides an example of how to use the option that seems rather simple. For other build systems have a look at the description in the Clang11 release notes for all the details and possible complications, and consider whether it'd be worth it. Which it sometimes may not be, as this graph from my previous post shows (Clang means normal PCH build, Clang+ means only -fpch-instantiate-templates, Clang++ means all 3 options).
Note that you may get rather different results depending on how much you put in your PCHs. Unlike before, now the general rule should be that the more you add to your PCHs, even if shared only by several source files, the faster the builds usually will be. And since these options move building some code to the PCH-related object file, the improvement is usually even better for incremental builds than full rebuilds. I've been using PCHs this way for slightly more than a year, and I forgot already quite some time ago how slow it used to be.
Tuesday, January 21, 2020
Skipping functions from entire directories while debugging (e.g. skip all functions from system headers)
skip -gfi /usr/include/*
skip -gfi /usr/include/*/*
skip -gfi /usr/include/*/*/*
skip -gfi /usr/include/*/*/*/*
I feel so stu^H^H^Hproud for catching up only 3 years late.
Thursday, November 28, 2019
All(?) the necessary info about how to enable it etc. are in this mail, but there are things that better fit a blog post than a mail, and in this case that's going to be a table and a picture showing how well it may perform. Note that these results are from running visualbackendtest, which is not really a benchmark, so these numbers should be taken with a grain of salt. It's just a test that draws a gradient, several big polygons (each circle is actually 720 lines) and short text.
|Linux gen (X11)||86|
|Linux Skia Vulkan (GPU)||65-90|
|Linux Skia raster (CPU)||5|
|Windows Skia Vulkan (GPU)||175-185|
|Windows Skia raster (CPU)||75-85|
Saturday, November 9, 2019
It turns out, the problems I mentioned last time had already been more or less solved in Clang. But only for C++ modules, not for precompiled headers. *sigh* I had really mixed feelings when I finally realized that. First of all, not knowing Clang internals that well, it took me quite a long time to get to this point figuring it all out, probably longer than it could have. Second, I've been using C++ modules when building Clang itself and while it's usable, I don't consider it ready (for example, sometimes it actually makes the build slower), not to mention that it's non-trivial to setup, not standardized yet and other compilers (AFAIK) do not yet support C++ modules. And finally, WTH has nobody else yet noticed and done the work for precompiled headers too? After all the trouble with finding out how the relevant Clang parts work, the necessary patches mostly border on being trivial. Which, on the other hand, is at least the good news.
And so I'm switching for LibreOffice building to my patched build of Clang. For the motivation, maybe let's start with an updated picture from the last time:
shown last time it makes things slower). In case you notice the orange 'DebugType' in the third row that looks like it should be in the second row too, it should be there, but that's one of these patches of mine that the openSUSE package does not have.
The third row is with one patch that does the PerformPendingInstantiations phase also already while building the PCH. The patch is pretty much a one-liner when not counting handling fallout from some Clang tests failing because of stuff getting slightly reordered because of this. Even by now I still don't understand why PCH generation had to delay this until every single compilation using the PCH. The commit introducing this had a commit message that didn't make much sense to me, the test it added works perfectly fine now. Presumably it's been fixed by the C++ modules work. Well, who cares, it apparently works.
The last row adds also Clang options -fmodules-codegen -fmodules-debuginfo. They do pretty much what I was trying to achieve with my original patch, they just approach the problem from a different side (and they also do not have the technical problems that made me abandon my approach ...). They normally work only for C++ modules, so that needed another patch, plus a patch fixing some problems. Since this makes Clang emit all kinds of stuff from the PCH into one specific object file in the hopes that all the compilations using the PCH will need that too but will be able to reuse the shared code instead, LibreOffice now also needs to link with --gc-sections, which throws away all the possibly problematic parts where Clang guessed wrong. But hey, it works. Even with ccache and Icecream (if you have the latest Icecream, that is, and don't mind that it "implements" PCHs for remote compilations by simply throwing the PCH away ... it still pays off).
So, that it's for a single compilation. How much does it help with building in practice? Time for more pretty colorful pictures:
Note that GCC(v9) and MSVC(v2017) are there more as a reference than a fair comparison. MSVC runs on a different OS and the build may be possibly slightly handicaped by some things taking longer in Cygwin/Windows. GCC comes from its openSUSE package, which AFAICT is built without LTO (unlike the Clang package, where it makes a noticeable difference).
And in case the graphs don't seem impressive enough, here's one for Library_sc, which with its 598 source files is too big for me to bother measuring it in all cases. This is the difference PCHs can make. That's 11:37 to 4:34, almost down to one third: