Showing posts with label clang. Show all posts

Thursday, 25 May 2023

Just Stand It Up: About Premature Pessimization

Engineers often talk about premature optimization, but today I'm going to just talk briefly about the opposite, premature pessimization.

I currently work on a very large code base that has been developed from scratch over four years.  One of the first things performed was a series of investigations into the "best performing" data structures, such as maps, lists and so forth.

Now of course one has to accept that any data structure can be optimized that little bit more for a specific use case.  One also accepts that in C++ the standard library lends itself to being replaced: the containers expose standard operators and iterators, and the algorithms that go with them use only those exposed APIs, so you can implement your own.
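To make that concrete, here is a minimal sketch of a home-grown container that the standard algorithms will happily work with, simply because it exposes begin() and end().  SmallBuffer and sortAndSum are hypothetical names made up for illustration; they are not from the code base I'm describing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>

// Hypothetical fixed-capacity container; the name and layout are
// illustrative only and not taken from any real code base.
template <typename T, std::size_t Capacity>
class SmallBuffer
{
public:
    using value_type = T;
    using iterator = T*;
    using const_iterator = const T*;

    // Appends a value; the assert guards against overflowing the storage.
    void push_back(const T& value)
    {
        assert(size_ < Capacity && "SmallBuffer is full");
        data_[size_++] = value;
    }

    std::size_t size() const { return size_; }

    // Raw pointers are valid random access iterators, which is all the
    // standard algorithms below require.
    iterator begin() { return data_; }
    iterator end() { return data_ + size_; }
    const_iterator begin() const { return data_; }
    const_iterator end() const { return data_ + size_; }

private:
    T data_[Capacity] = {};
    std::size_t size_ = 0;
};

// Sorts the stored values in place and returns their sum, using only
// the iterators the container exposes.
inline int sortAndSum(SmallBuffer<int, 8>& values)
{
    std::sort(values.begin(), values.end());
    return std::accumulate(values.begin(), values.end(), 0);
}
```

It works, but note how much of std::vector is still missing here (growth, erase, exception safety), which is rather the point of the question that follows.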

I just want you to stop though and think... Do you want to?

Do it too early in a project and you start to introduce bloat: slight differences between the optimized cases, the code of whatever third party "best" you picked, and even your build chain, as you are bringing in dependencies on someone else.

The standard library doesn't do any of this; its dependencies are guaranteed by the ABI.

So why not just use standard map, or standard vector and standard string, or standard formatting?

Quite often I'm finding it is due to premature pessimization: that developer's voice which cries out about some issue they once had, either when some technology was new and emerging, or late in an earlier project's life when they had to optimize for that specific case I mentioned, where the standard version did prove itself to be a detriment.

These engineers carry with them the experience, sometimes scars, from such exposure to edge cases and bugs they had to quash.  Rightly and understandably they do not want to experience these selfsame issues again; their minds therefore are almost averted from just standing it up with the standard version.  They immediately seek out domain specific "best" versions and even proactively nay-say the standard ones.

This is, in my opinion, the very definition of premature pessimization.  The standard library is wonderful, diverse and full of very well tested code, and adding it to your C++ project and using it carries nearly zero overhead.

I would therefore coach any developer with such mental anguish over just using the standard library to simply stand it up, just get across that line of things both building and running, then extend it to remain maintainable.  And finally as you think you're getting close to stable, well then you can expend more time looking at, profiling, and understanding the edge cases.

Monday, 12 October 2020

That 32 Core build....

I've just been sorting out the 32 core server and setting it up to do me a test build.  I chose to build the llvm-project, with clang, clang-cl, libcxx, libcxxabi, libunwind, lldb, compiler-rt and lld enabled.  As a release only....

Anyway, I had to share this awesome screenshot of the build progress window, and then the inset window of a second terminal session showing htop...



Sunday, 20 January 2019

C++: Undefined behaviour from realloc and the Clang Optimizer

I was wandering around in some system code and found a strange behaviour between two modes in a piece of code.  Where the code switches from double to triple buffer mode there's a problem: we've already got two buffers of everything and just want to allocate another, but the underlying structure for sets of buffers wants to own them all... So the code went from:

struct SetOfBuffers
{
    Buffer* one;
    Buffer* two;
};

To:

struct SetOfBuffers
{
    Buffer* one;
    Buffer* two;
    Buffer* three;
};

Creating the third buffer is fine:

SetOfBuffers::three = malloc(X);

But the first two are re-allocs, to reuse the existing buffers:

SetOfBuffers::one = realloc(OldSet::one, X);
SetOfBuffers::two = realloc(OldSet::two, X);
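For what it's worth, a minimal sketch of how I'd wrap that kind of realloc call, assuming the goal is simply to resize a block in place and carry on.  resizeBuffer is a hypothetical helper, not the real system code.  It takes the result into a temporary so a failed realloc doesn't lose the original block, and it overwrites the caller's pointer so the old value can't be used by accident afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: resizes the block that `buffer` points at.
// Returns true on success; on failure the original block is untouched
// and `buffer` still points at it.
inline bool resizeBuffer(void*& buffer, std::size_t newSize)
{
    assert(newSize > 0 && "realloc with a size of 0 is best avoided");

    void* resized = std::realloc(buffer, newSize);
    if (resized == nullptr)
    {
        return false; // the old block remains valid and owned by the caller
    }

    // From here on the old pointer value must not be used, even if it
    // happens to compare equal to `resized`.
    buffer = resized;
    return true;
}
```

The caller still owns the block and frees it with free() as usual; the only thing the helper changes is that there is one pointer to the buffer before and after the call, never two.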

The problem?  I'd start to modify the values in the new set of buffers: the third buffer would be used and presented.  Then the first buffer would be changed and presented... Then the second buffer would be changed and presented, and the information was wrong (I over simplify massively here).

Anyway, I was remotely SSH'd into my server for this, so I went to Visual Studio, same code... Worked fine... So I go into my local VM and it's fine too, so I went back to the server and compiled manually and suddenly it's fine too.... WTF.

I literally spent an hour looking at this.  The problem?  Well, it appeared to be a bug in Clang: the reason the problem disappeared was that my Makefile contains a $CC variable for the compiler to use and it was set to "clang", whereas when I built by hand I used "g++".  Worse still, if I switched to a clang debug build the code worked fine, so this seemed to be something about my compilation process, not a bug in the code per se.

So, perplexed I went in search of an answer.  And it appeared to be something about the clang optimizer, about which I found this talk from CppCon 2016.

Where there's this example:

#include <cstdlib>
#include <cstdio>

int main()
{
    int* p = (int*)malloc(4);     // The original buffer above
    int* q = (int*)realloc(p, 4); // The new pointer to the same old buffer
    // Assign a value through each pointer
    *p = 1;
    *q = 2;
    if (p == q)
    {
        printf("%d %d\n", *p, *q);
    }
}

What do you expect this code to display?... Well, I expect it to print "2 2".  And it does on VC and G++ and even clang without the optimizer...





But optimize the compile and it's wrong:


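Now, this is undefined behaviour, and very scary.  As I understand it, once realloc succeeds the old pointer p is no longer valid to use, even when it compares equal to q, so the optimizer is free to assume that p and q never refer to the same int.  Not least as this was identified a while back (the talk alone is from 2016), and g++ still gives the answer I expected... Eeeek.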
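For comparison, here is a sketch of the same experiment with the stale pointer taken out of the picture.  As far as I understand the rules, only the pointer realloc returns may be used once the call succeeds, so the old value is overwritten before anything is written or read.  reallocExample is a hypothetical name of mine and is not from the talk.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Same experiment as the example from the talk, but the old pointer
// value is replaced by the one realloc returned before it is used.
// Returns false only if an allocation fails.
inline bool reallocExample(int& first, int& second)
{
    int* p = static_cast<int*>(std::malloc(sizeof(int)));
    if (p == nullptr)
    {
        return false;
    }

    int* q = static_cast<int*>(std::realloc(p, sizeof(int)));
    if (q == nullptr)
    {
        std::free(p); // realloc failed, so p is still valid here
        return false;
    }

    p = q; // discard the stale pointer value

    *p = 1;
    *q = 2;
    first = *p;
    second = *q;
    assert(first == second); // both pointers now name the same int

    std::printf("%d %d\n", first, second); // prints "2 2"
    std::free(q);
    return true;
}
```

With p reassigned from q there is only one live pointer value in play, so the result should no longer depend on which compiler or optimization level is used.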