Wednesday 27 December 2017

C/C++ Stop Misusing inline.... PLEASE!

This is a plea, from the bottom of my rotten black heart, please... Please... PLEASE stop misusing the inline keyword in your C and C++.

Now, I can't blame you for this. I remember back in the '90s actually being taught (at degree level) "use inline to make a function faster", and this old lie still bites today.

inline does not make your function faster. It simply asks the compiler to insert, "in line", another copy of the same code wherever you call it (and some compilers, particularly for embedded targets, treat that request as an order), so this code:

#include <iostream>

inline void Hello()
{
    std::cout << "Hello";
}

int main ()
{
    Hello();
    Hello();
    Hello();
}

turns into the effective output code of:

int main ()
{
    std::cout << "Hello";
    std::cout << "Hello";
    std::cout << "Hello";
}

What does this mean in practice?  Well, you save yourself the CALL into the function (the jump, plus the position on the stack holding the return address), and the RET from the function, which pops that address off the stack and returns to the caller.

This is WHY people were told to use inline to make things faster in the '90s. I was taught this when I had a system with around 254K of working RAM for the programs I was writing; saving that space on an 8K stack was important in complex systems, especially if you were nesting loops of calls.

However, today, on a modern processor, even modern embedded processors, DO NOT DO THIS!

You're no longer saving anything. You're in fact making your code bigger and slower: suddenly your program expands in size and you are having to fetch more and more from the slower RAM layers, rather than the program's instructions fitting into the faster CACHE layers.

As you get cache misses the processor stalls while it fetches more, and if you start getting page faults on top of that the OS literally stops the program, switches context to another item and then switches back, halting your program in its tracks because it suddenly had to go load the N'th of possibly thousands of repeated stanzas of code.

Don't do this, don't lumber yourself. Let the compiler handle its own optimizations; compilers are pretty good at it!
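To make that concrete, here is a minimal sketch of what "leave it to the compiler" looks like. The names Checksum and ChecksumBoth are made up for illustration and are not from any real project. Nothing is marked inline, so an optimizing compiler is free to inline a call where it judges that worthwhile, or to keep one shared copy of the function where code size matters more.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, no "inline" keyword: the optimizer decides per
// call site whether to expand this or emit an ordinary call.
std::uint32_t Checksum(const std::uint8_t* data, std::size_t length)
{
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < length; ++i)
    {
        sum += data[i];
    }
    return sum;
}

// Hypothetical caller using the helper twice. If both calls stay as
// calls, the loop above exists once in the binary rather than twice.
std::uint32_t ChecksumBoth(const std::uint8_t* a, std::size_t lengthA,
                           const std::uint8_t* b, std::size_t lengthB)
{
    return Checksum(a, lengthA) + Checksum(b, lengthB);
}
```

If a particular call really is a hot spot, measure it first and then look at what your own compiler offers for that one function, rather than sprinkling the keyword everywhere.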

Now some of you will be saying "yeah, no shit Xel, what's your point?"... My point is I recently had around 4000 lines of code handed to me, a huge long listing, and around 40% of it was a series of functions.  This whole thing could compile down to around 62K.... But when compiled it was just over 113K... This was too big to fit into the memory of the micro-controller it was for.

The developer had been working merrily over the Yuletide, happy and satisfied their code would work. They went to work this morning and, instead of running the code in the IDE within an emulator, they actually ran it on the metal.

It crashed, and they couldn't figure out why. The size was why.

And then they couldn't work out why the code was so big... It is tiny code.

They came, cap in hand, to me - and I took no small satisfaction in rolling my eyes and telling them to remove the "inline" from EVERY function... "But it'll run so slowly" they cried... "REMOVE THE INLINE".

Of course it works: they have the system fitting into the micro-controller RAM, the stack is working a lot harder, their code is a lot smaller, and they are now in possession of a more balanced opinion on "inline".

* EDIT *

One person (yes, hello Hank) asked me "why": why was this not a problem on the emulator, but a problem on the bare metal? Well, the bare metal was using a different compiler from the pseudo compiler in the Windows based IDE. The Windows based IDE was actually running the code through a compiler which ignored "inline", and so produced code a little like this:

(Image: assembly listing for an inline "int square(int)" called from main. Image courtesy of "CompilerExplorer")

You can see that even though "int square(int)" is marked "inline" it still contains the push to the stack and the "pop ret" pairing, and calling it from main results in two function calls to the same assembly.

The bare metal compiler did not ignore it; an undocumented difference, I might add.
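For anyone wanting to try this themselves, a minimal sketch of the kind of source that gives a listing like that, assuming a square function along the lines of Compiler Explorer's usual starter example. The exact source behind the screenshot isn't shown, and SumOfSquares is a made-up name standing in for the two calls made from main.

```cpp
// Marked inline, as in the listing discussed above.
inline int square(int num)
{
    return num * num;
}

// Hypothetical caller making two calls. A compiler that ignores the
// hint emits one copy of square and two calls to it; one that honors
// it pastes the multiply into this function twice.
int SumOfSquares(int a, int b)
{
    return square(a) + square(b);
}
```

Paste it into Compiler Explorer, or build it with both of your toolchains, and compare the output: one shared square with two calls means the keyword was ignored, while two copies of the multiply and no call means it was honored.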
