
Thursday, 31 January 2019

C++ Coding Standards: Don't Return Early

I've just had one of those mini-conversations which has me scratching my head: I asked a coworker whether "return early" was part of the coding standard or not, as it was being discussed by other parties.

His response was emphatic, slightly arrogant and touchily obtuse: "I have no time for anyone arguing for not returning early from a function you've discovered you should not be in".

I, however, don't take such a black and white view; I see the pros and cons of both approaches. Importantly, not returning early from a function can be, and is, part of driving process-flow conversations, and it aids in picking out structure which can be encapsulated away.  The result is that instead of instantly exiting and potentially leaving spaghetti code, you can see a flow, see where branching is occurring and deal with it...

But, that said, it must be understood that "returning early" is often seen as "faster"; as the chap argued, "I see no point being in a function you know you should already have left", so I took to Compiler Explorer... In a very trivial example:
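Something along these lines, returning early as soon as the test fails (the names and values are purely illustrative, the original screenshots being Compiler Explorer captures):

int Scale(const int p_Value)
{
    if ( p_Value <= 0 )
    {
        return 0;
    }

    int l_result(p_Value);
    l_result *= 2;
    return l_result;
}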

 

Here we see a decision being made and either a return or the active code; easy to read, simple, trivial... But don't dismiss it too early: is it the neatest code?

Well, we could go for fewer lines of code, which generates near identical assembly, thus:
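Something like this, a single return with the early-out test inverted via a ! (again an illustrative reconstruction, not the original screenshot):

int Scale(const int p_Value)
{
    int l_result(0);
    if ( !(p_Value <= 0) )
    {
        l_result = p_Value;
        l_result *= 2;
    }
    return l_result;
}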


This is certainly fewer lines of C++; however, inverting the test is a little messy and could easily be overlooked in our code or in review.  Better might therefore be expanding the test to its true opposite:
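For example (illustrative again), writing the comparison as its direct opposite rather than wrapping it in a !:

int Scale(const int p_Value)
{
    int l_result(0);
    if ( p_Value > 0 )
    {
        l_result = p_Value;
        l_result *= 2;
    }
    return l_result;
}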


One could argue the code is a little neater to read without the inverting; critically, though, it has made no difference to the assembly output: it's identical.

And it is identical in all three cases.

You could argue therefore that "returning early has no bearing on the functionality of the code", but that's too simplistic, because "not returning early" also has no bearing on the functionality of the code.  The test, the action and the operation have been worked out for us by the magic of the compiler to be the same.

So, with equivalent operation and code generation, we need to think about what returning from the function early did affect.  Well, it made the code on the left longer; yes, this is an artefact of our coding with braces, but it was longer.  You could also see that there were two things to read and take in: the test was separate from the code and, importantly for me, the test and the actual functional code sat at the same horizontal indentation.  Being on that same alignment stops your eye registering that a decision has been made.

Putting the test and the action of that test into the inner bracing communicates clearly that a decision has been made and our code has branched.

And when that structure is apparent you can think about what not returning early has done: it has literally braced around the stanza of code to run after the decision.  That packet of code could be spun off into its own function, so you start to think about encapsulation.  Of course you can think about the same thing after a return, but in my opinion having the content of the braces to work within is of benefit, and returning early most certainly does not afford any speed benefits.

Let's look at a more convoluted, but no less trivial, example: a function to load a file in its entirety and return it as a known-sized buffer... We'll start with a few common types:

#include <memory>
#include <cstdint>
#include <string>
#include <cstdio>
#include <optional>
#include <utility>
#include <sys/stat.h>
#include <sys/types.h>
using byte = unsigned char;
using DataBlock = std::shared_ptr<byte>;        // shared ownership of a byte array
using Buffer = std::pair<uint32_t, DataBlock>;  // the size and the data together
using LoadResult = std::optional<Buffer>;       // empty when the load fails

And we'll assume there's a function to get the file size, with stat...

const uint32_t GetFileSize(const std::string& p_Path)
{
    uint32_t l_result(0);

    struct stat l_FileStat;
    if ( stat(p_Path.c_str(), &l_FileStat) == 0)
    {
        l_result = l_FileStat.st_size;
    }

    return l_result;
}

Now, this function is return-path safe because I define the result as zero and always reach that single return; I could have written it thus:

const uint32_t GetFileSizeOther(const std::string& p_Path)
{    
    struct stat l_FileStat;
    if ( stat(p_Path.c_str(), &l_FileStat) != 0)
    {
        return 0;
    }

    return l_FileStat.st_size;
}

But I don't see the benefit: both versions return a value, except in the latter you have two code points defining it.  If anything, I would argue you lose the ability to debug "l_result" that the first version gives you; there you can inspect it before, during and after operating upon it, whereas in the latter you don't know the return value until the return itself.  And again, in both cases the assembly produced is identical.

So, the load function, how could it be written with returning as soon as you see a problem?... Well, how about this?

LoadResult LoadFileReturning(const std::string& p_Path)
{
    LoadResult l_result;

    if ( p_Path.empty() )
    {
        return l_result;
    }
     
    auto l_FileSize(GetFileSize(p_Path));
    if ( l_FileSize == 0 )
    {                     
        return l_result;
    }

    FILE* l_file (fopen(p_Path.c_str(), "r+"));
    if ( !l_file )
    {
        return l_result;
    }

    // Allocate the full file size as a byte array, with an array-aware deleter
    Buffer l_temp { l_FileSize,
                    DataBlock(new byte[l_FileSize], std::default_delete<byte[]>()) };
    if ( !l_temp.second )
    {
        fclose(l_file);
        return l_result;
    }

    auto l_ReadBytes(
        fread(
            l_temp.second.get(),
            1,
            l_FileSize,
            l_file));

    if ( l_ReadBytes != l_FileSize )
    {
        fclose(l_file);
        return l_result;
    }

    l_result.emplace(l_temp);

    fclose(l_file); 

    return l_result;
}

We have six, count them, different places where we return the result, as we test the parameters, check the file size and then open and read the file, all before we get to the meat of the function, which is to set up the return content after a successful read.  We have three points where we must remember, and maintain, the need to close the file.  This duplication of effort, and the dispersal of what could be critical operations (like remembering to close a file) throughout your code flow, is a problem down the line.

I know only too well that I've forgotten and missed things like this; reducing the possible points of failure in your code is important, and my personal preference not to return from a function early is one such methodology.

Besides the duplicated points of failure, I also found the code not to be communicating well; it's 44 lines of code.  Better communication comes from code like this:

LoadResult LoadFile(const std::string& p_Path)
{
    LoadResult l_result;

    if ( !p_Path.empty() )
    {
        auto l_FileSize(GetFileSize(p_Path));
        if ( l_FileSize > 0 )
        {                        
            FILE* l_file (fopen(p_Path.c_str(), "r+"));
            if ( l_file )
            {
                // Allocate the full file size as a byte array, with an array-aware deleter
                Buffer l_temp { l_FileSize,
                                DataBlock(new byte[l_FileSize], std::default_delete<byte[]>()) };
                if ( l_temp.second )
                {
                    auto l_ReadBytes(
                        fread(
                            l_temp.second.get(),
                            1,
                            l_FileSize,
                            l_file));

                    if ( l_ReadBytes == l_FileSize )
                    {
                        l_result.emplace(l_temp);
                    }
                }                

                fclose(l_file);
            }
        }
    }

    return l_result;
}

This time we have 33 lines of code.  We can see the stanzas of code indenting into the functionality; at each point we have the same decisions taking us back out to the single return.  Once we've had a successful open of the file there is one place (for all following decisions) where it's closed, and ultimately we can identify the success conditions easily.

I've heard this described as forward positive code: you look for success and always cope with failure, whilst the former only copes with failure as it appears.

I prefer the latter.  Ultimately it comes down to personal choice; some folks argue indenting like this is bad, but I've yet to hear why when the compiled object code is the same, you are communicating so much more fluently, and you have fewer points of possible human error to deal with and maintain.

From the latter we could pick out reused code, and we can target logging or performance metrics more directly on specific stanzas within their local scopes, instead of everything sitting against the left-hand indent line.
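As an illustration of picking such a stanza out (this helper, its name and its signature are my own sketch, not something from the original project), the innermost allocate-and-read could become its own function, reusing the types defined above:

// Hypothetical helper: read an already-opened file of a known size into a Buffer
LoadResult ReadWholeFile(FILE* p_File, const uint32_t p_FileSize)
{
    LoadResult l_result;

    Buffer l_temp { p_FileSize,
                    DataBlock(new byte[p_FileSize], std::default_delete<byte[]>()) };
    if ( l_temp.second )
    {
        auto l_ReadBytes(
            fread(
                l_temp.second.get(),
                1,
                p_FileSize,
                p_File));

        if ( l_ReadBytes == p_FileSize )
        {
            l_result.emplace(l_temp);
        }
    }

    return l_result;
}

LoadFile then shrinks to the path, size and open checks wrapped around a single call to this helper, and any logging or timing can be scoped to just that stanza.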

Now, I will happily tell you it hasn't been a comfortable ride to reach my thoughts on this topic.  I started programming on a tiny screen (40x20 characters - Atari ST low-res GEM) and when I moved to DOS, with its 80-character widths, I felt spoiled.  Now we have tracts of huge screen space; the argument that you need to stay on the left is moot.  You don't: use your screen, make your code communicate its meaning, use the indents.

And yes, before you ask, I was doing things this way long before I started to use Python.

Monday, 15 October 2018

C++: To Reference or Not

Constant something reference, or something constant reference... I ask myself this in a similar manner to Prince Hamlet at the start of the nunnery scene... For as a C++ programmer and English speaker I've always found myself most comfortable using the phraseology thus:

const int X (0);

And to pass this to a function:

void foo (const int& p_X);

I find this most useful: we're passing a constant integer reference to the function... However, I was recently challenged that this meant something else to a reader; her input was that it meant a "constant integer" reference, that is, we would ONLY be able to pass "const int" instances, not "int" instances.   The thinking being:

const int X(0);
void foo(const int& p_X);

foo(X);

Would compile, whilst:

int Y(42);
void bar (const int& p_Y);

bar (Y);

Would fail to compile, or at least spit out a warning, because "int" could not be passed into the function; to this reader, "constant integer" and "integer" were different types.

They're not really, of course; const is a qualifier (and one which we can remove with const_cast).  Using a "const int reference" as the parameter type is not all of our purpose in declaring the function like so; the purpose is to communicate what we do with the value going into the function.

It is 100% not about stopping Y being passable into the function as above.

No, we want to tell any user of our code that within the functions "foo" and "bar" we do not change the value: if they pass Y to us containing 42, when the function is complete Y will still contain 42.  If the value could potentially change we would not be able to use "const"; that's the purpose of passing const in this case.
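A minimal sketch of that point (foo and the values here are purely illustrative):

#include <iostream>

// The const reference promises the caller the value is not modified inside
void foo(const int& p_X)
{
    // p_X = 7;   // would not compile: assignment to a read-only reference
    std::cout << p_X << '\n';
}

int main()
{
    const int X(0);
    int Y(42);

    foo(X);    // fine: a const int binds to const int&
    foo(Y);    // also fine: a plain int binds to const int& too,
               // and Y still contains 42 afterwards
    return 0;
}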

Passing by reference is just to save us the time, effort and delay of allocating memory and taking a copy of the parameter before moving into the body of the function, so:

void bar (const int p_Y)

Would communicate the same thing to the caller: we tell the user we don't change the value of their parameter, even though here we take a copy of the value being passed in and operate upon that.

The communication we get with this is very useful.

But of course, if we're using threaded programming and we pass a reference to a value at time point A, then sometime later elsewhere edit the referenced value, we may run into unexpected behaviour; so there is sometimes an argument for taking a copy at the point of starting the child function.  For the most part, however, passing by reference is considered the norm.
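The surprise can be sketched even without threads: a held reference follows later edits, while a copy taken up front does not (these two little structs are invented purely for illustration):

#include <iostream>

// One worker remembers a reference to its input, the other takes a copy
struct ViewsValue  { const int& value; };
struct CopiesValue { int value; };

int main()
{
    int l_value(42);

    ViewsValue  l_view { l_value };   // refers to l_value for its whole life
    CopiesValue l_copy { l_value };   // snapshot taken here

    l_value = 7;                      // edited "sometime later elsewhere"

    std::cout << l_view.value << '\n';  // prints 7 - the reference saw the edit
    std::cout << l_copy.value << '\n';  // prints 42 - the copy did not
    return 0;
}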

Saturday, 21 July 2018

C++ : Coding Standards by Example #1 (with Boost Beast)

Today I've spent some time doing a slightly different video; rather than just play about in code, whilst I played about I made a video of the time I had... Here it is...


We cover:


  • C++
  • Coding Standards
  • Structured Programming
  • Functional Programming
  • Encapsulation
  • Object Orientated Programming
  • Building the Boost Libraries (1.67.0)
  • Visual Studio 2017
  • Static Linking

And generally spend time seeing how I turn gobbledygook code into something usable and maintainable against my own Coding Standards.

Get the code yourself here: https://github.com/Xelous/boostBeast

Wednesday, 2 May 2018

The Best and the Worst : Working with Genius Programmers

A long time ago, in an office far away from where I now sit, I once worked with a chap I still refer to as the best programmer I've ever met.
This was a guy who could take the whole code base, in Delphi, home and over a single weekend re-write it in Java.

This was a guy who I saw, from scratch, write a C controller for an embedded PIC to capture an image from a supposedly incompatible TTL-driven camera, and then an analyzer for the captured images which would detect and show motion, making for our common employer their best-ever-selling product: a cheap security motion detection system which didn't rely on relatively expensive high resolution cameras.

It was awe inspiring to a newly graduated programmer, who had a huge background in DOS programming but who had never worked in enterprise-level development before.

I sat next to what I still regard as near genius.

This very same chap was also the worst programmer I've ever worked with.

Because he was so highly functioning he never needed to document his code.  Fine, I hear you cry, good code should be self-documenting; and you're absolutely right.  The problem?  This guy also got bored so, so quickly, so he used to tell himself stories in his code.

Yes, Robert Jordan eat your heart out, this guy wrote epic fantasy on a grand scale, across hundreds of thousands of lines of code, in Delphi, C, C++, Java and even in HTML, which I saw him churn out; it was all a gobbledygook puddle of storytelling, rambling mess.

But the code worked, the managers didn't care that it was gibberish; at least not at first; because they could churn out product to the anticipating masses of customers.

Such a prolific talent, he had so many fingers in so many pies, he was invaluable, key man, the man, the one person every project started with.

The result?  Every single code base he touched was tainted with this unmaintainable morass of code, which an ever-increasing march of cheap graduate programmers, like my then self, had to decipher, maintain and corral.

Often the time it took to bring a project into some semblance of order would be three or four times longer than it took that one original chap to write it.  This did not go unnoticed, and managers rightly pointed their fingers to ask the question, "How could you not keep up?"

I, however, was the first such junior person with a voice (I've always had a voice), and I pointed right back: "How could you let us get into this mess?"

I dared to question, sweep and change the code; I dared to spend time even just aligning the code correctly.  No JetBrains formatting (or ReSharper) tools; very few tools existed to cover the whole pantheon of mess we were now wrestling to stay ahead of.

Daring to question, change, read and challenge the talented one resulted in his changing his ways.  He returned to some of the projects I had led the re-working of; he saw the structure and the discipline within; he saw that you could quickly pick up and get to work without needing to load all the software into one's wetware in a laborious re-read.

This skill, this willingness, to press the boundaries is somewhere I've often taken projects and continue to take them, and I do ask those I throw code at to feed back to me where they think anything needs reviewing.

I deplore any project or maintainer who takes the position that they must keep things secret and keep things safe.

Drop the epic fantasy; you're not Gollum.  Share, review and open the boundaries.

Monday, 26 March 2018

C++ : Pass-By-Reference Or Die

Before today's post, I'm on a mission folks, to get 1000 subs on YouTube.  If only 5% of viewers here subscribed we'd have met this target in one month...



I've just had a group code review of one of my personal projects, and been rather surprised by the vitriol levelled at one of my practices.... pass by reference.

The reviewer, one of a group of peers, had major issues with the project's (my personal) insistence on passing by reference wherever possible; in C++ this takes the form of an additional ampersand on parameter definitions.  Maybe this was the chap's problem, he has to type an ampersand?

So his problem?  Well, without the actual code we'll simplify and use Compiler Explorer (from godbolt.org), and we'll take up its basic square function example, which starts out thus:
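The screenshot isn't shown here, but the Compiler Explorer starter example is, from memory, essentially this:

// Roughly the Compiler Explorer starter example (reconstructed, not the original slide)
int square(int num) {
    return num * num;
}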

Giving the assembler:


On the right.  And this chap had taken time to prepare a whole slide show of functions, usually simple, and present them at this code review, showing this kind of thing.  His point... Well, the very same C++ but with a pass by reference:
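Again reconstructed rather than taken from the slides, the pass-by-reference variant is simply the same function with the ampersand added:

// The same function, passing num by reference
int square(int& num) {
    return num * num;
}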


Turns up more lines of assembler:


He's got me, right?  I'm taking more time, I'm slowing everything down; by not taking a copy of everything, and by using less memory, I'm somehow slowing things down....

This is where the sort-of power play turned.  I allowed him to present everything; I never interjected, never spoke; I allowed him to speak to the whole group.  We'd hired a venue for this, we were meeting live for the first time, this had to be good.... A couple of the chaps who could already see the fault in the complainer's logic were smirking, but we let him finish.

Triumphant, he has won the day, he will not carry the torch of coding standard gods...  WRONG.

I pulled over the presentation laptop, opened godbolt.org myself... added the ampersand to the "num" and let it produce the above assembler... The chap was smirking from ear to ear, he knew he had me...

And then I typed three characters....

-O2

Yes, I told the compiler to optimize, and this happened...


Remarkably small code wouldn't you say?  I still haven't spoken, but I turn the laptop back to the presenter and just sit there.

There's a noticeable snigger from those in the know, older, wiser heads than my own I hasten to add.  But this young chap is now looking from me to the screen to the overhead projection and back with a mix of fury and complete puzzlement; he'd checked everything, he'd dotted every i and crossed every t, he had me down pat, he wanted to usurp me.

Except he's never, ever, been willing to listen, to learn or to experiment; "code runs, that'll do" is very much his style (and Kyle, if you're reading, yes I'm talking about you).  But getting code to run is not enough; understanding the code you've written is often only just enough; getting it to run everywhere, the same way, that's an art.  Debug, Release, optimised, unoptimised, automatically profiled, link database and continue: they're all subtly different.  Just listing out the one thing you've looked at, because it backs up your point of view, is not enough; you have to look around and see the holistic picture.

And optimised without a pass-by-reference?


Spookily similar code in this case, but often pass-by-reference is preferred, and using const-correctness is preferred: it communicates a meaning.

For instance, in the "square" function above, how does the caller know that the parameter "num" is not altered in value?  How does the caller know it returns the new value only?  It could be returning an error status code with the parameter altered in value!  You don't know; but by making the parameter const and a reference you start to communicate the intent of your code more firmly.

Monday, 22 January 2018

C++ Simple Limits & Server Recase

Health & Myself
What a whirlwind few weeks I've had.  First of all, I've had some minor medical work done, so I've not really been in the mood to do anything very interesting except recuperate in my spare time.  I've also got ongoing dental treatment, so I'm doubly aggravated.

Next, I have been so busy with work.  I've had a change of job title; I've been effectively promoted into the lead developer role for the platform/product range I work on.  This is sort of a de facto position, as I'm the ONLY developer working in this area at present; however, I'm now officially the man on point and am about to form a new development team around it.  I will be working in Scrum methodology with this new team, we'll be pushing product to both Windows and Linux, Git will be our source control system (no matter what anyone else says) and I'm going to leverage C++ as the core product with tools all being produced in Python.  Very exciting stuff.

Server Issues
At home, however, you may have seen this late-night problem....


I have not, as yet, tackled this problem in any satisfactory manner; there's just been too much going on.

However.... I have a plan, a plan so cunning you could put a tail on it and call it a fox.

I had a look around in the case, and removed the bad drive, here it is....


This is an old drive, first of all, and it was only 3.0Gb/s SATA, but it was useful as part of my mirror, and without it I'm now naked.  In my room/workspace tidy rambling video you will see I have a stack of WD Blue drives on my desk; however, I'm loath to put them into this server, as the more space I give myself the more data I have to move later, and the fewer places I have to stash that data.

What do I mean?... Well, that machine already has about 4TB of data in the two ZFS pools and the one scratch drive.  I do not have ANYWHERE else to put 4TB of data, and I can't afford a new drive; hence why I've carefully and strategically informed the wife I'm buying the odd 1TB drive once in a while over the last year... Dropping a 4TB drive into the mix, she'll flip.

Recasing my Server & Workstation
So, my plan?


Well, first of all I have my main workstation in a trusty, but now rather old, Cooler Master Cosmos 1000 case (you can see this case in some of the very early pages of this very blog)... Now, this is a very nice case, and has 10 internal 3.5" drive bays... So it would be pretty nice for my server to move into: add all the new WD drives to it and voila, a working server with expansion capabilities.... A recasing, that's a plan... Plus I get to perhaps look at that Xeon mod a year on and evaluate how it got along.... And of course more ZFS fun... Maybe a video about Git & setting up my own ticket handling system.


Right, that's the server... What about the now case-less workstation?

Well, I'm considering a new case, of course.  Front runners are the Corsair Obsidian 750D as it has modular upgrade ability, and if I like it, I can get another and expand the server into it next year.

Infiniband
A long-term project is to consider an Infiniband link between my server and my workstation.  We'll see how this goes; not only would I need to purchase some adapters and cables, but I would need to switch my workstation to a new dual-boot, as only Linux would be on Infiniband (and as you may have noticed, Windows was on the list of platforms for my new role specification).

C++
Simple limits in C++?.. What am I talking about, hmmm, well I just ran into some code:

auto value (0);
if ( requisitValue < defaultValue )
{
    value = requisitValue;
}
else
{
    value = defaultValue;
}

Nothing wrong with this, you may argue; however, the developer hadn't looked at the values they were working with: the defaultValue for this system locale was ALWAYS the highest value on a sliding scale, the configuration insisted the default be the cap, and so any requisite value would be smaller.

I asked the chap to have a think about this code, having explained the requisite would always be lower than the default, and asked him to come up with any improvements.  He did....

auto value(
  (requisitValue < defaultValue) ? requisitValue : defaultValue);

I asked him to rationalise this change, why had he done it?


  • To meet the coding standard's directive of initialising with a value rather than assigning
  • To make the code fewer lines of code

I have no disagreement with the first point; the second... not so much....

I do not subscribe to needing code to be smaller (fewer lines of code) so long as it's readable; for my work, maintainability far outweighs using fewer lines of code.  Especially when the change made produces exactly the same resulting code, and the change is far harder to read.


(Don't misread this, I agree you should try to make functional code take fewer lines, but the above is a white-space change more than a functional one; white-space on a modern compiler is NOT a problem.  Make your code readable before you start obfuscating functionality for fancy frills.)

I stopped the developer in his tracks, however, and asked him to pull up the document for the specification and describe "defaultValue"... "The maximum amount allowed in the value," he read.

So, changing the name of this value was the first step: it became "MaximumValue".  Use a meaningful name.

And finally, if you really want this decision to be a single line of code, how about:

auto value(
    std::min<valueType>(
        requisiteValue,
        maximumValue));

Using the minimum function here, we make it clear we're doing a mathematical operation, it's a single instruction (even if I do stagger it over four lines for readability), and we're not going to confuse the order of our parameters, so we avoid introducing comparator-ordering bugs later.

This is the simplest way to find a limit between two variables of the same type in C++, using std::min and std::max.  Learn about them and love them.
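A self-contained sketch of the same idea (the types and values here are illustrative, not from the original system):

#include <algorithm>
#include <cstdint>
#include <iostream>

int main()
{
    const uint32_t maximumValue(100);     // the documented cap
    const uint32_t requisiteValue(42);    // the requested amount

    // Clamp the request to the cap; no if/else, no ternary, no parameter order to get wrong
    const auto value(std::min<uint32_t>(requisiteValue, maximumValue));

    std::cout << value << '\n';           // prints 42
    return 0;
}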

The reply the young man gave me... "But changing the name'll mean I have to change hundreds of different places in the code!".... "Yes, so use find & replace in files?".... "Oh, but what about all the other "defaultValues"?".... "Why are they not in namespaces, or classes, or using other names?"... Silence.

Friday, 5 January 2018

Bad Code...

You know when you open someone's code and you find this inside a class function...

unsigned short currentMessageOffset = this->CurrentMessageOffset;

That you're in for a rough ride.

First of all, WHY assign the member value on the right to a local?  Nowhere else in the class is this value edited or read, so there's no need to assign it to anything.

And then the naming: naming the local with the same name as the member (and it is the same name, despite the capital leading letter) just makes this utterly useless.

All this before even mentioning that the code is an assignment, not an allocation.  So annoying.

Thursday, 30 November 2017

C++ : Trust the STL

One sore lesson to teach some developers is when to trust the compiler; once you've gotten that across, you have to start teaching folks to stop re-inventing the wheel.

If someone has already implemented a file handler, or a serial port abstraction, or a wrapper for some obscure feature, you need to evaluate that offering...

To evaluate whether a library is worth using, first see if it works, then see how many folks actually use it; the more that use it, the more likely bugs have been flushed out and the whole thing has been tested.

Leveraging this kind of mature code within your projects assists in bootstrapping the startup phase of new projects.

Boost is a noteworthy example of what I'm talking about here.  Many software shops (at least the ones I know) resist using open-source or third-party libraries; they prefer to stick to in-house developed niche implementations until the very last moment, which of course slows development and completely stymies innovation.

Boost however is one step further than the problem I'm going to tackle today... The Standard Template Library...

The STL is often commented upon negatively, despite it being a hugely available resource, vastly and deeply tested and constantly incorporating new innovations.  Whole books have been written on the topic, and yet one can still find projects and individuals resisting using the STL.

STL nay-sayers will quote "no need for an STL requirement", "uses less memory than an STL implementation" or "faster than the STL"...

The problem with this attitude is: is such an attitude going to sufficiently tackle testing of the bespoke solution, and is that bespoke solution going to be as robust or as easily maintained as something using the STL?

Probably not, and this is a hard one for die-hard "purist" developers to swallow.  We want to write all our own code, we want to be gods in our domain; the trouble is, for the vast number of us, god has already been there and he wrote a decent enough library to do the task we need doing... So leverage it!

I came across one such niche item the other day, an algorithm to see if a string starts with a given prefix...

They hadn't used Boost, or the STL, to do the searching, yet perversely had used a std::string... Their code looked a little like this:

const bool StartsWith(
    const std::string& p_Text,
    const std::string& p_Pattern)
{
    bool l_result(true);

    if ( p_Text.length() >= p_Pattern.length() )
    {
        for (unsigned int i(0);
             i < p_Pattern.length();
             ++i)
        {
            if ( p_Text[i] != p_Pattern[i] )
            {
                l_result = false;
                break;
            }
        }
    }
    else
    {
        l_result = false;
    }

    return l_result;
}

It is fairly logical code: they're looking at the length of the presented parameters to avoid looping when not required, then they loop from the start and only flag a fail when a character is a mismatch.  Looking at this with programming eyes from 1996, I'd say this is fine.

Looking with eyes well aware of the STL, I cringe a little, and replaced this whole function like this...

const bool StartsWith(
    const std::string& p_Text,
    const std::string& p_Pattern)
{
    return (p_Text.find(p_Pattern) == 0);
}

One line of much more maintainable, vastly more readable and easy to comprehend code...

The developer of the original, however, was not happy... "You're wasting resources, this will find any instance and tell you the input".... He's right, it will, but the STL will still be faster than his code.

I demonstrated this by plugging both into Compiler Explorer... He still refused to listen.

Therefore, I've written this little helper project to run the two functions side by side, threaded, over three tests: looking for a match, a long match, and a negative match at the start of the string (code on GitHub).


The results of this are interesting.  You see, the project itself favours cases where it's highly likely the string being searched for is present, and therefore we don't need to worry too much about the odd test not finding a match taking longer... This is exactly the behaviour seen in the STL-based find example.


The short search time, for the same data, on the same processor, went from 28358 microseconds to just 5234... that's about 81% faster.  The longer search is more stark, falling from 185966 microseconds to just 6884, just over 96% faster!

The rub is that the negative case took longer, rising from 19765 microseconds in the hand-crafted search to 25695, just over 30% slower.  Some of this increase can perhaps be explained by the hand-crafted version using the lengths to quickly skip too short an input; otherwise it is simply that the STL find has to iterate over the whole string when no match is found.  A hybrid which avoids performing the find at all when there is insufficient data may be in order; however, this may add to our maintenance burden and lower code clarity.  Swings and roundabouts.
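One possible shape for that hybrid (my own sketch, not from the project), using std::string::compare so only the prefix is ever examined:

#include <string>

// Starts-with test that never scans past the pattern: the length check rejects
// too-short inputs early, and compare() only looks at the prefix characters
const bool StartsWith(
    const std::string& p_Text,
    const std::string& p_Pattern)
{
    return (p_Text.length() >= p_Pattern.length())
        && (p_Text.compare(0, p_Pattern.length(), p_Pattern) == 0);
}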

However, clearly in the case of this project, dismissing the STL resulted in slower code; we have a system propensity for matches, they're quite short, and all target platforms have the STL built in.  Use it.

Never be afraid to ask questions of what you're working with.  Ever.