Thursday 30 November 2017

C++ : Trust the STL

One sore lesson to teach some developers is when to trust the compiler, once you've gotten that across you have to start teaching folks to stop re-inventing the wheel.

If someone has already implemented a file handler, or a serial port abstraction, or a wrapper for some obscure feature, you need to evaluate that offering...

To evaluate whether a library is worth using, firstly see if it works, then see how many folks actually use it, the more that use it then the more likely bugs will be flushed out and the whole thing has been tested.

Leveraging this kind of mature code within your projects assists in bootstrapping the startup phase of new projects.

Boost is a note worthy example of what I'm talking about here, many software shops (at least the ones I know) resist using open-source or third party libraries, they prefer to stick to in-house developed niche implementations until the very last moment, this of course slows development and completely symies innovation.

Boost however is one step further than the problem I'm going to tackle today... The Standard Template Library...

The STL is often commented upon negatively, this is despite it being a hugely available resource, vastly and deeply tested throughout and constantly incorporating new innovations.  Whole books have been written on the topic, and yet one can still find projects and individuals resisting using the STL.

STL nay-sayers will quote "no need for an STL requirement", "uses less memory than an STL implementation" or "faster than the STL"...

The problem with this attitude is, are such attitudes going to sufficiently tackle testing of their bespoke solution, is that bespoke solution going to be as robust or as easily maintained as something using the STL?

Probably not, and this is a hard one for die hard "purist" developers to swallow, we want to write all our own code, we want to be gods in our domain, the trouble is for the vast number of us, god has already been there and he wrote a decent enough library to do the task we need doing... So leverage this!

I came across one such niche item the other day, with an algorithm to see if a string starts with...

They hadn't used boost, or the STL, to do the searching, yet perversely had used an std::string... Their code, looked a little like this:

const bool StartsWith(
const std::string& p_Text,
const std::string& p_Pattern)
{
bool l_result(true);

if ( p_Text.length() >= p_Pattern.length() )
{
for (unsigned int i(0);
i < p_Pattern.length();
++i)
{
if ( p_Text[i] != p_Pattern[i] )
{
l_result = false;
break;
}
}
}
else
{
l_result = false;
}

return l_result;
}

It is fairly logical code, they're looking at the length of the presented parameters, to avoid looping when not required, then they only loop from the start and only return a fail when the character is a miss-match, looking at this with programming eyes from 1996, I'd say this is fine.

Looking with eyes well aware of the STL, I cringe a little, and replaced this whole function like this...

const bool StartsWith(
const std::string& p_Text,
const std::string& p_Pattern)
{
return (p_Text.find(p_Pattern) == 0);
}

One line, of very much more maintainable, vastly more readable and easy to comprehend code...

The developer of the original however was not happy... "you're wasting resources, this will find any instance and tell you the input".... he's right it will, but the STL will still be faster than his code.

I demonstrated this by plugging both into CompilerExplorer... He still refused to listen.

Therefore, I've written this little helper project, to run the two functions side by side, threaded three tests, looking for the match, a long match and a negative match at the start of the string (Code on Github).


The results of this are interesting, you see the project itself favours cases where it's highly likely the string being searched for is present and therefore we don't need to worry too much about the odd test not finding a match taking longer... This is exactly the behaviour seen in the STL based find example.


The Short search time, for the same data, on the same processor went from 28358 microseconds to just 5234... That's about 81% faster.  The longer search is more stark, falling from 185966 microseconds to just 6884, just over 96% faster!

The rub is the negative case took longer, rising from 19765 in the hand-crafted search to 25695, just over 30% slower.  Some of this increase can be explained perhaps by the hand-crafted version using the lengths to quickly skip too short an input, otherwise it is simply that the STL find has to iterate over the whole string when no match is found.  A hybrid to not perform the find at all, when there is insufficient data maybe in order; however this may add to our maintenance burden and lower code clarity, swings and roundabouts.

However, clearly in the case of this project, dismissing the STL resulted in slower code, we have a system propensity for matches, they're quite short, and all target platforms have the STL built in, use it.

Never be affraid to ask questions of what you're working with, ever.

No comments:

Post a Comment