
Saturday, 24 April 2021

Bad Files and Smart Cards in a Project from Long Ago

I need to anonymize this code, so we'll be doing it in a pseudo C# style.  One of the last tasks I had at my prior employer was to inherit the entire code base for a project I had been bitting and bobbing in for years; I'd seen this project start, release (many times), mutate and ultimately age.

As I took control it needed replacing, which is a whole other story involving C++ and dragging people kicking and screaming into touch.

This product though was like your grandad: it sat quietly on its own sucking a Werther's Original, waiting for a war film or Columbo to come on the telly.

The difficulty was the fault rate: between 9 and 14% of machines were off in the morning, and if a pack of updates was ever sent (for content) that rose to around 46%... Imagine the calls there, the service manager and his oppo having to field a 46% fault rate because of your update.  Indeed, on one occasion I remember driving to a customer's site and physically handing them a good update DVD rather than leaving them to wait.

So what was so bad?  Well, it all came down to.... Let's look at a piece of code that is seared in my memory:

FileStream file = new FileStream("C:\\SomeFile.txt", FileMode.Open, FileAccess.Read, FileShare.None);
byte[] buffer = new byte[file.Length];
int bytesRead = file.Read(buffer, 0, (int)file.Length);
file.Close();
// Do something with buffer to give us a new buffer
int newDataLength = 64;
byte[] newBuffer = new byte[buffer.Length + newDataLength];

file = new FileStream("C:\\SomeFile.txt", FileMode.OpenOrCreate, FileAccess.Write, FileShare.None);
file.Write(newBuffer, 0, newBuffer.Length);
file.Close();

This is part of an update sequence, where the existing file would be opened, the new update delta calculated, and the delta then (as intended) appended onto the end of the file.  This was fine for years: it worked, it got shipped.  It went wrong about five years later, can you see how, maybe?

A hint is that this was a 32-bit machine.

Did you spot it?.... it's line 2...

"file.Length" returns a long, but then all the following file operations work on int.  The file started to go wrong once it passed two gigabytes in size: the top of the int range is 2,147,483,647, and if we divide that by 1024 three times we go through kilobytes, then megabytes, then gigabytes, and land just under 2 gigabytes (roughly 1.99).

But then think about that, this is a 2GIGABYTE file being opened in a buffer in RAM!?!?!?

It just makes a pure RAM copy of itself, then opens the file and starts to write over the original from zero to the end.

YEAH, so it's overwriting the whole original file.
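The int arithmetic above, and what the "(int)file.Length" cast does once the file passes that limit, can be checked with a few lines.  This is only a sketch: the class and method names (IntLimitDemo, MaxIntInGiB, TruncatedLength) and the example file size are made up for illustration, and the cast is wrapped in "unchecked" explicitly so the wrap-around is visible rather than depending on compiler settings.

```csharp
using System;

public static class IntLimitDemo
{
    // int.MaxValue divided by 1024 three times: bytes -> KB -> MB -> GB.
    public static double MaxIntInGiB()
    {
        return int.MaxValue / 1024.0 / 1024.0 / 1024.0;
    }

    // What a narrowing cast such as (int)file.Length produces when
    // overflow checking is off: the value wraps instead of throwing.
    public static int TruncatedLength(long fileLength)
    {
        return unchecked((int)fileLength);
    }
}
```

For a hypothetical file of 2 GB plus 64 bytes (2L * 1024 * 1024 * 1024 + 64), TruncatedLength returns a negative number, which is no use as a byte count for a read or write.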


It's so wrong in so many ways, the massive buffer, the overwriting of existing data already safe on disk, the fact that this all took time too... this operation happened at a reconcile phase, it was all asynchronous, whilst this system portion was doing this mental tossing about another part of the system had changed the screen... to say "Please Power off or Reboot".


So people did, they literally pulled the power.  So they lost their 2gigabytes+ of data, and when these files were getting large they were nuking them by pulling power too!

The solution is simple, open the file for append, or just seek to the end and add the new data on.

int newDataLength = 64;
byte[] buffer = new byte[newDataLength];
// Get the new data into the buffer
FileStream file = new FileStream("C:\\SomeFile.txt", FileMode.OpenOrCreate, FileAccess.Write, FileShare.None);
file.Seek(file.Length, SeekOrigin.Begin);
file.Write(buffer, 0, buffer.Length);
file.Close();
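The other option mentioned above, opening the file for append, might look something like the sketch below.  The DeltaAppender class and AppendDelta method are hypothetical names, and the Flush(true) call is my own addition (it was not in the original system): it asks for the data to be pushed through to disk before the file is closed, which seems sensible on machines that get their power pulled.

```csharp
using System;
using System.IO;

public static class DeltaAppender
{
    // Adds the delta to the end of the file without reading or
    // rewriting any of the bytes that are already on disk.
    public static void AppendDelta(string path, byte[] delta)
    {
        // FileMode.Append creates the file if needed and positions the
        // stream at the end; it must be paired with FileAccess.Write.
        using (var file = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.None))
        {
            file.Write(delta, 0, delta.Length);
            file.Flush(true); // assumption: flush OS buffers to disk as well
        }
    }
}
```

The only buffer in memory is the delta itself, so the size of the existing file no longer matters.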

This was only part of the problem: the functions using the data from this file took it as a whole byte array, and there was literally no way to chunk the file.  I can't go into the details, but I had to break that up and start to stream the data through that system, which then let me add the resulting new delta array (which was always smaller than 2MB) to the end of the file.

That was only one part of the system which kept me awake.  Another good one, used a lot, was a pattern which also overwrote small files, mostly the json files which controlled the settings.  And the users would often turn these machines off by simply pulling the power out of the back.
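I can't show the real thing, but the general shape of streaming a file through in pieces, rather than as one giant array, is roughly the sketch below.  Everything here is illustrative: ChunkedReader, ProcessInChunks and the onChunk callback are hypothetical names, and the real consumer of the data was rather more involved.

```csharp
using System;
using System.IO;

public static class ChunkedReader
{
    // Reads the file in pieces of at most chunkSize bytes and hands each
    // piece to onChunk.  Returns the total bytes seen as a long, so a
    // file over 2 GB is not a problem for the counter.
    public static long ProcessInChunks(string path, int chunkSize, Action<byte[], int> onChunk)
    {
        var buffer = new byte[chunkSize];
        long total = 0;
        using (var file = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            int read;
            // Read returns 0 at end of file, and may return fewer bytes than asked for.
            while ((read = file.Read(buffer, 0, buffer.Length)) > 0)
            {
                onChunk(buffer, read); // only the first 'read' bytes are valid
                total += read;
            }
        }
        return total;
    }
}
```

Memory use is then bounded by chunkSize, whatever the size of the file.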

Whenever it was saving a file it would basically be doing:

File.WriteAllBytes(thePath, allTheBytes);

Yep, it'd just write over the file.

My fix?  Simple: when opening the file at a time when we didn't expect the users to just pull the power - or at least when it was less common - make a file backup with "File.Copy(source, dest)".  These destination files were numbered 1, 2, 3, up to a count we could configure... so at sites where we knew they had a high fault rate we could stack up 5 or 7 backups of these files, but for machines with better hardware, or SSDs, we'd only need 3.

I don't even think the service manager knew about this "fix".

But armed with these backups we could then leave the original code alone (which was quite convoluted and I didn't want to fix, to be honest).  On the next load, if the opening failed, I'd have it nuke the backup it had just taken, then use the last best aged backup.  And if there were now more backups than we should have, we'd delete the oldest.
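A minimal sketch of that kind of numbered backup rotation follows.  It is not the original code: the SettingsBackups class, the ".1", ".2", ".3" naming (with 1 as the newest), and the caller-supplied isValid check are all assumptions made for the example.

```csharp
using System;
using System.IO;

public static class SettingsBackups
{
    // Hypothetical naming: settings.json.1 is the newest backup, .2 the next, and so on.
    static string BackupPath(string path, int n) { return path + "." + n; }

    // Shuffle existing backups up one slot, dropping the oldest, then copy the live file to slot 1.
    public static void TakeBackup(string path, int maxBackups)
    {
        File.Delete(BackupPath(path, maxBackups)); // does nothing if the file is missing
        for (int n = maxBackups - 1; n >= 1; n--)
        {
            string from = BackupPath(path, n);
            if (File.Exists(from)) File.Move(from, BackupPath(path, n + 1));
        }
        File.Copy(path, BackupPath(path, 1), true);
    }

    // Try the live file, then backups from newest to oldest.  Bad backups are deleted on the way.
    public static byte[] LoadWithFallback(string path, int maxBackups, Func<byte[], bool> isValid)
    {
        for (int n = 0; n <= maxBackups; n++)
        {
            string candidate = (n == 0) ? path : BackupPath(path, n);
            if (!File.Exists(candidate)) continue;

            byte[] data = null;
            try { data = File.ReadAllBytes(candidate); }
            catch (IOException) { /* unreadable, fall through to an older copy */ }

            if (data != null && isValid(data)) return data;
            if (n > 0) File.Delete(candidate); // nuke a backup that turned out to be bad
        }
        return null; // nothing usable was found
    }
}
```

A site with a high fault rate would simply pass a larger maxBackups (5 or 7 instead of 3).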

Settings didn't change very often, but this did let us solve this issue.

The final worst piece of this system was the licensing system, which used a USB connected smart card reader, and a custom decrementing secure card format to license the machine time.  This was fine for years, it used a nice Gemalto reader and cards, and all was fine in testing.

The machine tested the card whilst in operation once every five minutes, so no big deal.  When in service mode it checked the card every 10 seconds to update the license level display, but the service mode was never intended to be left for more than a few minutes.... So what happened?

Yeah, a customer opened a machine and left it open for a week.... And their machine went out of operation, when we got this particular machine back I just opened the door, took the card out and pointed to the literally charred burned back of the smart card chip... It was a white plastic card, and the back was deformed and light brown... I did chuckle, sucked for the customer, but we never worked out why they had the door open in service mode for so long; they weren't meant to.

But worse than that isolated incident was a new tranche of machines released in 2015: suddenly they all had faults.  There were machines out of order, machines not allowing play, machines rebooting... Nothing seemed to clear them, and some were reporting "Out of Licensing", despite people having paid for brand new cards.

They were issued a new card... The old cards came back, were reworked... so randomly once working sites got either a new card or a reconditioned card from any other random site.

New machines had a new brand of card reader, old machines had the Gemalto.  New cards were all this new brand of card, and the old cards were the white Gemalto ones... this mix just went on... and soon we had a rising fault rate.

The diagnostic view was at first a little mixed, sometimes a new reader was fine, sometimes a new reader was bad... all customers reported "my new card", they had no idea that the brand had changed under the hood... and in fact nor did I.

You see, to save a few pence per card (12p per card to be precise) they hadn't gone with the grand 34p Gemalto cards, they'd gone with 22p Chinese copies... Inferior copies as it turned out: they had around 1/8th the life span, so over time ALL these "new" cards failed.

But then, in the Gemalto reader they were all fine... So the new reader?... Oh that was ALSO a cheap Chinese knock off, and these things had strange problems.  I suspected sometimes they were putting the full 5V USB supply through the cards (rated at 3V), killing them.  And I was proven right.


This unholy quartet of products caused havoc, but I eventually found that new readers could kill either new or old cards, so they had to be recalled... Then new cards could die randomly in even old reliable readers, so they had to be recalled too.  Which meant we slowly struggled to find old readers and old cards.

All of this was a purchasing foul up, unfortunately managers saw it as an engineering problem and so one had to code around poor hardware.

The first thing we did was add two toggles, one for "old card", which I could detect from the card chip type being read on reader access.  This slowed the reading of the card down... from every 5 minutes to every 30 minutes, so we risked giving customers longer before an unlicensed machine went out of action, but that was accepted as it gave us a much longer read life for the card cell.

Then we deferred the first read of the card: on boot up we literally left the USB device completely alone, to let Windows start and everything settle on the desktop-driven system.  After 5 minutes we'd start our licensing check.  It was accepted that a user could technically receive 4m59s of unlicensed use and then reboot to get more time, but that would be a little impractical in this usage scenario.
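The two timing changes above (a slower polling interval behind a toggle, and a deferred first read after boot) boil down to a small scheduling decision, sketched below.  This is an illustration only: LicenseCheckSchedule, IsCheckDue and RecordCheck are hypothetical names, the clock is passed in so the logic can be tested, and how the toggle is derived from the card chip type is not shown.

```csharp
using System;

public sealed class LicenseCheckSchedule
{
    // Timings taken from the description above.
    public static readonly TimeSpan StartupDelay = TimeSpan.FromMinutes(5);
    public static readonly TimeSpan NormalInterval = TimeSpan.FromMinutes(5);
    public static readonly TimeSpan SlowInterval = TimeSpan.FromMinutes(30);

    private readonly DateTime bootTime;
    private readonly bool slowPolling; // the toggle; set elsewhere from the card type
    private DateTime? lastCheck;

    public LicenseCheckSchedule(DateTime bootTime, bool slowPolling)
    {
        this.bootTime = bootTime;
        this.slowPolling = slowPolling;
    }

    // True when the card reader should be touched at time 'now'.
    public bool IsCheckDue(DateTime now)
    {
        // Leave the USB device completely alone until the startup delay has passed.
        if (lastCheck == null) return now - bootTime >= StartupDelay;

        TimeSpan interval = slowPolling ? SlowInterval : NormalInterval;
        return now - lastCheck.Value >= interval;
    }

    // Call after each actual card read.
    public void RecordCheck(DateTime now)
    {
        lastCheck = now;
    }
}
```

The main loop would ask IsCheckDue on each tick and only talk to the reader when it returns true, which keeps the number of card reads down to the minimum the licensing rules allow.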

Doing these two things we could just about use the new readers...

But the new cards were just so utterly terrible, we did eventually have to buy better cards.  I never heard if there was a refund on the originals, but I can assure you my time alone cost more than the £120 they saved going with these cheap cards.

Friday, 16 December 2016

Software Engineering : Is not Engineering

Right I'm guilty, and annoyed at myself, and making a change... Though I might still keep this as a tab on posts... I AM NOT GOING TO USE THE TERM "SOFTWARE ENGINEER" anymore....

This makes my degree certificate wrong, as it clearly states "Software Engineering", but even though that is indeed what I do every day, and what I read about every night, it is not what; nor who; I am... I am a programmer, a hacker (in the traditional sense), a tinkerer and a student of all things software.

Many other writers have called us programmers out on this, and finally, I'm going to eat humble pie and agree: when one sits down to write code one is not doing what the great engineers did, we are not forging railways, bridges, hulls of great ships or physical tangible results which must stand the test of time.

We are building a more ethereal, almost smoke and mirror concept, results through the action of our instructions through another; that is programming, and it is what I do.

Why do I want to make this distinction?  Well, as you may tell from some of the recent posts around here, I've been involved in merging parts of teams and companies, meeting both incoming and shifting personnel to fit them into the matrix that spells "results" for a company.

No code has yet been cut, but a new team, and new ideas might very well be needed.  In turn I have reached out there and been talking to others, to recruiters, to other companies, and indeed I've sat before other people.

My friends also call upon my expertise, as one of the few from our graduating class still working in Software or indeed technology, I am often called upon for a little technical guidance.

Results have been mixed, but the determinable difference I have had between success and failure has relied, nearly exclusively, on the other party understanding the term "Software Engineer".  It does not mean we can programme your VCR, set the clock on your Microwave, or save your phone contacts to your SIM card.  It means we are able to employ structured methods, to define procedure, and to design, write, test and then document code as products for use or sale.

This does not include our being Electrical, Mechanical or Structural Engineers!

I am not trying, willing, or able to build the next Channel Tunnel, or Skylab, or HMS Bullshit.  I am able to cut code to make an existing system, or device, bend to the will of requirements upon it, I am able to look at the said device and decide whether it is fit for the purpose or not, I am not creating that device!

Creating said device is Mechanical or Electrical Engineering, I am Software, the use of the "Engineer" moniker is causing some confusion, some blurring of lines and so to help delimit this boundary and stop this confusion from now on I will self identify as a Programmer, and cease to try to explain all that this entails.

I am a Programmer, a Lead Programmer, a Systems Programmer, a Device Programmer, a Prototype Programmer, a Senior Programmer, a Team Leading Programmer, a Development Provisioning Programmer, no longer am I an Engineer!