The Phoenix Story... To be clear, I'm not talking about the book.
No, I'm talking about a project I worked on for just shy of 10 years. It started off as the idea to really sort out the product line we were working on: software that did end up on thousands of machines in its time and was ported to at least five different hardware platforms, but which ultimately needed replacing almost before it hit the market.
Phoenix was conceived, I think, by my then boss... We'll call him "Mr B". It stemmed from frustration with the previous publicly released project, which was a bit of a shambling mess of bad C that only one chap knew how to work with when I joined... It might have had a little bit of C++ in it (I remember the chap reading the first edition of the STL book)... And then came a failed attempt to make a new system in Java, a bit of C++ and some SQL... but mainly Java connected to a Marathon database.
Marathon, you might ask? It was an open source variant of dBASE IV and had been used by one of the "senior" software engineers in her prior role. If you Google for Marathon database today you'll be hard pressed to find a link to it. There's a reason for this... It sucks.
But before we go into the technicals here: the "senior" software engineer... let's call her "Mrs N"... really was a bit of a number. When I joined as only the fifth permanent member of staff (and ironically I joined as a technical documenter, not a programmer) she made out that she was very senior and that she knew everything... She exuded this air of knowing and knowledge, she very much talked the talk... But did not walk the walk. It wasn't until many years later that I learned she had only been there two weeks longer than me, a mere ten days, and that she wasn't a "senior" anything. She was talking herself up to the management regularly to make herself sound indispensable... She wasn't... Not at all.
Back to the technicals. This Marathon database was to hold ALL the information about the system state; literally it was a state machine without the words "state" or "machine" being used. The software created a connection to the Marathon driver (I can't remember if it was ODBC or ADO), but the upshot was that the system ran incredibly sluggishly. Booting Windows and then the software took upwards of 5 minutes. It was a real bane and a pain to work with.
Everything, and I mean everything, went through a stored procedure, which Mrs N wrote. If she'd not written it, then you were not going to get that data, and if the result of the stored procedure was not what you expected then you had to submit a new request. She never really went back and fixed bugs; things were either the way she wanted them or you had to ask for more things... so there was massive bloat and creep in this code base almost from the get go.
Bless him, Mr B started to try and do code reviews. He had an idea of trying scrum, but it wasn't scrum: he didn't make everyone be quiet and talk one at a time, quickly, then move on... No... And I've touched on that shambles before.
With Mrs N ruling the roost, Mr B quietly (and sometimes not so quietly) retreated to his office, and he would hire folks. At some point he hired a contractor, Mr J...
Mr J in the short term was okay. He did his thing, he knew his stuff and he had good ideas, but he was hard to work with. But he knew this new thing called C#. And quietly in the boss's office, on his whiteboard, was born "Phoenix".
The moment Mr J presented this to the boss, he was moved onto this super secret mission to make the new, new system in C#.
Which left a development vacuum... Mr B needed someone to hide the fact that Mr J was no longer doing the shitty Java-to-Marathon mouth-to-mouth... I was thrown to the lion that was Mrs N.
Technical documents went out the window; they needed a programmer in that position, so I plugged away at that system, helping the web developer build an integrated menu and control system, but we constantly battled Mrs N.
Mr J and Mr B, however, kicked off this C# work and soon they had a state machine, an actual state machine, by Mr B and an event passing system by Mr J... The special thing about this event passing system was that it worked cross-process, so you could stand up a C# application and connect it to listen for a message... then stand up a completely separate C# application and it could send the message to the first... This allowed for a highly modular design, with each part of the system mapped down to a state machine and a connection to the message bus.
It made perfect sense, it worked, it was better than using the slow-assed database as the inter-process communication bus and, importantly, the events could be monitored, so you could introspect what was going on.
This was Phoenix... State Machine, Inter-Process Communication Messages, Highly Modular.
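To make that shape concrete, here is a minimal sketch in C++ rather than the original C#, and in a single process rather than across processes. Every name in it (MessageBus, StateMachine, the event names) is hypothetical and made up for illustration; it is not how the Phoenix code actually looked, just the general idea of a module being a state machine hooked to a bus whose traffic can be inspected.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-process stand-in for the cross-process message bus.
class MessageBus {
public:
    using Handler = std::function<void(const std::string&)>;

    void subscribe(const std::string& event, Handler handler) {
        handlers_[event].push_back(std::move(handler));
    }

    void publish(const std::string& event) {
        log_.push_back(event);  // record every event so traffic can be inspected
        auto it = handlers_.find(event);
        if (it == handlers_.end()) return;  // nobody is listening for this event
        // Dispatch over a copy so a handler may subscribe while we iterate.
        const std::vector<Handler> targets = it->second;
        for (const auto& handler : targets) handler(event);
    }

    const std::vector<std::string>& log() const { return log_; }

private:
    std::map<std::string, std::vector<Handler>> handlers_;
    std::vector<std::string> log_;
};

// Hypothetical module: a table-driven state machine fed by bus events.
class StateMachine {
public:
    explicit StateMachine(std::string initial) : state_(std::move(initial)) {}

    void addTransition(const std::string& from, const std::string& event,
                       const std::string& to) {
        table_[std::make_pair(from, event)] = to;
    }

    // Returns true if the event caused a transition from the current state.
    bool handle(const std::string& event) {
        auto it = table_.find(std::make_pair(state_, event));
        if (it == table_.end()) return false;  // event ignored in this state
        state_ = it->second;
        return true;
    }

    // Listen for one event on the bus. The machine must outlive the bus's use of it.
    void attach(MessageBus& bus, const std::string& event) {
        bus.subscribe(event, [this](const std::string& e) { handle(e); });
    }

    const std::string& state() const { return state_; }

private:
    std::string state_;
    std::map<std::pair<std::string, std::string>, std::string> table_;
};
```

A ticket printer module, say, would add an "Idle" to "Printing" transition on a "PrintRequested" event, attach itself to the bus for that event, and then anything publishing "PrintRequested" moves it along without knowing it exists. The log is the in-process cousin of being able to monitor the events.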
I would assume that at some point the plan was to make things like new features modular, so a customer could buy a base configured machine and opt to have it do whatever other functions down the line, like accepting credit cards or printing a ticket, whatever. Modularize it and add it on later or, critically, remove it.
The problem? Making this highly modular effectively meant everything was highly asynchronous, and there was no way to synchronise the ballet of messages going between applications. After a few versions even something as simple as booting the module executables in the right order became a tiresome mess.
But the initial versions went out to customers and they did buy them, and so the product lived on... However, a highly async process stack running on a single-core Celeron processor... isn't particularly asynchronous. There's actually a huge amount of linear ordering going on all the time and massive amounts of process context switching... So much so that in some configurations over half the 1GB of RAM was being hogged just by messages queued for processes which had yet to be woken by the Windows scheduler to clear their backlog of events.
So it was that after about six years a new piece of hardware hove into view. It was a dual-core box... Immediately the systems were out of order, things were going wrong, and it was a nightmare.
To be fair, the idea was sound, the implementation not so much, and the language used, though flexible, incurred (at that time) too high an overhead on the hardware.
Mr B struggled with this, and then he hired a new manager, a sub-manager to himself, to control testing. Let's call him Mr P... Mr P initially had some good ideas: he used dot to create some process flow information, and he tried to control contractor Mr J (who was not a permanent member of staff as he was the only person who knew anything about the message passing). They were all wrangling to try and make this system work on newer and newer hardware, until one day it had to go onto a wholly other machine...
None of the hardware interaction stuff was abstracted, so it was written all over, but still not abstracted... you could boot the old hardware layer or the new... or even both and make it eat its own tail.
And then there were layoffs... A bunch of the folks working for Mr P went, a bunch of the testers went, and there was a big reshuffle... I got to work right in front of Mr B... and started to try and bend his ear, to push him in new directions and not rely on Mr J so much... There's a story in that to tell.
However, things became really interesting at the turn of the final year of the whole team being on the project: a new piece of hardware was yet again acquired and the company wanted Phoenix to run on it.
I set about making that happen, and I just ignored Mr P, and Mr J, and Mr B... I made the new hardware version and fixed a huge swathe of problems with booting and data at the same time. I couldn't disentangle it from the death embrace of the message passing and multi-process context switching, but I made things better.*
* One of the things I did was find that the C# file exists check was about 10x slower than doing it as a call to "fstat", so I made a DLL with just a call to fstat exposed, loaded that DLL into C#, and that made the code massively faster at boot. I then found all the C# "image.load" functions were really slow, but the equivalents from GDI+ were much quicker, so I added them to the DLL and voilà, C# went faster.
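For the curious, the native side of that sort of trick can be very small. This is a hedged sketch, not the original DLL. The footnote says "fstat", but in the C runtime fstat takes an already open file descriptor, so this version assumes the check was done by path with stat. The export name phx_file_exists and the PHX_EXPORT macro are hypothetical.

```cpp
#include <sys/stat.h>

// Export with C linkage so a managed caller can find the symbol by name.
#if defined(_WIN32)
#define PHX_EXPORT extern "C" __declspec(dllexport)
#else
#define PHX_EXPORT extern "C"
#endif

// Hypothetical export: returns 1 if `path` names an existing regular file,
// 0 otherwise (missing, a directory, or a null pointer).
PHX_EXPORT int phx_file_exists(const char* path) {
    if (path == nullptr) return 0;
    struct stat info;
    if (stat(path, &info) != 0) return 0;  // stat fails if the path is not there
    return (info.st_mode & S_IFMT) == S_IFREG ? 1 : 0;
}
```

On the C# side something like this would normally be bound with a DllImport declaration and called like any other static method. Whether it beats the built-in check on a modern runtime is something you'd have to measure; the 10x above was on the hardware and framework of the day.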
So when, at the end of Q1, the team was basically told "we're letting you all go, we need one person to tide the project over", they picked me... I'd demonstrated a breadth of knowledge, fixed so many problems and, hey, one can't easily fall out with oneself.
Looking back I think the decision to keep me shocked a lot of people who thought themselves indispensable, but the proof was in the following few months.
The fault rate went down from the 20% range to 18%, then 7%, and finally to the sub-1% range. We were on more machines than ever, but we'd cleaned up the code base, and the three testers we had went over the systems in a methodical manner. I instituted a policy of cleaning up the code as I went, removing inter-process communications and reducing the number of vertical slices you had to hop over to get to the functionality you required, and I also wrote tooling... Event viewers... State machine inter-process visualizers, even a GUI designer.
All to put tooling between the human element and the code, and to control how it was used.
That didn't ever mean the testers and operations manager (Mr D) were deterred from playing with the system, far from it... But things were better.
The Phoenix project had a legacy though, the structure and controls... The message passing, the installation process, the IPC and the state machine were all a little wonky in their quirky little ways.
So it was in 2014 I restarted my own personal effort to write a new system, at home, alone, in C++. That was Bluebird and it has a whole other story to tell.