Sunday, 6 September 2020

Forgot to Ride a Bike... Or?

I'm totally confused... You see, they say once you've learned to ride a bike you never forget, and I haven't forgotten, not one bit... So what's the problem? Well, the problem is that the bikes have changed!

No, I'm not talking electric; no, I'm not talking mountain bikes or anything like that. I'm talking about the brakes.

You see, I'm right-handed, so my primary braking hand is my right hand. In a car I brake with my right foot, and on a bike I want to brake with my right hand. This is how it is, how it was and how it always should be.

That should be your rear brake: you brake with your rear wheel first so you don't lock up the front and go over the handlebars.

It's simple safety and ergonomics, and I'm sure I remember lefties swapping their brake handles when I was a kid.

But... but... we've bought a bike from France... From those duplicitous Francophiles at Decathlon... And the primary brake is the rear brake, but they put it on the left!!!!

They literally put it on backwards.

It does my nut in that I have to crack the plastic on the molded handle and swap them over... And no, I can't just remove the wire and swap that, because the rear brake wire is made of a physically different material from the front one!

It is infuriating.

Friday, 4 September 2020

Great Rack Mount Mistakes #7

Today's story in the annals of problems in IT comes from a guest editor... Mr B... And Mr B (no relation to anyone else in these stories who goes by an initial) works as the sysadmin and developer for all of his employer's systems; unfortunately this means "it's all his fault".

So what went wrong? Well, overnight the site had a power cut, and though they have a nice server, they don't have a backup power supply, so that server went off.

The server is essentially a Java host, specifically hosting Tomcat, and it reaches out to connect to a set of third-party endpoints via a RESTful API.

You'd think that's no big deal: start up, get running and keep running. Except that the third party doesn't force a disconnect when they release a new edition of their API. If you're connected to version 1.0 you can stay there, and they will happily leave you connected to version 1.0 even as they release interim updates, add new calls and, what got Mr B today, remove a call or two.

When your session ends (and then, I presume, once all sessions on that old version have ended) they are free to de-allocate the old version from their servers. To push users to migrate up the chain, their published API declaration changes (think of the endpoint here in whatever flavour you wish), so you re-download it upon reconnection and that's your new flavour-of-the-month API.

The problem?  It didn't work.

So Mr B had to set about debugging this on the fly, in a live environment that was down. And he went through the three stages of technological grief...

1) Denial:    "This is completely illogical. My code pulls down their interface, which is the only thing we connect to, so it must be right; they can't mismatch them, so this must be my side or the gods are against me".

2) Investigation:  "Read the logs, make a change, nothing seems to work, the gods are definitely snickering behind that cloud of steam now".

3) Realisation:   "If it's not me, and it's not the system here, it must be their side. The huge multi-billion international must have published their API spec with a mistake or mismatch... click... YOU FUCKERS!"

What was the actual problem? Well, the third party's published API was actually wrong: the downloaded specification still contained several calls that had been removed. When the services Mr B had written came up, they checked each endpoint, found several defined calls that did not respond, and so his software, correctly, reported that the endpoint was offline. And those calls were offline: they didn't exist anymore.

His fix was, literally, to tell his software to ignore the multi-billion dollar international service provider's API spec and to "download" a copy he hosted locally, with his own edits to it.
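The start-up check is easy to picture in code. The following is a minimal C++ sketch and not Mr B's actual code, because his services ran on a Java/Tomcat host. The call names and helper functions are hypothetical, and a set of calls that answer stands in for real network probes. It shows why a spec that still lists removed calls takes the whole endpoint offline, and why swapping in a locally edited copy of the spec brings it back.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the start-up check: the API spec lists the calls an
// endpoint should offer, and the endpoint only counts as "online" if every
// listed call actually responds.
using CallList = std::vector<std::string>;

// Returns the calls named in the spec that the live endpoint does not answer.
// A real system would probe over the network; here the set of responding
// calls is passed in so the logic can be shown on its own.
CallList missingCalls(const CallList& spec, const std::set<std::string>& responding) {
    CallList missing;
    for (const auto& call : spec) {
        if (responding.count(call) == 0) {
            missing.push_back(call);
        }
    }
    return missing;
}

// Strict rule: any advertised-but-dead call marks the whole endpoint offline.
bool endpointOnline(const CallList& spec, const std::set<std::string>& responding) {
    assert(!spec.empty() && "an empty spec cannot describe a usable endpoint");
    return missingCalls(spec, responding).empty();
}
```

Suppose the published spec lists getOrders, getStock and legacyPing, and the live service only answers the first two. Then endpointOnline returns false and missingCalls names legacyPing. If you pass in an edited local copy without legacyPing, it returns true. The strict rule is still the right default, because the fault was in the spec, not in the check.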

Now, he's a tiny fish in a huge pond here. Even if he reports this mismatch, said multi-billion dollar international isn't going to hear him, and by the time they do it may be several months down the line, and other folks may have spotted the problem. He may be listened to, but he essentially doubts his voice will be heard.

The problem, of course, is how to head off this issue in the future. How do you avoid this stress? At one point he did say "the company is done for", because literally everything was offline; all their services were down... And of course everyone will blame the little guy doing all the IT. They won't think that the multi-billion behemoth could possibly publish a wonky API spec, and most of those shouting at Mr B, mouths frothing, wouldn't even know what he meant if he explained it to them...

The fact that he identified this issue and resolved it, and that everything was back up within two hours, won't be remembered. The glass will remain half-empty, and all anyone will remember is that on the 3rd September 2020 Mr B's IT suite went offline.

Wednesday, 19 August 2020

How much CPU?

This is a story from around six years ago, when the IT guy gave me a virtual machine on a server and was a little perplexed that I actually used it... A LOT. But we also found something out about the way his session management worked.

The host machine was some Dell server with a pair of Xeon E5-class processors. I don't know exactly what they were, as I only ever saw a very limited guest virtual machine.

My virtual machine was some Debian-based distro, I think Ubuntu, but running GNOME, not Wayland, because we were working on a GNOME-hosted front end.

Anyway, the machine was a continuous integration machine, and I immediately set about making it build the software every time there was a commit to the repo, and also build overnight to give us a daily build and smoke test.

The problem? Well, this was quite a lot of work. I'd been provisioned two cores on the host machine and I basically maxed them out. Building the whole system software on every commit was bad enough. Overnight it built the whole toolchain, then the system, then ran a series of intense tests and packaged it all, which basically took 100% CPU for about 3 hours.

And the IT manager was a little befuddled as to what was going on when he kept being told there was massive activity on his machines overnight.

He wondered about hackers, or some vulnerability that had let them in. Of course, being a Windows guy with a Windows host, he just blamed Linux. I sat pondering the problem and then simply asked "Do you know what the machine is being asked to do?"

And I showed him on my workstation, which had 8 cores and 32GB of RAM... and he blanched at the amount of work... "You're only building software, why does it max out the CPU for that long?"

The lesson?  Don't skimp on your continuous integration!

Friday, 7 August 2020

Ten+ Years of Phoenix : A history of a Project

The Phoenix Story... To be clear, I'm not talking about the book.

No, I'm talking about a project I worked on for just shy of 10 years. It started off as the idea to really sort out the product line we were working on. It was software that did end up on thousands of machines in its time and was ported to at least five different hardware platforms, but which ultimately needed replacing almost before it hit the market.

Phoenix was conceived, I think, by my then boss... We'll call him "Mr B". It stemmed from frustration with the previous publicly released project, which was written in a shambling mess of bad C/C++ and Java and used a marathon database.

Marathon, you might ask? It was an open source variant of dBase, and had been used by one of the "senior" software engineers in her prior role. If you google for marathon database today you'll be hard pressed to find a link to it. There's a reason for this, and that is that it sucks.

But before we go into the technicals there, a word about the "senior" software engineer... let's call her "Mrs N"... who really was a bit of a number. I joined as only the fifth permanent member of staff (and, ironically, I joined as a technical documenter, not a programmer). She made out that she was very senior and that she knew everything... She effervesced this air of knowing and knowledge... It wasn't until many years later that I learned she had only been there two weeks longer than me, a mere ten days. She wasn't a "senior" anything; she was regularly leveraging herself with management to make herself sound indispensable... She wasn't... Not at all.

Back to the technicals: this marathon database was to hold ALL the information about the system state. Literally, it was a state machine without the words "state" or "machine" being used. The software created a connection to the marathon driver (I can't remember if it was ODBC or ADO), but the upshot was that the system ran incredibly sluggishly. Booting Windows and then the software took upwards of 5 minutes; it was a real bane and a pain to work with.

Everything, and I mean everything, went through a stored procedure Mrs N wrote. If she hadn't written it, you were not going to get that data, and if the result of the stored procedure was not what you expected then you had to submit a new request. She never really went back and fixed bugs; things were either the way she wanted them or you had to ask for more things... so there was massive bloat and creep in this code base almost from the get-go.

Bless him, Mr B started trying to do code review, and he had the idea of trying Scrum, but it wasn't Scrum: he didn't make everyone be quiet, talk their talk, then move on. I think I've touched on that shambles before.

With Mrs N ruling the roost, Mr B quietly (and sometimes not so quietly) retreated to his office and hired folks. At some point he hired a contractor, Mr J...

In the short term Mr J was okay: he did his thing, he knew his stuff, and he had good ideas, but he was hard to work with. And he knew this new thing called C#. So, quietly, in the boss's office, on his whiteboard, "Phoenix" was born.

Mr J moving off the marathon mess left a vacuum, into which I was placed. Technical documents went out the window; they needed a C/C++ programmer. I plugged away at that system helping the web developer build an integrated menu and control system, but we constantly battled Mrs N.

Mr J and Mr B, however, kicked off this C# project, and soon they had a state machine (an actual state machine, by Mr B) and an event passing system by Mr J... The special thing about this event passing system was that it worked cross-process. You could stand up a C# application and connect it to listen for a message... then stand up a completely separate C# application, and it could send the message to the first... This allowed for a highly modular design, with each part of the system mapped down to a state machine and a connection to the message bus.

It made perfect sense, it worked, and it was better than using the slow-assed database as the inter-process communication bus. Importantly, the events could be monitored, so you could introspect what was going on.

This was Phoenix... State Machines, Inter-Process Communication Messages, Highly Modular.
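Here is a minimal in-process C++ sketch that makes the shape of the design concrete: a message bus, plus a module that is a small state machine driven only by bus messages. It is an illustration under stated assumptions. Phoenix itself was C# and its bus crossed process boundaries. The MessageBus class, the topic names and the ticket-printer module are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// In-process publish/subscribe bus: handlers register for a named topic and
// every publish() on that topic is delivered to them.
class MessageBus {
public:
    using Handler = std::function<void(const std::string& payload)>;

    void subscribe(const std::string& topic, Handler handler) {
        assert(handler && "handler must be callable");
        handlers_[topic].push_back(std::move(handler));
    }

    // Delivers synchronously, in subscription order, and returns how many
    // handlers received the message (zero means nobody was listening).
    // Handlers must not subscribe from inside a handler in this sketch.
    std::size_t publish(const std::string& topic, const std::string& payload) const {
        auto it = handlers_.find(topic);
        if (it == handlers_.end()) return 0;
        for (const auto& handler : it->second) handler(payload);
        return it->second.size();
    }

private:
    std::map<std::string, std::vector<Handler>> handlers_;
};

// Hypothetical module: a ticket printer modelled as a three-state machine.
// It must outlive its subscriptions, because the handlers capture `this`.
class PrinterModule {
public:
    enum class State { Idle, Printing, Error };

    explicit PrinterModule(MessageBus& bus) {
        bus.subscribe("print", [this](const std::string&) { onPrint(); });
        bus.subscribe("printer.done", [this](const std::string&) { onDone(); });
        bus.subscribe("printer.fault", [this](const std::string&) { state_ = State::Error; });
    }

    State state() const { return state_; }

private:
    // Messages that don't apply in the current state are ignored.
    void onPrint() { if (state_ == State::Idle) state_ = State::Printing; }
    void onDone() { if (state_ == State::Printing) state_ = State::Idle; }

    State state_ = State::Idle;
};
```

Publishing "print" moves the printer from Idle to Printing, and "printer.done" moves it back. Because this sketch delivers messages synchronously and in order, it hides the ordering problem that appears once delivery is cross-process and asynchronous.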

I would assume that at some point the plan was to make new features modular. A customer could buy a base-configured machine and, say, opt to let it do other functions down the line, like accepting credit cards or printing a ticket, whatever. Modularize it and add it on later, or, critically, remove it.

The problem? Making this highly modular effectively meant everything was highly asynchronous, and there was no way to synchronise the ballet of messages going between applications. After a few versions even something as simple as booting the module executables in the right order became a tiresome mess.

But the initial versions went out to customers and they did buy them, and so the product lived on... However, a highly async process stack running on a single-core Celeron processor... isn't particularly asynchronous. There's actually a huge amount of linear ordering going on all the time, and massive amounts of process context switching... So much so that in some configurations over half the 1GB of RAM was hogged by messages queued for processes. Those processes had yet to be woken by the Windows scheduler and clear their backlog of events.

So it was that, after about six years, a new piece of hardware hove into view: a dual-core box... Immediately the systems were out of order, things were going wrong, and it was a nightmare.

To be fair, the idea was sound; the implementation not so much. And the language used, though flexible, incurred (at that time) too high an overhead on the hardware.

Mr B struggled with this, and then he hired a new manager, a sub-manager under himself, to control testing. Let's call him Mr P... Mr P initially had some good ideas. He used dot (the Graphviz tool) to create some process flow diagrams, and he tried to control contractor Mr J, who was not a permanent member of staff yet was the only person who knew anything about the message passing. They were all wrangling to make this system work on newer and newer hardware, until one day it had to go onto a wholly different machine...

None of the hardware interaction code was abstracted, so support for the new machine was written in all over the place, and still not abstracted... You could boot the old hardware layer or the new one... or even both, and make it eat its own tail.
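For contrast, here is roughly what "abstracted" could have meant. The rest of the system talks to a single interface, each machine gets its own implementation, and the choice is made once at boot. This is a hypothetical C++ sketch. The class names and the printTicket call are invented for illustration, and the implementations are stubs rather than real drivers.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical interface for everything that touches the hardware.
class HardwareLayer {
public:
    virtual ~HardwareLayer() = default;
    virtual std::string name() const = 0;
    virtual bool printTicket(const std::string& text) = 0;
};

// Stub implementations: a real driver would talk to the device here.
class OldBoxHardware : public HardwareLayer {
public:
    std::string name() const override { return "old"; }
    bool printTicket(const std::string& text) override { return !text.empty(); }
};

class NewBoxHardware : public HardwareLayer {
public:
    std::string name() const override { return "new"; }
    bool printTicket(const std::string& text) override { return !text.empty(); }
};

// The layer is chosen exactly once at boot and handed to the rest of the
// system, so the old and new layers are never both running.
std::unique_ptr<HardwareLayer> makeHardwareLayer(const std::string& model) {
    if (model == "new") return std::make_unique<NewBoxHardware>();
    assert(model == "old" && "unknown hardware model");
    return std::make_unique<OldBoxHardware>();
}
```

Adding another machine then means writing one more implementation and one more branch in the factory, rather than touching code all over the system.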

And then there were layoffs... A bunch of the folks working for Mr P went, a bunch of the testers went, and there was a big reshuffle... I got to work right in front of Mr B... and started trying to bend his ear, to push him in new directions and not rely on Mr J so much... There's a story in that to tell.

However, things became really interesting at the turn of the final year of the whole team being on the project, a new piece of hardware was yet again acquired, and the company wanted Phoenix to run on it.

I set about making that happen, and I just ignored Mr P, Mr A and Mr B... I made the new hardware version, and fixed a huge swathe of problems with booting and data at the same time. I couldn't disentangle it from the death embrace of the message passing and multi-process context switching, but I made things better.*

* One of the things I did was find that the C# file-exists check was about 10x slower than a call to "fstat". So I made a DLL exposing just a call to fstat, loaded that DLL into C#, and that made the code massively faster at boot. I then found all the C# "image.load" functions were really slow, but the GDI+ equivalents were much quicker; I added them to the DLL and, voila, C# went faster.
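A helper like this is tiny. The sketch below is an assumption about its shape, not the original DLL. It exports a C-linkage function that C# could declare with a DllImport attribute. It uses stat(), the path-based sibling of fstat(), because fstat itself works on an already-open file descriptor. The function name phx_file_exists and the PHX_EXPORT macro are hypothetical.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/stat.h>

// C linkage so the symbol name isn't mangled and P/Invoke can find it;
// on Windows the function must also be exported from the DLL.
#ifdef _WIN32
#define PHX_EXPORT extern "C" __declspec(dllexport)
#else
#define PHX_EXPORT extern "C"
#endif

// Returns 1 if something exists at `path` (file or directory), else 0.
// An int rather than bool keeps the marshalling on the C# side trivial.
PHX_EXPORT int phx_file_exists(const char* path) {
    if (path == nullptr) return 0;
    struct stat info;
    return stat(path, &info) == 0 ? 1 : 0;
}
```

On the C# side this would be declared as a static extern method returning int and marked with [DllImport("...")], naming whatever the DLL is called. The speed-up the post describes is the author's measurement, not something this sketch demonstrates.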

So at the end of Q1 the team was basically told "we're letting you all go, we need one person to tide the project over". They picked me... I'd demonstrated a breadth of knowledge and fixed so many problems, and hey, one can't easily fall out with oneself.

Looking back I think the decision to keep me shocked a lot of people who thought themselves indispensable, but the proof was in the following few months.

The fault rate went down from the 20% range to 18%, then 7%, and finally into the sub-1% range. We were on more machines than ever, but we'd cleaned up the code base, and the three testers we had went over the systems in a methodical manner. I instigated a policy of cleaning up the code as I went and removing inter-process communications. That reduced the number of vertical slices you had to hop over to get to the functionality you required. I also wrote tooling... event viewers... state machine inter-process visualizers, even a GUI designer.

All to turn the human element into a tooled touch on the code, and to control how it's used.

That never meant the testers and the operations manager (Mr D) were deterred from playing with the system, far from it... But things were better, and didn't change so badly.

The Phoenix project had a legacy though, the structure and controls... The message passing, the installation process, the IPC and the state machine were all a little wonky in their quirky little ways.

So it was that in 2014 I restarted my own personal effort to write a new system, at home, alone, in C++. That was Bluebird, and it has a whole other story to tell.

Tuesday, 4 August 2020

Raw Graphics Engine : C++ Project

It has been a while since a personal improvement project graced these pages, so here's one I started over the weekend... A graphics engine.

Sure, this is something I play about with all day in the office; we're writing games! That's literally my job. But I've been a systems engineer for such a very long time, and I've seen all these sparkly things coming from the folks working on gameplay, and I wanted some sparklies of my own.

I therefore began two projects. Both are graphics engines, but they're very different from one another... One is in Vulkan, which is not what we're talking about here. No, we're talking about the other one... a graphics engine I've written myself.

It's gone through three phases: Saturday, Sunday and then just tonight. The first phase was setting up the basic rendering, getting a triangle on the screen with a flat (orthographic) projection.

The engine is written in C++ and uses SDL2 for the window and renderer. The engine itself does all the geometry transforms through linear matrix mathematics that I hand-crafted, and it reaches into the third dimension in orthographic mode.
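The post doesn't show the hand-written maths, so here is a minimal sketch of what such code typically looks like. It rests on my own assumptions: column vectors, row-major storage m[row][col], angles in radians, and an orthographic matrix laid out like the classic glOrtho. The names (Mat4, ortho, rotateZ and so on) are illustrative, not the engine's.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // m[row][col]

Mat4 identity() {
    Mat4 m{};  // value-initialised, so every element starts at zero
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;
    return m;
}

Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Matrix times column vector.
Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            r[i] += m[i][k] * v[k];
    return r;
}

Mat4 translate(float x, float y, float z) {
    Mat4 m = identity();
    m[0][3] = x; m[1][3] = y; m[2][3] = z;
    return m;
}

Mat4 scale(float x, float y, float z) {
    Mat4 m = identity();
    m[0][0] = x; m[1][1] = y; m[2][2] = z;
    return m;
}

// Counter-clockwise rotation about the Z axis.
Mat4 rotateZ(float radians) {
    Mat4 m = identity();
    const float c = std::cos(radians), s = std::sin(radians);
    m[0][0] = c; m[0][1] = -s;
    m[1][0] = s; m[1][1] = c;
    return m;
}

// Like glOrtho: maps x in [l,r] and y in [b,t] to [-1,1], and eye-space
// z from -n to -f onto [-1,1], with no perspective divide.
Mat4 ortho(float l, float r, float b, float t, float n, float f) {
    assert(r != l && t != b && f != n);
    Mat4 m = identity();
    m[0][0] = 2.0f / (r - l);
    m[1][1] = 2.0f / (t - b);
    m[2][2] = -2.0f / (f - n);
    m[0][3] = -(r + l) / (r - l);
    m[1][3] = -(t + b) / (t - b);
    m[2][3] = -(f + n) / (f - n);
    return m;
}
```

Composing multiply(translate(...), rotateZ(...)) applies the rotation first and the translation second, because the matrix nearest the vector acts first. An orthographic matrix leaves w at 1, so distant objects don't shrink, which is what gives the flat look.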

The shapes can be rotated, scaled and translated, the usual. But before I drove myself mad writing shapes by hand on graph paper, I wrote a very simple importer for the very simple Milkshape 3D model editor, and started with a sphere.

Milkshape has appeared on these pages before and is really the only modelling package I'm familiar with. I really do need to learn Blender, don't I?

So, with models loading, I got a little adventurous.

This mesh really stresses my single-core linear mathematics, so tonight I started to switch it out in favour of GLM:

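#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp> // glm::translate, glm::rotate, glm::scale

// Model matrix: trans, angleX/Y/Z (in degrees) and scale come from the object.
// GLM post-multiplies, so a vertex is scaled first, then rotated about X, Y
// and Z, and finally translated.
glm::mat4 model(1.0f);
model = glm::translate(model, trans);
model = glm::rotate(model, glm::radians(angleZ), { 0, 0, 1 });
model = glm::rotate(model, glm::radians(angleY), { 0, 1, 0 });
model = glm::rotate(model, glm::radians(angleX), { 1, 0, 0 });
model = glm::scale(model, glm::vec3(scale.x, scale.y, scale.z));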

So, that's been my three days... I'm interested to see where this engine goes and what I'll do with it.

Vulkan, though, is the other thing I'm learning.

Thursday, 23 July 2020

Drop Ships Dilemma

I'd never heard of "drop shipping". To me it conjures up soldiers in heavily armed space shuttles being flung pell-mell at the surface of an enemy planet with the winds of plasma and fire burning all around them.

However, what it actually means, apparently, is this. You've bought something, say a chair, off a reputable site like Amazon, but you did so not from them but from a seller on their platform.

This seller doesn't actually sell you anything. What they do is contact their own network of suppliers, buy the item for you and get it shipped by that third party to you. The drop ship is that third party, to whom you don't exist. They deliver to your address as though it belonged to the middleman you met on whatever selling platform you chose.

And apparently this is legit on some selling platforms: Amazon, eBay, etc. The former puts some extra steps in there, such as requiring that the middleman accept the return and handle the shipping if you have an issue. But this is a nightmare, because you use something like Amazon or eBay for the protection the service affords, the expectation of easy and carefree returns.

The trouble being, this middleman is only there to cream off a little commission.

So my issue was, and is, a chair (this is still going on), which I ordered from Amazon on the 13th. I didn't even clock that it wasn't sold by Amazon themselves; this chair has a brand and a name, and was set to be delivered on the 20th.

Which came and went with no sign, and no tracking either. When these drop shippers do their thing they can keep their cards close to their chest; Amazon just asks that they've dispatched when they say they have, and ditto for eBay.

Trouble was, this middleman tried to tell me that they had called and left me a voicemail, and that I'd not gotten back to them... except I don't have voicemail, don't like voicemail... so they lied.

During this discussion they then said that it had not even been dispatched; indeed, it wasn't expected into stock with them until the 27th. Bear in mind this was the 21st, a day late and eight days since ordering, and the order said "dispatched".

Silence.... I want my chair, I want to know when it'll arrive.

I still don't actually know whether this is a drop shipper or a scam.

"We have another chair, I assure you it's better, more expensive which we can get to you tomorrow".

Now, just because something is more expensive does not make it better, and I spent time picking this chair out on Amazon, it has all the bells and whistles I want... four major features and a comfy shape.

This middleman persists in sending me details of another chair, and "if you agree, I'll send that next-day delivery"... This seems like a scam to me, but I don't even get to choose, because the information never arrived.

I never saw the info, so I never okayed it; I never wanted anything except the item I ordered.

Just now, I'm in the kitchen and I hear the letter box flap, and a van drive away... there's been no knocking, no noise made, just a note through the door "left with neighbour".

I hate my neighbours, but I go and find out what this delivery was... It turns out to be a chair... Not the chair I ordered.

A quite ugly square one... with two features I ordered missing, made of plastic not steel, and a manufacturing fault along the top cushion.

So I call this guy up and tell him "I was expecting to wait for the item I ordered. I never got the info, never asked for this item; it's the wrong chair, it's missing features and it's made of the wrong material".

"This is a more expensive chair" he says.

It certainly doesn't look it, and even if it arrived with gold bullion stuffing the cushions I'd return it because it's not the chair I ordered.

This guy is getting on my nerves, and that's when I realize I've been drop shipped. This guy is in an office himself; this is "his business", but he's clearly at work doing something else. He's not a chair salesman, he's just acting as a middleman.

So, back to the selling platform, Amazon... who seem to be quite happy to let 48 hours pass before they will even talk to me, to give the guy time to reply to me...

I'm therefore out of pocket, still without a chair, and frustrated by this whole mess.

I think Amazon need to make a much clearer distinction between items sold and delivered by them and items being drop shipped like this, or worse. "FROM AMAZON" in big, clear type.

There are issues even there. For instance, I've seen resellers listing parts like, say, "Intel CPUs", and they'll be listed as "by Intel"... but they're not being sold by Intel at all. Sure, they're made by Intel, but "John Doe Computer Tech in Droitwich" is the seller on Amazon. We really need to be told that more clearly to avoid the kind of situation I'm now in.

Wednesday, 22 July 2020

New Blogger is Bullshit (and Slow)

The new Blogger interface... Gah, I hate it. They've moved things around and it's so very, very slow. I just wrote my first blog post with it, and getting the labels on took literally minutes: as you type it flickers about, and it's just terrible.

It was sluggish to do anything at all, including formatting. Literally, a select-all then justify took tens of seconds.

It's instant in here, always has been... stupid new interface...

It's a great example of change for change's sake.