Friday 18 November 2022

Story Time - Fantastic Rack Mount Mistakes - Part 33 and 1/3

It has been a very long time since my last story time, so this one comes to you with an unknown number. I am, however, well beyond any possible NDA or secrets for this one, so I can finally share it (it was 20 years ago yesterday).

I had a job in a data center, nothing too special, but I loved it.  The loud room, the air conditioning, the rack upon rack of machines, I absolutely loved it.  We're talking 2002 here; these were the 2U high style of machine I use a lot today, the odd 4U or 8U JBOD box too, loved them all.

I had all my wiring terminated and beautifully dressed. I wasn't in charge of the whole place, just one aisle, but whenever there was an issue I took my time to run new cable bundles, beautifully dressed, along the cabinet runners into each cabinet top. There were seven cabinets to a row, two rows face to face on the cold row and then the backs of these into the hot. The hot row next to me was run by a lovely guy called Jerod; he was French I think, lovely guy (shout out to Jerod if you're reading!).

On the other side I had the work bench, a coveted space. Between the hot and cold aisles were thick plastic strip drapes, like you'd get in a meat packing plant.

We would patrol our aisle, identifying any issues or disks showing physically bad, schedule them in for late night fix ups and then go into the communal office where we each hot desked.

Each of us had a laptop and would sit and connect into the monitoring suite remotely. With a big screen we connected via VGA to our laptops, and dozens of spare keyboards, we could quite often spend the night shifts doing nothing but playing games. Medieval Total War was the order of the day for me at the time; some of you know my origins in the "game industry" go back to hacking about and modifying MTW (sorry Tom, whom I now work with, for fishing about and breaking your baby).

Anyway, about three months into this job another tech on the day shift came to me about Aisle 4, at the far end of the room. He was getting drop outs at intervals during the day, but none at night, and he was trying to figure out what might be causing it.

We took a look at his logs and sure enough on his disks he had 250-700ms delays for seek times on his platters. He also had a higher than average fault rate on disks, and apparently he was rebuilding RAID arrays a lot.

He asked us in the night team to watch over the kit live, which we duly did... Quiet as a mouse, nothing, not a single issue in about three months of monitoring.

Meanwhile in daylight he was tearing his hair out. I've got to admit to genuinely not remembering his name, but he had long greasy black hair and was from Northampton, which he told me more often than his name... "Hello I'm <forgettable>, I live in Northampton".

But forgettable as he was, I can't forget the day he handed that aisle over to me.

Stumped as to what was going on, and frankly pulling rank, he shuffled Jerod down to Aisle 3 and himself into Aisle 1 (my aisle), and all new machines were coming into Aisle 2, which Mr Senior with disk failures was going to be setting up.

This really irked me; my aisle was by far the best presented. When sales wanted to show new accounts around they showed them my aisle, and they all laughed over the names of the cabinets. We couldn't name the machines, most of them belonged to customers, but we could sticker and name the cabinets themselves, so they all got Bond theme names... Jaws, Octopussy, Moneypenny, Q and M all had simple servers, processors more or less, and my two JBOD cabinets were Sean and Roger.  Stencilled on in lovely white enamel paint.

So I was hoofed off to Aisle 4, dark, at the back, no work bench... and immediately set about making it my own. I redressed the cable bundles (which annoyed Mr Senior; though he loved my already done ones, my dressing his was seen as an overt declaration of war).  It's so simple too: cable tie around five, cable tie through the middle to break them into sets of three and two, then pull... Today you can get cable combs and I find them so sexy.

Anyway, dressed to impress as it was, what was not impressing was the fault rate, and instead of it being the accepted norm, as it had been for Mr Senior for months, it was now a big deal because I was in charge of the aisle.

I set about trying to work it out, I could not figure it out.

At night, when I was there it was all fine, the day guys saw nothing; they pretty much just tended the machines remotely, no maintenance happened between 7am and 7pm so it was up to the night guys like me to swap disks, rebuild arrays, fix cables and install new machines.

So after two weeks I had a log showing all these faults, but what was causing a disk which had worked all night and most of the day to suddenly show 500+ milliseconds seek?  It made no sense.

Anyway, 2002 was coming to a close. December 21st rolled around and I offered to cover a couple of the guys' shifts; they had kids, it was nearly Christmas, and we worked Christmas Eve and half of Christmas Day, one guy on, one guy off, for 6 hours at a time those two days. I took two shifts back to back.

Alone in the office, I watched the monitors and played games.

No spikes.... What gives, no spikes on the DAY I am actually here.

Hold on, I'm here, but no-one else is.

Could it be human interference?  I checked electrical circuits for noise, I checked lights, I checked if the air conditioning was affected outside and in.  I checked everything, no signs, no peaks, no noise.

HANG ON.  NO NOISE!

Spinning disks can be affected by loud noises, specifically by vibrations, I'd seen this demonstrated when doing my Compaq certification training.

There was no noise, could it just be vibration?

We were on the first floor (for anyone in the US, this is the floor above the one on the ground) so we were one storey up.  The main reception was below the center of the room; to the right, under Aisle 1, was a hallway void with offices leading off it, and under Aisle 4 were a toilet and a changing room.

I went down, put the shower on, came back up.... Nope.

Then I took my laptop with me, no wifi, but I could plug it into various office ethernet ports around the place as no-one was in.

There I am Christmas Day, banging doors, flicking lights, flushing loos and then pouncing on my laptop to see if it affected any of the disk activity I was artificially running up.

Nothing.

Defeated, I logged my time, handed over to Jerod and went to have my Christmas dinner.  I had 12 hours to think of something for Boxing Day.

I walked into Mr Senior asking why there was CCTV of me "Dashing about with my laptop in random offices".

I just said I was trying to check light circuits for his disk issues; he made it abundantly clear they were my disk issues and left me to it.

Boxing Day, and the sales had started. The building was at the corner of a large commercial estate, with a newly built IKEA across the road; as the crowds rolled in and their stock levels fell, they would be due a delivery soon.

I still poked around the office, and the eureka moment came on the 27th December 2002.  A large lorry was rumbling past the office and I had to wait for it to pass, a massive blue IKEA clad lorry.  I went into the office, logged into the monitoring and sure enough there was a trace of a large disk issue.  Timed 4 minutes before; when that lorry went past!

I didn't hear it, I didn't feel it... But had the disks?

I waited and watched as a few lorries passed during the fairly quiet week between Christmas and New Year. Nearly every heavy lorry going past resulted in some effect on the disks; vibration was being carried into the ground and, I guess, up through the building.  I will be honest, I could not feel it.  But every spike I saw was timed with a lorry.  In fact I soon let Jerod in on my idea and plan to fix it, and so he watched and I monitored, and when he came in I could tell him when a lorry had passed!
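
For anyone wanting to test a hunch like this with more than a pair of eyes, here is a minimal sketch in Python of matching latency spikes against lorry sightings. The timestamps are made up, and the function name and the one-minute window are assumptions for illustration, not anything from the monitoring suite in the story.

```python
from datetime import datetime, timedelta


def spikes_near_lorries(spikes, lorries, window=timedelta(minutes=1)):
    """Return the spikes that fall within `window` of any lorry pass."""
    return [s for s in spikes if any(abs(s - l) <= window for l in lorries)]


# Made-up example data: two lorry sightings and three seek-time spikes.
day = datetime(2002, 12, 27)
lorries = [day.replace(hour=10, minute=15), day.replace(hour=14, minute=40)]
spikes = [
    day.replace(hour=10, minute=15, second=20),  # just after the first lorry
    day.replace(hour=14, minute=40, second=5),   # just after the second lorry
    day.replace(hour=16, minute=0),              # no lorry anywhere near
]

matched = spikes_near_lorries(spikes, lorries)
print(f"{len(matched)} of {len(spikes)} spikes line up with a lorry")
```

If most spikes match and the ones that don't are rare, the hunch is worth acting on; if the matches look no better than chance, it's back to banging doors.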

I was not about to shout about this to Mr Senior, instead I set about fixing it.

I ordered a mat of 1 inch thick rubber, the stuff you mount washing machines on in your kitchen.  I already knew the racks pretty well, they were bolted at each foot, I'd need a torque wrench to unbolt them and I could use two of the hydraulic scissor platform lift trolleys we used to move machines about to lift the rack ever so slightly.

I didn't want to do this alone, so Jerod was roped in with the promise of a take away pizza from the glorious; but long gone; parlour we loved.

Mr Senior handed over to me and Jerod that night; it was not uncommon for two of us to be on at night, especially if there was work to do fitting something out.  It was normal to have two people when we were lifting machines too.  But Mr Senior would have been apoplectic if he'd known what us two kids were about to do.

I unscrewed the first rack from the floor, swept the dust out, used a wooden batten to bridge the lip to the metal of the scissor lift and cranked it... The cabinet moved, and I nearly wet myself as it looked like it was going to topple. Then Jerod jacked up his side and it came level; with an inch to spare I slipped in two of the pads.  He slipped in his two and at a shout over the loud AC we lowered.

I then used a Phillips head screwdriver to puncture the rubber and a knife to dig a bit out through each bolt hole, and fastened the bolt back through, not too tight, just tight enough.

It was sweaty work in the hot aisle, but we did the first three of seven that night.

I went home and decided to pre-cut the squares and use a drill to cut the middle out of them; the next night all seven were done.

Our disk fault rate went to zero.

That was my last time as an IT minion; I went back to programming soon after, and Mr Senior never was told what we did... His training should have told him.
