Showing posts with label Fire. Show all posts

Wednesday, 31 August 2016

Story Time : Fantastic Rack Mount Mistakes #2

Nothing is going to compare to the first post I made in this little series, but this was the first server room moment I had had to deal with.

I started work at a location not very far from where I live now. There was a little server room housing, again, an old NCR mini-computer, but this time it was running a custom cut of SCO Unix. This machine was important: it handled everything for the company's whole Midlands operation.

And it was my first day...

There was a rack with kit on the right, and a rack with kit on the left, and you walked down the gap between the two... This was particularly bad, as it left you no room to move in there.

I was being shown around by the previous guy, who was leaving, and I was really nervous... (Jim, if you're out there, I hope you remember this).

So, we'd just concluded the tour of the server cupboard when I noticed a black fleck in the airflow around me, then another, and another. "Jim, there's black stuff."

Before I had the words out of my mouth we got an awful smell of burning, yet everything was still on. All the network switches, the minicomputer, the UPSs, the PCs running MSMail... Everything was still on, even the CRT monitors....

Something was certainly going wrong, though. Jim stepped to the top of the room, leaving me in the middle aisle, slid down the back of the rack and took a look. Suddenly his arms were up and he was pulling the cables from a CRT monitor, and he yanked it down; it was clearly burning!

I pulled the side door open and he skittled this thing down the corridor, where it came to a halt. Happy that it was out of the server room and not actually burning up, we turned back to see what mess had been made.

The air had already been filtered, so the soot was gone, and everything looked fine. Except, unspotted by me (I was only a few hours into this role), there was a little warning light on one of the UPSs, warning that it was on battery only.... I missed it, Jim missed it... Don, however, didn't miss it, as suddenly Don skidded into view at the main door and nearly body-tackled the pair of us inside the room, ending up down on his hip pulling at the mini-computer.

"There's no power!"

The minicomputer was still on, but when I pressed a key on a terminal it reported that it was busy shutting down. The UPS was warning it had low power.... The company was offline.

Everything stopped... from payroll to purchasing, stopped. Over 200 people, stopped. COSTLY!

What had happened was that, as Jim had gone around the back, he'd stood on a cable (on the mess of cables, to be honest) behind the rack.

One of them was the female-to-male coupler between the UPS and the wall. The plug was still in the wall, disappearing into the mess of cable, and then a cable emerged and plugged into the UPS, so it looked fine... and the UPS was connected to the minicomputer with nice screwed-in connectors... Hard to remove.

But this coupler was just lying there in the morass; it was hard to see, and it was not screwed together at all... And it had come loose.

Needless to say, Jim was never remembered for saving the room from the fire, but remembered for unplugging the minicomputer.

Wednesday, 24 August 2016

Story Time : Fantastic Rack Mount Mistakes #1

This is the first in a little series, which I'm afraid will have no nice pictures, but trust me, these are all true and if not first hand, at least second hand accounts from trusted sources...

You may or may not have seen a server room in your time, if not then they're basically rooms, generally locked, with at least air conditioning, generally a fire extinguisher nearby and all your company server assets inside.

When I started out in business the server room was a large cupboard with an air conditioner and the 386-based NCR mini-computer inside. It also had a set of aluminium shelves, but it had no rack mounting; everything stood on the floor.

This remained true for me for a few years. The IBM AS/400s we moved onto were huge machines on the floor; aside from the colour, you felt like you were walking onto the set of "Flight of the Navigator".


However, I soon moved over to Compaq Proliant machines and these were rack mounted... I walked into the server room, and took a look at the rack, only to be greeted by the UPS battery pack being at the top of the rack... Yes, the batteries, full of acid, were at the top...

I came out and went to speak to the nice chap in charge, pointing out, as gently as I could, that the batteries might leak, or could rupture if there was a fire, and that it was not unknown for batteries to get hot, swell and burst...

He stared at me, blinking. "What's wrong with that? They're in a case."

And he was damn right, they were... a plastic case. This particular battery pack had no metal sheeting and no metal casing, just metal corner brackets; it was essentially plastic... and full of acid.

In story number 2, you will hear about the server room fire I'd dealt with the previous year.  But this chap didn't seem to register a problem with having the battery pack at the top of the rack.

I did, however, lodge my complaint in the service book, and went about my work. Months passed and I didn't often go to this site. We moved all the other company assets onto rack-mounted Ethernet switches, a few smaller rack-mounted Pentium II-based machines and all-new desktops.

Then one day a user came to my office door and said they could not get onto the central server. I checked my desktop serial terminal: sure enough, nothing. I checked a physical VT100: nothing... Oh no.

I call the other site, no answer...

I get a secretary to start calling them every five minutes and I start to check back through our logs. From our end, it seems the link was last active at 6:03am, about two hours before business would ramp up for the day, and the last items were a set of data from the design department at a Derby-based location.

So I call Derby, asking "Did your upload complete at 6am?". After a little silence I finally get the right chap on the line. He'd come in very early to get a jump on the day, and to the best of my recall he said "No, it suddenly stopped and I could not get back on."

I cut him off as the secretary suddenly pages me (yes, rocking the pager), and I pick up my internal phone to hear I've got the guy from the server site on the line. He's coughing... "What happened?" is my instant question.

"Halon was active when I arrived"...

This is pretty bad. The server room was protected by an automatic halon fire extinguisher: if it detects smoke or heat there is a 10-second alert, the door closes and halon is released...

Remember the lab scene in Terminator 2?...


A lot like that!... Staff in the room, or entering it, need to use breathing apparatus, which we'd all been trained on.

This guy was short of breath because he'd been in the halon room.

He says that when he arrived (at 7am) the room was in close-down, the building fire alarm was going and a fire engine was already present... The security staff had let the firefighters in, and they'd used breathing kit to enter the server room, to find.... And this was the best he could describe:

"A smouldering brown mess of plastic encasing the top of the Proliant server, stuck in all the front vents and solid into the quick release drive catches, all the network wires above are only copper and the battery pack is black"

I dive into a company car and drive up there. I arrive as the fire brigade are satisfied that it's electrically isolated and that the halon is clear; we've got a guy from the fire suppression company coming to strut his stuff, but in the meantime the company is offline.

So, I ran a 10Mbit Ethernet cable from the main input multiplexer and set up a backup system on an AST dual-core machine (literally, it was a PC with two 486 CPU sockets) with 16MB of RAM... power!

This was to be the tide-over machine. We cycled the nightly backup onto it, and by around noon people were working, if extremely slowly. That machine was our stopgap for the next 48 hours.

So, now to the server room. Initially the most obvious problem (because the rack had a closed glass door) was the ceiling and floor... The ceiling tiles were seemingly not flameproof... DOH.... They had at least partly melted, and there were droplets of melted whatever-they-were all over the place, blown around by the AC.

The AC itself had valiantly kept fanning the flames, so there were black soot marks neatly blown all down the wall.

And the floor directly around the cabinet base was charred; it was fibre carpet tiles on concrete, and they had smouldered...

Our fire plan had worked, and the halon tanks were empty; both had fired, though perhaps only one was needed... The vent-out had worked, and we'd used up one of our internal breathing kits, leaving two spares, one inside and one outside.

The room is also electrically dead; nothing inside is on. We had redundant feeds in from the sites, one inside the server room and one outside down the corridor, which the AST machine was now connected to.

We're using torches. I open the front door of the server rack, and it's just black: the glass is coated in black, the once cream-coloured Compaqs are black, the silver rack arms are black. Going up, the patch panel above is black, but you can see the wires look a lot thinner; reaching up, I find they're charred, the plastic has sloughed off them, and later, when I have a stepladder, I see it has mostly pooled onto the top of the first computer. But it has also seeped into and set within the vents on top, and spilled over into the front of the machine.

Above this are the crispy remains of the UPS. There is a hole about 2 by 3 inches, around 8 inches from the left of the case bottom, and I can see into the UPS... It's a charred vision of hell.

We take this UPS out of the rack, as it is the ONLY thing in there that had been put in by the local chap, the ONLY thing not on the service list, the ONLY thing I had had an issue with.... And we pop the lid. One battery had gone, catastrophically: it had melted and leaked acid, and because the batteries lay on their sides the corrosive acid had eaten the rubber off the outside of the wires inside, they had shorted, and an electrical fire had ensued.

The battery acid had then, xenomorph-style, eaten out of the bottom of the case and dripped onto the Ethernet patch cables. They'd shorted, got hot and dripped more; the acid had also cut through the plastic on the power cables, causing more electrical shorts... And the gates of hell had opened, swallowing the nice working machine.


The clean-up was epic. We lifted the servers out to the cleaner area of the IT room, cleaned the outside, and had a pair of Compaq chaps come and clean them... I think they were called Mark and Jeff, and they did an excellent job.

I ordered new cable, thousands of RJ45 connectors and a crimper, and set about rewiring the room in a better manner.

And I set the guy in charge of the site, Mr "It'll do", to taking out all the racking, jet washing it and putting it back in, and to redoing the power lines with a guy from the engineering department. I also had him remove all the carpet and the ceiling tiles, and find a supplier for a proper fireproof ceiling.

I also checked with the fire extinguisher guy, and he reckoned we were lucky the fire happened an hour outside working hours, because the roof was just ceiling tiles and the room should never have been fitted with this kind of fire system... Despite his being from the same company!... Basically, almost all the halon had escaped via the ceiling tiles!

After two twenty-hour days I had one clean Compaq server, one clean rack and enough Ethernet connections to put one patch panel and the server back online, which let our users stop moaning about how slow things were.

The other Compaq, the one closest to the mess, was going off to be re-cased; the damage was that bad, but the unit itself worked.

We also now had a new fire safe, as the original had proven not to be smoke-proof!

And I had new certification that the walls were fireproof.

It took three months to recover from this. My final report to the owners put the cost at about £8,500. Not a bad deal, considering a rebuild of the whole room would have been £40K+.

I'd instigated better carpet, better ceiling, better procedure, and I had the brand new branded, supported, service contracted UPS in the bottom of the rack.

The final washdown: the UPS which had failed was known to be bad. Three weeks before it went up, the site manager had taken it out and replaced two of the four battery cells inside with a pair he'd bought cheap. He'd done this himself, without my even being aware of it, and he'd had the bottle to charge the company twice for the batteries, pocketing the difference.

Embezzlement, a bad attitude, bad procedures and simply his considering me beneath him resulted in his dismissal soon after.

He was the first person I enjoyed firing, he was the first person I'd actually had to actively fire from a role for incompetence, and he was the last person I let dictate to me where kit in a rack sat.

All for the placement of a UPS in a rack...



(Paul, Andy, Max: if you recognise this story, you know you were a big part of that clear-up, but you were in London, Leicester and Morocco... I was up to my armpits in that mess!)

Tuesday, 26 June 2012

Burning up, or shutting down, the Mini-Computer


I've decided today to tell a story. It's a true story, and it bears some interesting information and history. You see, I used to work for a company called "Claremont Garments", at the factory in Selton, in an IT support role... I took over the role from a guy called Jim... The last time I saw Jim, he fell out of a doorway, nearly drunk, when I picked him up for the annual departmental curry... But anyway, I was a student at the time... Studying Software Engineering.

The company ran an NCR-brand mini-computer, running serial lines through multiplexors out to the distant sites over dedicated comms lines. This meant that in the far-off factories there were green-screen terminals showing the computer's screen.

At the time it was the sunset of the mini-computer era. I don't think very many companies still run in that fashion, though some, like the one I work for now, do still have powerful servers running what is really just old software that should have a terminal, but which has been given a GUI.

But back then, I was introduced to this NCR and its AST backup machine, and told it had to have an ALWAYS UP rule. I was shown how to make a call on the very expensive support contract, then told the push-button code for the door, and shown inside the cave of technological delights.

It was all pretty old stuff: a wire rack with monitors, keyboards, UPSs and other stuff. There were network switches running down the right of the room, and computers down the left.

But as I'm being shown what each computer does, I notice that in the stream of air from the air conditioner there are black flecks... Soot... I look at the unit; it's a nice, newish, clean-looking thing... Where's this black coming from... I glance down, nothing, glance up... There are three monitors on the top of this rack of machines. The right-hand one is on, but looks fuzzy, and the black particles are floating out of it... The ceiling tile is slightly discoloured with heat... That bastard thing is on FIRE!

Well, I shift back and point it out. Jim, to his credit, dives in up to his elbows and grabs this thing down... as I reach around his arms and unplug it... But now he's holding a burning monitor over his head, and is standing in the server room... There are two doors to this room; I sling the one on the right open and Jim hurls this monitor overhand into the corridor, where it just sits, black specks floating and all...


However, turning back around, we see the chap hired to write code for the mini-computer, Don, skidding into the other doorway and landing full length on the floor, staring at the NCR... "What have you done?" he cries. "It's gone off"...

I look as puzzled as Jim, who retraces his steps... Whilst valiantly saving the server room from fire, he'd stood on a pile of wires. One of them was a coil of power lead, the lead from the UPS to the NCR... The NCR had lost power, instantly...

OH SHIT... Don and Jim reboot the machine, something I take care to learn about, but ironically in my whole time there, never had to do.

But then I take a look at the wire Jim "pulled" out... It has screw fixings; it should have been screwed into the UPS and the NCR, but it wasn't... The person responsible for it not being screwed in forever after ridiculed Jim for unplugging the machine... But Jim had been standing a good 4 feet to the side of the NCR itself... This chain of catastrophe should never have happened.

However, it did, and instead of being remembered for his efforts to stop the place burning up, Jim was always jibed at and remembered for unplugging the NCR.