Showing posts with label services. Show all posts

Thursday, 20 April 2017

Sys-Admin/Dev Ops: Assumption is Danger

As a systems admin, or dev ops, or whatever your job title might be, never ever assume that the person you're handing a system to has a clue.  This might seem harsh, but it's true, and proves itself true time and time again.

"Assumption is the mother of all f**k ups"

About a year ago I deployed a system that automatically sends requests (via SMS) to remote machines, getting them to report their status, send back error information and gather some basic information.

It has run happily for a whole year and it has been plain sailing. The hours and hours of work I put into automating it and keeping it self-sustaining have paid off: zero faults, zero downtime. Self-regulation is the way forward for me; even if it took slightly longer to put the system in place, it has needed no human input for nearly a year!

However, about a week ago the unit needed to move. It had to be physically picked up and taken out of my small server room and into the official server room, basically a dark cupboard controlled not by me or my cohort, but by the IT boffins.

Fine. I notified the customers, went off to the IT area, sorted out who I was to hand it to and physically delivered it to the chap. I watched him start to plug it all back together: power, wires, boot, fine....

I assumed he'd do this seamlessly....

Until this morning, well, a morning last week, as I post these with a date in the future. That morning was hell. I walked into a wall of customers unable to get to their machines; the Easter weekend was looming, performance needed to be monitored, and customer sites didn't have their regular staff. Explaining to temporary cover staff that the system would be off was not a prospect I relished.

To be frank, there was a lot of flapping going on, more than I expected... IT reported the system back online, but the customers didn't stop flapping... Indeed, none of the estate seemed able to connect in... 1 hour, 2 hours. I've asked the boffins to check it time and again; "It's fine," they tell me.

I look locally: I can't see the controller machine on the network, and I can't see it through the remote management console... Where the hell is the machine?

I assure the customers I'll have answers within the hour, and I hit social media with the same. This is going very public, and I'm rather annoyed: for a whole year things have run seamlessly but been ignored, and now it's offline for a scheduled purpose everyone is complaining. I do not want my success wiped away in a flood of negative press.

I call the IT boffins... "We'll look into it"... No, no, no, you'll get onto it right now. Not look, not glance; answers are needed. Action from you is needed before my Re-Action goes nuclear.

I wait five minutes (I was willing to give them ten)... My phone rings...

Them > "Hello?"...
Me > "Answers?"...
Them > "Yeah, you know when you brought it back?"...
Me > "The Machine?"....
Them > "Yes"...
Me > "I remember, why?"...
Them > "Well, it has power"...
Me > "Good"....
Them > "Not really"...
Me > "Why not?"...
Them > "Because that's all it has, it's not been plugged into the network"

I hung up. They plugged it into the network and a slew of data came through... The customers were pacified.

I however was not.

I've had an on-the-spot review: firstly, the IT bod who did this was held to account; secondly, I was held to account for not noticing.

On not noticing, I admitted that, having had it run cleanly for a year, I had turned off the performance reports, and that I had assumed a network machine handed to an IT bod would be plugged into the network. People were not happy, least of all me, but that was the fallout.

However, I then had a tertiary clean-up to do. After the Easter break I spoke to three of my main customers: trusted operators, the actual folk who should have been using the machines at the remote sites, not temporary staff. I asked them why they had not noticed. The replies... "Because it had worked for so long without an issue", "like you make it work, so we just guess it always is" and "we didn't notice it was offline".

They were very much putting everything into my court; assumption on the part of all parties was to blame.

The lessons learned for me are to keep checking, keep monitoring, and use my automation to report status, to fix faults and, if human errors creep in, to let me know.

I'm now off to spec up a service I can run on one of my own servers, just to ping the network machine that went AWOL and receive a report from it letting me know what it's up to. This might be a bit of Python or just bash on a cron task, but it's going to be something rather than nothing.
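For anyone wanting to do the same, here's a rough sketch of what that "bit of Python on a cron task" might look like. It's not the actual service, just one way of doing it under some loud assumptions: the controller's address (192.0.2.10 below is a documentation-range placeholder), the stamp file path and the 30-minute tolerance are all hypothetical, and it assumes whatever receives the machine's reports touches that stamp file each time one arrives. It uses the Linux iputils `ping` flags (`-c` count, `-W` timeout in seconds; macOS treats `-W` differently).

```python
import os
import subprocess
import sys
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical settings: substitute the real controller's address and your own tolerance.
CONTROLLER_HOST = "192.0.2.10"  # documentation-range IP standing in for the controller
REPORT_STAMP = "/var/lib/monitor/controller_last_report"  # hypothetical: touched when a report arrives
MAX_REPORT_AGE = timedelta(minutes=30)


def is_reachable(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo with the system ping (Linux iputils flags assumed)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 3,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # No ping binary, or ping hung: treat as unreachable so we still get told.
        return False
    return result.returncode == 0


def last_report_time(stamp_path: str) -> Optional[datetime]:
    """Arrival time of the last report, taken from the stamp file's mtime (None if never)."""
    try:
        return datetime.fromtimestamp(os.path.getmtime(stamp_path))
    except FileNotFoundError:
        return None


def report_is_stale(last_report: Optional[datetime], now: datetime,
                    max_age: timedelta = MAX_REPORT_AGE) -> bool:
    """True if the machine has never reported, or has been quiet longer than max_age."""
    return last_report is None or now - last_report > max_age


def check(host: str, stamp_path: str, now: datetime) -> List[str]:
    """Return human-readable problems; an empty list means all is well."""
    problems = []
    if not is_reachable(host):
        problems.append(f"{host} did not answer a ping")
    last = last_report_time(stamp_path)
    if report_is_stale(last, now):
        seen = f"since {last:%Y-%m-%d %H:%M}" if last else "on record"
        problems.append(f"no status report from {host} {seen}")
    return problems


if __name__ == "__main__":
    # Cron normally mails any output to the crontab's owner (if local mail is set up),
    # so printing the problems is enough to raise an alert.
    for problem in check(CONTROLLER_HOST, REPORT_STAMP, datetime.now()):
        print("ALERT:", problem, file=sys.stderr)
```

Dropped in as, say, `*/5 * * * * /usr/bin/python3 /opt/monitor/check_controller.py` (path hypothetical), it stays silent while everything is fine and shouts as soon as the box stops answering or stops reporting. Checking both matters: a machine with power but no network cable fails the ping, while one that pings but whose reporting has died fails the staleness check.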

I will NOT assume again.

Thursday, 7 July 2016

Three Games I Can't Face (Again)

Three games I often think about, but could never return to, have been raising their heads with me lately, the first being Eve-Online.

I was invited back by a friend, who wanted me and my three decently trained (for 2012) accounts to go back into the game and run mining operations for him. If you're not aware, mining in Eve is pretty much staring at rocks; you have to be in a special place, and busy with other stuff, to let your game time turn into this monotony.

He assured me there's loads of new stuff, "since you like mined Xel, there's loads"... But my heart just wasn't in it. I did install the game and tried a free account for like 25 minutes, until I got utterly bored with it and had to stop.

My main problem is probably the style of gameplay. I loved Eve; I played way back before the new engine. It was a clicky menu fest then, and it seems twice as much so now.

Though I still love the Armageddon!


The next is World of Warcraft. I don't often get invited back to WoW nowadays, but when I first quit I was asked perhaps every other day. With the release of the movie, however, my interest was piqued, and so I took a look...

What have they done to my beloved Stormwind?

I just can't; I can't face it. But I look back on my appearance on "Shut Up We're Talking" (which incidentally spawned this very blog) and remember saying that damage and level inflation would devalue and dumb down the game... Boy, was I right, or was I right?


The third, and final one, is CubeWorld. I was big into this when it came out. I begged a friend to lend me his account before I bought it (which he wouldn't, *cough* tight wad), but buying into it was great. It mixed Zelda with Minecraft with a proper olde tyme adventure feel.

I had high hopes it would become part of my regular rotation; unfortunately, development seems to have utterly stopped, and it blows in the wind like dust from my gaming bones.


Do you miss an old game?  Do you insist you have quit a game, or stopped subscribing to it?... Let us know what it is in the comments below!

Monday, 9 May 2016

A Golden Rule for IT Infrastructure

Maybe I'm too old?... Maybe I'm not aware of some other rule... But when I were a lad, and first started out in a professional IT environment, the key was to keep things available and running, and always the same. Now, the latter item is something of a grey area today: what do we mean by "always the same"?

Well, I'm not talking about the same box doing the same job with nothing changing on it... That is very old school, and something you have to think about in the sense of "big iron", when you had only one (for its day) powerful machine: effectively a mainframe, a mini or even just a large server. So, armed with only the one machine, you were forced to think about keeping the status quo, keeping everything the same and everything happy, and you paid lots for people like me, and for hugely expensive support contracts, to keep that status quo.

That, though, is old thinking; it is not how modern infrastructure should work, or even be thought of. Today, with powerful servers at lower price points, you can think about "high availability". This is a concept many, many older IT bods, and even folks in executive-level posts, are struggling to swallow.

It means they pay out for many smaller boxes, and they virtualize to spread the work between those boxes.

If one box goes down, the virtual machines, or services, running on it simply up and migrate to another box in the group. This is the core concept of modern cluster computing, and it really only depends on the number of boxes you've bought or hired, and on the backup/snapshot ability of your storage solution.
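To make the "up and migrate" idea concrete, here's a toy sketch of the placement decision a cluster makes when a box dies. It is not how any particular hypervisor or cluster manager actually does it; real ones weigh CPU, memory and affinity rules. This one simply counts VMs, and it assumes the VM disks live on shared storage (the SAN) so only the compute has to move. The host and VM names are made up.

```python
from typing import Dict, List


def fail_over(placement: Dict[str, List[str]], failed_host: str) -> Dict[str, List[str]]:
    """Return a new placement with every VM from failed_host moved to a survivor.

    Toy rule: each orphaned VM goes to whichever surviving host currently runs
    the fewest VMs (ties go to the host listed first). Assumes shared storage,
    so a VM can start on any surviving host.
    """
    survivors = {host: list(vms) for host, vms in placement.items() if host != failed_host}
    if not survivors:
        raise RuntimeError("no surviving hosts: nowhere to migrate to")
    for vm in placement.get(failed_host, []):
        target = min(survivors, key=lambda host: len(survivors[host]))
        survivors[target].append(vm)
    return survivors


if __name__ == "__main__":
    cluster = {"host-a": ["web1", "db1"], "host-b": ["web2"], "host-c": []}
    print(fail_over(cluster, "host-a"))
    # {'host-b': ['web2', 'db1'], 'host-c': ['web1']}
```

The point is the one above: with three boxes, losing host-a costs nothing but a migration, whereas with one big-iron box the same failure means everything stops. The more boxes you have, the more headroom there is to absorb a failure.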

So we're talking about boxes running your SAN solution, dedicated high bandwidth between the storage and the processing servers, and then many servers, perhaps booting locally, which then load their virtual machines over the wire/fiber.

What has all this got to do with the status quo?... Well, very simply, in this modern infrastructure, no matter where your virtual machines are running and no matter what their services are, if you've not told your customers in the workplace that a change is coming, NOTHING should change from their point of view.

Changing anything at this juncture would leave them feeling confused, and even spending precious time checking why something at their end is no longer working... when really it's not their end; something changed in your infrastructure.

So don't do it!

Publish changes, push out a change log or an update mail, or even just have the courtesy to tell them.
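Even the update mail can be automated, so there's little excuse not to send one. A minimal sketch using Python's standard `email` package; the addresses, the wording and the list of changes are all hypothetical examples, not a template anyone mandates.

```python
from datetime import date
from email.message import EmailMessage
from typing import List


def change_notice(changes: List[str], when: date, sender: str,
                  recipients: List[str]) -> EmailMessage:
    """Build a plain-text 'what is changing and when' mail for customers."""
    msg = EmailMessage()
    msg["Subject"] = f"Planned IT changes for {when:%d %B %Y}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    body = [f"The following changes are scheduled for {when:%A %d %B %Y}:", ""]
    body += [f"  - {change}" for change in changes]
    body += [
        "",
        "If something that worked yesterday stops working after this,",
        "please tell us: it is most likely our end, not yours.",
    ]
    msg.set_content("\n".join(body))
    return msg


if __name__ == "__main__":
    notice = change_notice(
        ["File server moves to the main server room", "Mail unavailable 18:00-19:00"],
        date(2016, 5, 9),
        "it-team@example.com",      # hypothetical sender
        ["everyone@example.com"],   # hypothetical distribution list
    )
    print(notice)
```

Actually sending it would be a matter of `smtplib.SMTP("your-mail-relay").send_message(notice)`, with your own relay's hostname in place of the placeholder.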

Worse still, if a customer comes knocking on your IT office door and tells you they can no longer do today what they were doing yesterday, don't fob them off with phrases like "it must be your end" or "nothing has changed, chap". If nothing had changed they'd not be talking to you; customers don't just get off their seats and show themselves up as being in the dark unless something is bothering them.

Your role in IT is not just to provide the infrastructure, but to understand your customers. Not all of them are moaning, know-nothing morons... Some are, don't get me wrong, some are, and you are paid to mediate with them. But when things change, customers notice, and their frustration will ultimately be taken out on you.

Friday, 27 September 2013

Sky Internet Customer Service Sucks

I've had a very busy few days. I mentioned a certain bank dragging their feet; well, all is forgiven, they've got into gear and are getting on with things. However, not before I was accused by the party on the other side of this transaction (a transaction wholly out of my hands) of being a time-wasting bastard. Hey ho.

I'm also investigating issues with the in-laws' ADSL. Now, I hate ADSL; I hated ISDN before it, and I hated dial-up even more before that. Since 2000 I have exclusively used cable internet services; they're faster and for the most part more reliable. For instance, cable in my area has been fiber optic since around 1996, when it was laid. Not like the Johnny-come-lately British Telecom offering of "Fiber has arrived"... yeah, well, I've had fiber a long, long time, slowcoach.

The issue with the in-laws' connection, though, is a case of utterly diabolical service from Sky, their provider. Sky's service just piggybacks on BT's infrastructure, and it's ADSL based... I hate ADSL... I don't have spare routers for them to try, and I don't have filters or anything like that... so I'm struggling to fix the issue, and the reason is that Sky simply put the burden of proving the fault on the customer. It's not a case of calling customer services and getting actual customer service.

No, it's a case of calling Sky customer services and being told to fiddle with wires, filters and even wall sockets like some giant Rubik's cube, and them telling you how expensive it is to involve BT Openreach in this situation...

Now, luckily, Sky customer services were not telling these tales of woe and foreboding to the technofeeble in-laws; they were telling me, and I batted them out of the ballpark as the bullshit they are. Still, Sky insist the issue is not their end; it's with "your" router.

The router sitting in their home says "Property of British Sky Broadcasting"... yet Sky say it "belongs to you" and "you have to buy a new router"... yet the label on the router says "property of sky, do not dispose, not for resale"... it's like... erm... this is your kit, I'm just a customer paying for a service.

I'm tempted to lay the whole statutory rights thing on Sky here, because in essence the in-laws have been without a service they pay for, for over a fortnight, and all because Sky are arguing, or trying to rope the in-laws into being stuck with them for a further 12-month internet contract...

Anyway, I have an ADSL router to try, and I'm going to strongly advise the in-laws to move to a cable internet provider in the area. Sky suck. Their customer service is the worst I've come across and, whilst actually sounding legit, they are so full of shit it's untrue; they may as well just let this guy do their support...

http://www.youtube.com/watch?v=byFhCC2pUEU


UPDATE 30/9/2013: I finally got them to admit a line fault. I did this by borrowing $1000 worth of Cisco professional-grade routing kit from work, setting it all up and running my own reverse line test. When the guy on the phone (who, to give him his due, was helpful) then tried to argue, I had them by the short and curlies, because I could see them just try to PING the line... Some line test... and I could see the carrier wave of the ADSL coming and going with no more than a 12% peak, which was unacceptably low. Explaining all this worked (well, I think pointing out I had Cisco testing kit attached worked) and now an engineer appointment is booked.

But not before four very technical phone conversations, not before proving absolutely everything in the house was in order and having to borrow professional kit to prove the point: an unacceptably high barrier to entry just to get a support call booked, methinks.

So I stand by my statement. Sky customer service, your bar to entry is too high, so you suck! Support should be "Have you turned it off and on again?"... "Yes"... "Right, we'll come see", even if there is then, say, a £10 call-out charge or whatever.

But to blanket say "no", and then only offer to send out a new router IF they sign up for 12 more months of contract? If it were me, I'd tell you to get stuffed after already having been a customer for 10 years. I don't need to sign up for 12 more months to prove myself loyal and worthy of support; I already pay my monthly service fee to be worthy of help!