Saturday, 24 April 2021
Bad Files and Smart Cards in a Project from Long Ago
I need to anonymize this code, so we'll be doing it in a pseudo C# style. One of the last tasks I had at my prior employer was to inherit the entire code base for a project I had been bitting and bobbing about in for years; I'd seen this project start, release (many times), mutate and ultimately age.
As I took control it needed replacing, which is a whole other story involving C++ and dragging people kicking and screaming into touch.
This product though was like your grandad: it sat quietly on its own, sucking a Werther's Original, waiting for a war film or Columbo to come on the telly.
The difficulty was the fault rate: between 9 and 14% of machines were off in the morning, and if a pack of updates was ever sent (for content) then that rose to around 46%... Imagine the calls, the service manager and his oppo having to field a 46% fault rate because of your update. Indeed, on one occasion I remember driving to a customer's site and physically handing them a good update DVD rather than leaving them to wait.
So what was so bad? Well, it all came down to... let's look at a piece of code that is seared in my memory:
FileStream file = new FileStream("C:\\SomeFile.txt", FileMode.Open, FileAccess.Read, FileShare.None);
byte[] buffer = new byte[file.Length];
int bytesRead = file.Read(buffer, 0, (int)file.Length);
file.Close();
// Do something with buffer to give us a new buffer
int newDataLength = 64;
byte[] newBuffer = new byte[buffer.Length + newDataLength];
file = new FileStream("C:\\SomeFile.txt", FileMode.OpenOrCreate, FileAccess.Write, FileShare.None);
file.Write(newBuffer, 0, newBuffer.Length);
file.Close();

This is part of an update sequence: the existing file would be opened, the new update delta calculated, and the intent was to append it onto the end of the file. This was fine for years, it worked, it got shipped. It went wrong about five years later. Can you see how?
A hint is that this was a 32bit machine.
Did you spot it?.... it's line 2...
"file.Length" returns a long, but then all the following file operations work on int. The file started to go wrong after it was two gigabytes in size, because the range of int being 2,147,483,647 if we divide by 1024 three times we get kilobytes, then megabytes, then gigabytes and we see this is roughly 1.99 gigabytes.
But then think about that: this is a 2GIGABYTE file being read into a buffer in RAM!?!?!?
It just makes a pure RAM copy of itself, then opens the file and starts to write over the original from zero to the end.
YEAH, so it's overwriting the whole original file.
It's so wrong in so many ways: the massive buffer, the overwriting of existing data already safe on disk, and the fact that this all took time too... This operation happened at a reconcile phase and it was all asynchronous, so whilst this portion of the system was doing all this mental tossing about, another part of the system had changed the screen... to say "Please Power off or Reboot".
So people did, they literally pulled the power. So they lost their 2GB+ of data, and just as these files were getting large they were nuking them by pulling the power too!
The solution is simple: open the file for append, or just seek to the end and add the new data on.
int newDataLength = 64;
byte[] buffer = new byte[newDataLength];
// Get the new data into the buffer
FileStream file = new FileStream("C:\\SomeFile.txt", FileMode.OpenOrCreate, FileAccess.Write, FileShare.None);
file.Seek(file.Length, SeekOrigin.Begin);
file.Write(buffer, 0, buffer.Length);
file.Close();

This was only part of the problem: the functions using the data from this file took it as a whole byte array, so there was literally no way to chunk the file. I can't go into the details, but I had to break that up and start to stream the data through that system, which then let me add the resulting new delta array (which was always smaller than 2MB) to the end of the file.
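For the streaming side, the shape of the change was roughly along these lines (the chunk size, names and the ProcessChunk consumer are my own illustration, not the original code): read the existing file through in fixed-size chunks, and only ever append the small delta with FileMode.Append.

using System.IO;

class ChunkedAppendSketch
{
    static void AppendDelta(string path, byte[] delta)
    {
        const int chunkSize = 64 * 1024;
        byte[] chunk = new byte[chunkSize];

        // Stream the existing file through the downstream logic a chunk at a time,
        // rather than loading the whole thing into one giant array.
        using (FileStream reader = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            int read;
            while ((read = reader.Read(chunk, 0, chunk.Length)) > 0)
            {
                // ProcessChunk(chunk, read); // hypothetical consumer of the data
            }
        }

        // FileMode.Append starts writing at the end; nothing already on disk is rewritten.
        using (FileStream writer = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.None))
        {
            writer.Write(delta, 0, delta.Length);
        }
    }
}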
That was only one part of the system which kept me awake. Another good one, used a lot, was a pattern that also overwrote small files, mostly the JSON files which controlled the settings. And remember, users would often turn these machines off by simply pulling the power out of the back.
Whenever it was saving a file it would basically be doing:
File.WriteAllBytes(thePath, allTheBytes).
Yep, it'd just write over the file.
My fix? Simple: when opening the file at a time when we didn't expect the users to just pull the power - or at least when it was less common - make a file backup with "File.Copy(source, dest)", and these destination files were numbered 1, 2, 3, with the count configurable... so on sites where we knew they had a high fault rate we could stack up 5 or 7 backups of these files, but on machines with better hardware, or SSDs, we'd only need 3.
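The save-time half of that idea looks roughly like this (the file naming and method name are my own, purely illustrative): shuffle the numbered copies up by one, then take a fresh copy of the live file.

using System.IO;

class SettingsBackupSketch
{
    static void TakeBackup(string settingsPath, int backupCount)
    {
        // Drop the oldest copy if we're already at the configured limit.
        string oldest = settingsPath + "." + backupCount;
        if (File.Exists(oldest))
            File.Delete(oldest);

        // Shift settings.json.1 -> .2, .2 -> .3 and so on.
        for (int i = backupCount - 1; i >= 1; i--)
        {
            string from = settingsPath + "." + i;
            if (File.Exists(from))
                File.Move(from, settingsPath + "." + (i + 1));
        }

        // Finally take a fresh copy of the live file as backup number 1.
        if (File.Exists(settingsPath))
            File.Copy(settingsPath, settingsPath + ".1");
    }
}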
I don't even think the service manager knew about this "fix".
But armed with these backups we could then leave the original code alone (which was quite convoluted and, to be honest, I didn't want to fix it). On the next load, if opening the file failed, I'd have it nuke the backup it had just taken, then use the most recent good backup. And if there were now more backups than we should have, we'd delete the oldest.
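The load-time half, in the same hedged sketch form (the validation step and names are assumptions of mine): try the live file first, and on failure delete the bad copy and fall back through the numbered backups, newest first.

using System.IO;

class SettingsRestoreSketch
{
    static byte[] LoadWithFallback(string settingsPath, int backupCount)
    {
        // Index 0 is the live file, 1..backupCount are the numbered copies, newest first.
        for (int i = 0; i <= backupCount; i++)
        {
            string candidate = (i == 0) ? settingsPath : settingsPath + "." + i;
            if (!File.Exists(candidate))
                continue;

            try
            {
                byte[] data = File.ReadAllBytes(candidate);
                // A real version would also parse and sanity-check the JSON here.
                return data;
            }
            catch (IOException)
            {
                // Unreadable: nuke this copy and drop back to the next oldest one.
                File.Delete(candidate);
            }
        }

        throw new FileNotFoundException("No usable settings file or backup", settingsPath);
    }
}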
Settings didn't change very often, but this did let us solve this issue.
The final, worst piece of this system was the licensing system, which used a USB-connected smart card reader and a custom decrementing secure card format to license the machine time. This was fine for years; it used a nice Gemalto reader and cards, and all was fine in testing.
The machine tested the card whilst in operation once every five minutes, so no big deal. When in service mode it checked the card every 10 seconds to update the license level display, but the service mode was never intended to be left for more than a few minutes.... So what happened?
Yeah, a customer opened a machine and left it open for a week... and their machine went out of operation. When we got this particular machine back I just opened the door, took the card out and pointed to the literally charred, burned back of the smart card chip... It was a white plastic card, and the back was deformed and light brown... I did chuckle. It sucked for the customer, but we never worked out why they had the door open in service mode for so long; they weren't meant to.
But worse than that isolated incident was a new tranche of machines being released in 2015; suddenly all had faults: there were machines out of order, machines not allowing play, machines rebooting... Nothing seemed to clear them, and some were reporting "Out of Licensing", despite people having paid for brand new cards.
They were issued a new card... The old cards came back and were reworked... so, randomly, once-working sites got either a new card or a reconditioned card from any other random site.
New machines had a new brand of card reader; old machines had the Gemalto. New cards were all this new brand of card, and the old cards were the white Gemalto ones... this mix just went on... and soon we had a rising fault rate.
The diagnostic view was at first a little mixed: sometimes a new reader was fine, sometimes a new reader was bad... all customers reported "my new card", they had no idea that the brand had changed under the hood... and in fact neither did I.
You see, to save a few pence per card (12p per card to be precise) they hadn't gone with the grand 34p Gemalto cards, they'd gone with 22p Chinese copies... Inferior copies as it turned out; they had around 1/8th the life span, so over time ALL these "new" cards failed.
But then, in the Gemalto reader they were all fine... So the new reader?... Oh, that was ALSO a cheap Chinese knock-off, and these things had strange problems. I suspected they were sometimes putting the full 5V of the USB supply through the cards (rated at 3V), killing them. And I was proven right.
This unholy quartet of products caused havoc, but I eventually found that new readers could kill either new or old cards, so they had to be recalled... Then new cards could die randomly in even the old, reliable readers, so they had to be recalled too. Which meant we slowly struggled to find old readers and old cards.

All of this was a purchasing foul-up; unfortunately managers saw it as an engineering problem, and so one had to code around poor hardware.

The first thing we did was add two toggles, one for "old card", which I could detect from the card chip type being read on reader access. This slowed the reading of the card down... from every 5 minutes to every 30 minutes, so we risked giving customers longer before an unlicensed machine went out of action, but it was accepted in order to give us a much longer read life for the card cell.

Then we deferred the first read of the card: on boot up we literally leave the USB device completely alone, let Windows start and everything settle on the desktop-driven system, and after 5 minutes we'd start our licensing check. It was accepted that a user could technically receive 4m59s of unlicensed use and then reboot to get more time, but that would be a little impractical in this usage scenario.

Doing these two things we could just about use the new readers... But the new cards were just so utterly terrible that we did eventually have to buy better cards. I never heard whether there was a refund on the originals, but I can assure you my time alone cost more than the £120 they saved going with these cheap cards.
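In sketch form, those two mitigations looked something like this (the 5 and 30 minute timings are as described above; the structure, names and toggle handling are my own illustration, not the shipped code):

using System;
using System.Threading;

class LicencePollingSketch
{
    // Leave the reader completely alone for the first five minutes after boot.
    static readonly TimeSpan StartupGrace = TimeSpan.FromMinutes(5);

    static void RunLicenceChecks(Func<bool> slowPollingToggle, Action checkCard)
    {
        // Deferred first read: let Windows and the desktop settle before touching USB.
        Thread.Sleep(StartupGrace);

        while (true)
        {
            checkCard();

            // The toggle (driven by the detected card chip type) stretches the normal
            // 5 minute check out to 30 minutes to spare the card's read life.
            TimeSpan interval = slowPollingToggle()
                ? TimeSpan.FromMinutes(30)
                : TimeSpan.FromMinutes(5);
            Thread.Sleep(interval);
        }
    }
}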
Tuesday, 2 March 2021
Linux - Detect, Format & Mount USB-Flash Stick Natively
I've had yet another drive fail in one of my old Linux servers; this was an old 2.5" mechanical just used for booting... so I'm able to go in, just about, and copy all the data off the machine.
However, the USB flash drive I chose as my life-boat was being a bit of a pain, so here are the commands I used:
- ls /dev/sd* (note down the replies, then insert the USB stick)
- ls /dev/sd* (note the label of the new drive; for me this is /dev/sdc)
- sudo fdisk /dev/sdc (delete the existing exFAT partition, create a new partition at max size of Linux type, then write & exit)
- sudo mkfs.ext4 /dev/sdc1
- mkdir ~/external
- sudo mount /dev/sdc1 ~/external
- sudo chown <myUser> ~/external/
- rsync -a -v /myData/ ~/external
I could "sudo" rsync and save the need to chown, but here you go.
I'm out, more from the HMS Rodney build soon!
Wednesday, 3 January 2018
Ultra Cheap ZFS Array
Make your own ZFS array (mirrored) with USB Flash drives, for cheap!
Since this... interesting post of mine... has only about 10 views, and my tech items usually get a few hundred, I figure somewhere along the line it got trampled by my silly New Year's post...
Sunday, 31 December 2017
Using Flash Drives in ZFS Mirror
This post comes from an idea I had to allow me to easily carry a ZFS mirror away from a site and back again. We didn't need much space - only 5GB - but it had to be mirrored in triplicate: one copy to stay locally, one going into a fire safe on site, and the third to be carried by the IT manager off-site each evening.
The trouble? A near zero budget, so for a little over £45 we have a 14GB ZFS mirrored pool, across three 16GB USB flash drives and one three-port USB 3.0 hub.
It was perfect for the task at hand, extremely portable, and cheap. I thought the same approach might help anyone trying to learn a little more about ZFS, a student, or even someone using a laptop as a small office server - as the laptop literally has its own battery back-up system built in!
It's not the fastest solution, it's in fact extremely slow, but as an entry step it's perfect.
See the full video below; the commands I list were in use throughout...
Commands:
Listing Disks by ID...
ls /dev/disk/by-id
Listing disks to a file for use in a script, as you see me doing...
ls /dev/disk/by-id -1 > disks.txt
------------------
To install ZFS on Debian/Ubuntu linux:
sudo apt-get install zfsutils-linux
------------------
To remove & purge ZFS from your system:
sudo apt-get purge zfsutils-linux
(and you will be left with "/etc/zfs/pool.cache" to remove or back up yourself).
------------------
Command to create the pool...
sudo zpool create <Name> mirror <DiskId1> <DiskId2> etc...
The name we used here was "tank"; if you already have data on these disks you will need to add "-f" to force this change through.
------------------
Command to make a file executable - like our sh script:
sudo chmod +x <filename>
------------------
Zpool Commands:
sudo zpool status
sudo zpool import <name>
sudo zpool scrub <name>
sudo zpool clear <name>
You will want to "import" if you completely remove ZFS or move one of your sticks to a new machine etc, simply insert the disk and import the pool by name.
Scrub will be used whenever you return a disk to the pool; remember the point here is to allow you to replicate the data across the three sticks and be able to remove one or two for safe keeping, be that an overnight fire safe or taking a physical copy with oneself.
Clear is used to remove any errors, such as the pool becoming locked out for writing - which it may if a drive, or all drives, are removed - you simply clear the current problem with the pool.
Summary: Remember this is NOT the optimum way to run ZFS; it is actually extremely slow, you are replicating each write over your USB bus, and you can only cache so much in RAM. But this is not a performance piece: it is about ensuring one replicates data for safe keeping. A small office or your dorm room server setup could be completely provided by a laptop in this manner - it has its own battery backup, it is quiet (if you get the right machine), and really this is a very cheap way to play with ZFS before you move onto other, bigger hardware options.

Plus, I find the best way to learn about technology is to break it, even a little, and so constantly breaking down your pools by pulling USB sticks out of them is an excellent opener to recovering your pools. Play about first; don't put anything critical on there until you're really happy with the results.
For an excellent post covering creating ZFS pools, check out programaster's post here: http://blog.programster.org/zfs-create-disk-pools
And for the official ZFS documentation you can check things out with Oracle here: https://docs.oracle.com/cd/E26505_01/html/E37384/toc.html
Oh, and Happy New Year... I guess I made it to 100 posts this year...
Monday, 2 March 2015
USB 3.0 Storage Project
This weekend has seen the final parts of the server infrastructure I had powered down, and this leaves me with a gap in my storage needs.
I carry two mechanical hard drives with me most of the time, old laptop 2.5" drives powered by 5V, but these are getting old (and they were already old before I pulled them out of the retiring laptops they served); they're also quite slow and I have no back-up for them.
This is a situation which cannot continue on my conscience. Therefore, I've just bought a pair of 32GB USB 3.0 flash drives, a cheap USB 3.0 hub and a hot glue gun.
The project is going to be to strip an old 3.5" hard drive, even Dremel the middle out of it if needs be, and after stripping the hub and sticks of their plastic shells, mount them inside the old hard drive shell. Screw the shell back up and hopefully I have a 64GB storage space which is non-mechanical, and which I can just plug in and go.
Initially this is going to be for moving some recorded files around and perhaps holding a git repository, or even my old SVN repositories for checking or supporting projects.
If it's a success, however, I may be looking at a nicer stripped USB hub, and slowly adding 64GB sticks (£10 a pop) to build up a big drive, with a full back-up script etc.
I may also be getting together a USB 2.0 hub and a couple of 8GB flash drives to act as a storage backup on the web server; this will be where the Elite Trade data gets dropped off.
Or this lad has the idea...
Labels: elite, elite Dangerous, elite trade data, hardware, internet, myself, project, storage, USB, USB 3.0
Friday, 11 October 2013
PS3 Firmware Update
Right, one of the things I have to do with the PS3 rebuild is check them out with a known working hard drive.
So, with a 2.5" SATA hard drive ready - clean - here's the steps I'm keeping to...
1. Format HDD as FAT32.
2. Create a folder called "PS3" on it.
3. Create a sub folder called "PS3/UPDATE".
4. Download the latest firmware from www.playstation.co.uk.
5. Place this extracted firmware into the "PS3/UPDATE" folder we've got.
6. Insert the HDD into an external USB caddy.
7. Plug this into the PS3 and power through...
8. Place the HDD from the caddy into the PS3 itself.
And if anything goes wrong I gotta go back into the Recovery Menu...
On the reassembly front I'm still waiting for the Arctic Silver paste to turn up.