Sunday 4 November 2018

Server Crash... Bad Memory

I've spent about three hours checking and sorting out the machine which went down. Come what may, I found problem after problem, but none of them with the software.  If your software checks out then there's one obvious other place to look... your hardware.

My first step was to remove the drives, specifically my ZFS devices.  No difference, everything still broke.

I then set about systematically removing one piece of hardware at a time.  First my quad ethernet card: no difference.  Then I removed the added dual SATA card: still broken...

Finally, I removed all the memory and inserted just one stick...

For the first time the motherboard firmware reported at boot that something had changed: it knew the memory had been reduced from 8GB to 2GB.  After that the GRUB boot was really fast, and the system came up very quickly.
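If you want to confirm from the running system that the OS sees the same amount of RAM the firmware reported, something like the following rough sketch would do it.  It assumes a Linux box with /proc/meminfo available; the function name mem_total_kib is just my own for illustration.  Expect the figure to come out a little below the installed size, since the kernel and firmware reserve some memory.

```python
def mem_total_kib(meminfo_text):
    """Return the MemTotal value (in KiB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line looks like: "MemTotal:        2040000 kB"
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


if __name__ == "__main__":
    # Assumption: running on Linux, where /proc/meminfo exists.
    with open("/proc/meminfo") as f:
        total = mem_total_kib(f.read())
    print(f"OS sees {total / (1024 * 1024):.2f} GiB of RAM")
```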

Now, I had been getting MySQL Error 2013 ("Lost connection to MySQL server during query") for nearly every query or SQL dump.  With 2GB of RAM this didn't happen, and sudo and all the other programs now work again... I can therefore only assume one of the other sticks of RAM is faulty in some manner.


So with a clean boot...


I was able to start pulling the data out...


I ran the server on soak in memtest for an hour and it still showed no problem; there was ONLY a problem after mysqld had started and 8GB of RAM was installed... Time to bin some of these sticks.
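Since the fault only showed up once a big process was running, a crude user-space check is another thing you could try from inside the OS.  This is only a sketch, not what I ran and not a replacement for memtest: it fills a buffer with a known byte and counts how many bytes read back differently.  The name check_pattern and the sizes are my own assumptions, and the OS decides which physical pages back the buffer, so it can't point at a particular stick.

```python
def check_pattern(size_mib=64, pattern=0xAA):
    """Fill a buffer with one byte value and return how many bytes read back wrong."""
    size = size_mib * 1024 * 1024
    buf = bytearray([pattern]) * size      # allocate and fill in one go
    return size - buf.count(pattern)       # any byte that isn't the pattern is a mismatch


if __name__ == "__main__":
    # Assumption: 512 MiB per pass is small enough not to push the box into swap.
    for value in (0x00, 0xFF, 0xAA, 0x55):
        bad = check_pattern(512, value)
        print(f"pattern 0x{value:02X}: {bad} mismatched bytes")
```

A clean run proves very little, but any non-zero count would be a strong hint that the RAM, rather than MySQL, is at fault.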
