Showing posts with label processor. Show all posts

Sunday, 3 May 2020

It Finally Happened... Dead CPU

It's happened, happened in annoyingly mundane circumstances, but happened nonetheless... I've killed a CPU.

The first CPU I've ever killed, and I've toyed with and worked with CPUs since about 1994, juggling them and building systems; basically as soon as I was introduced to modular PCs at college I was playing about with them, fitting parts and drives and replacing things in the great beige boxes.

So what was I doing to finally kill a CPU?

Well, before I explain, let us just lament the chip, it was my Core 2 Quad Q6600, a chip I paid release week prices for, which served me throughout my time playing Eve-Online and World of Warcraft, the chip on which so many gaming marathons in Day of Defeat were carried out and the first chip I really played about with to understand the out of order execution of the new Pentium architecture Intel had foisted upon us (and which I'd tried to ignore for many moons).

Late July 2006 it arrived, I installed a then whopping 1GB of RAM into it and Windows 98 was stuck on it until November when it received Windows Vista, and it worked a charm, equipped with a further 3GB of RAM to a total of 4GB.  With two GTX 8800 graphics cards in SLI it was a beast in its day.

It led a long and fruitful life in gaming and productivity for me, and so last night 2nd May 2020, a full fourteen years of service later it has gone to silicon heaven.

How did it die?  Right, well, I've been setting up a CCTV system, the neighbours continue to be a source of perturbation for us.  So I had the Q6600 set up in a case with 4GB of RAM streaming to YouTube, however it ran so hot, like really, really hot.  So the plan was to take it out of that cramped case and put it into a Bitfenix case with a large 775 cooler.

I get it on the bench, test boot, all fine, remove the ssd and stow it and get to work, all the power out and off, all the cables out the way, unscrew and lift the mobo out, I get my test bench PSU and test boot to the BIOS, all fine.  But I can hear a loud noise, like a whine... I figure it's the power supply, so swap to another power supply, and still hear the same noise... Ohhhhh... what the heck is that?

I reduce to the minimum RAM, and boot... Nothing, what happens is the fan spins a moment then off.... Spins, then off... Spins then off... Oh oh...

I changed the RAM, so change it back... same thing. spin, stop, spin stop... No post.

This whine is getting louder each time I turn on....

I have no idea what it is, I'm suspecting a capacitor or coil whine, I'm suspecting the power supply, not the board.

So, I remove the cooler, and swap a different socket 775 chip into the socket, boots to bios fine, no problem.... Oh oh.... I'm suspecting the CPU is bad.

A visual inspection, it looks fine... I trust the Q6600....

I put it back in, power on again and the whine reappears but instantly goes away, and I smell burning.

Power all off, pull the chip... and sure enough one of the underside surface mount components has gone, I can see it's blistered and bubbled up... That CPU is dead, at least to me.

I have no idea what's caused this, I suspect it was age combined with being in this constrictive case and getting so very very hot.

I've decided to retire all my Core 2 based machines, I have (well had) three... The Q6600, an X5472 and the wall mounted PC.... The wall mounted project is a little difficult to directly change, I may continue as it is, but the rest of them are to go, they're running too hot and too inefficiently, a now old Sandy Bridge chip will be far more energy and performance efficient.



Thursday, 28 November 2019

From intel to AMD...

Well, that just happened... I bought a Ryzen CPU...

That may not seem a huge revelation, but let me be clear: in my PCs I have always (well nearly always) had Intel CPUs.  But the current 3rd generation of Ryzen has Intel on the ropes and I have to get moving with my new PC build.

And I've gone in what may seem a strange direction.

Initially I thought about third gen Threadripper, but I really can not justify the expense; most of my work is done on servers, and though I could bring it all onto my workstation for compiles (like LLVM), really it's a convenience and not worthy of nearly £2500 (after a processor and motherboard).

So what did you do Xel?... WHAT DID YOU DO?

Well, I've gone with an AM4 socket motherboard, a very good AM4 socket motherboard, and I've gone with a 3rd Gen Ryzen Zen 2 architecture processor, but perhaps not the one you'd expect.

You may expect me to have gone with the Ryzen 9 3950X, and you'd be right sometime next year, when they're properly out and in stock and prices have homogenised some.  But today, pre-Christmas, they're like rocking horse shite and costly.

The processor I've gone with then is the Ryzen 5 3600X...

Yes, it's Zen 2, yes it's AM4... Yes it's only 6 cores and 12 threads like my current workstation machine.  But it was only £200.  It comes with the stock cooler, so I can practice with that before delving into fitting the AIO.

I've gone with an Asus ROG top tier motherboard for this class of processor, and I plan to let it go up to the 3950X with its 16 cores and 32 threads sometime next year.

Memory, I've gone with two sets of 2 x 16GB Corsair Dominator RGB.  For a total of 64GB of RAM, which is a massive upgrade and maxes out the motherboard.

For that's the big thing I'm giving up, if I had gone with the threadripper3, then I'd have automatically had access to double the RAM slots (8 to be precise) and 128GB of RAM is common as a maximum on the X399... But here on the X370's it's usually 64GB (though some are 128GB).

The motherboard, RAM and processor all come from Amazon, for (to me) the princely sum of £700.  So the whole new machine, with the PSU, AIO, Case and storage I already have has topped me out at £1,100.

Expect build videos and tech tinkering footage soon.

Tuesday, 21 June 2016

Project - Socket 775 to Socket 771 Xeon - Update

As an update to my original post, I got another socket 775 motherboard and took a video of how to cut the tabs off the socket, making it able to accept a socket 771 Xeon, but doing so with the original chip in place, to protect the socket pins.

Check it out here:



Thursday, 14 April 2016

Project - Socket 771 to Socket 775 - Xeon Conversion

I've been conducting an experimental project, one I've seen all over the net by others, but which had lots of different information... This is the conversion of a socket 775 motherboard to a socket 771.

First of all, why?... Why would we do this?... Well, the socket 775 is a commercial socket, supposedly sold to us mere mortal customers who buy one machine and one CPU at a time, and the motherboards and processors in the class were/are quite expensive.

We're talking about the Core 2 era, Celeron D, Core 2 Duo, Core 2 Quad.  I remember the Core 2 Quad machine I put together was really rather expensive at the time.  So, between six and eight years on, we're retiring those machines; yet they cost us a lot of money, and despite depreciation rates we users can still make use of these machines.  They can be useful as render farm nodes for 3D or movie work, they can be used as servers, to host basic information, or upload/download points, even as firewalls.  All roles they can fall into easily.

I personally am going to be using the machine I've got as a quiet webserver, retiring a venerably serving Pentium 4 Prescott for this old Dell machine.

So, what is the base machine?

Well, it's a Frankenstein, the in-laws have had me build them a new machine, so they had an Intel D31PR motherboard holding a Celeron D 450, and one gigabyte of RAM, a totally unhelpfully slow machine.  Even they noticed it was extremely slow.

From my spares they had enough parts to basically rebuild their machine, which I did, and it left me with their old motherboard.  I wanted to upgrade the processor, expand the RAM and add a RAID array controller card, but my budget is extremely low, we're talking £20.

Well on Amazon, I can get a RAID controller card for £13.. This left me £7... Hmm.. Luckily, the IT Department at work were able to donate to me some old DDR2 RAM, so I had the maximum 4GB the board can handle.

£7... Upgrade the processor?.... A challenge... Ebay... Core 2 Duo's and Quads, going for over £25 a pop, most of the Quads were going for £30+.  Way out of the budget.

But there were dual core Xeons for around £4... And I saw this hack out there on the wires, so I set about working.

The first step is to strip everything down, clean it perfectly, and get a scalpel.  The first part of the modification is to remove the tabs from the socket, these tell the user which way to orientate the CPU for insertion, they do nothing else... A consumer CPU is orientated horizontally, so there are tabs top and bottom to stop you inserting the wrong CPU.

And the Xeon has gaps left and right, meaning it'll bounce off these tabs on the socket 775.


Taking the scalpel, I started to cut the tabs, now I DID THIS WRONG!  A much better approach is to leave the current CPU in there, with the tabs engaged into the socket 775 CPU, and then cut between the CPU and the edge of the socket.  So the CPU acts as a guide and the delicate socket pins are protected out of sight below the CPU.



And clean the cuts you make up.



Remove any debris...



Then you need to go back to ebay, and buy a Xeon Socket Modification sticker, this is a little sticker, which will cover two rows of connections on the bottom of the CPU, it will allow most of them to pass through the plastic, but two pins are headed with a little connector, and behind the sticker these connectors actually swap the two pins over.

So, two pins, and the orientation, that's all that's different about a Socket 771 and a Socket 775 CPU.


The stickers are bar shaped, so they indicate which pin to swap, but lay the CPU down with the notches to the top, and from the bottom count 10 connectors from the bottom right, moving left... Voila, stick it down carefully.

Insert into the Motherboard socket now, so the notches are to the "top" of the socket, add the heat-sink assembly, and build it back up on your work bench.


Now, some videos and advice say you need to go to sites and download patches for your motherboard.  During my project here I've found most Intel brand motherboards do not need any patching, only third party boards.  It seems Intel include all the microcode for all their processors (this is only a guess, I have no proof other than using five different Intel boards and two non-Intel boards, and always having to patch the non-Intel branded ones, whilst the Intel ones just work).

Then powering on...


It worked, I've gone from a Celeron D 430 to a Xeon 5130.  They're very similar processors, but the Xeon has dual cores and a much faster FSB.



My YouTube play list, for my crappy videos covering this project can be found here:

Friday, 30 May 2014

Virtual CPU - Signed Addition Clean Up & ROM Discussed

In today's CPU post I want to just clean up the signed addition example, we've covered the electronics but I've had a couple of messages asking how I might integrate switching into the CPU.

Well, simply for our Virtual CPU I'm going to include the signed addition based on the signed mode flag... We already had this signed mode flag in the CPU, and we default the flag to false or "off".

So that is a simple "if" statement within the "Add" function.

To integrate the switching we invent two new OP Codes, one to switch into Signed processing and one to switch to Unsigned processing.
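
As a rough idea of how that could hang together, here is a minimal sketch.  The op code numbers, the `MiniCpu` struct and its members are all made up for this example, and plain C++ arithmetic stands in for the Adder functions from the series.  The signed overflow rule used here is the usual two's complement one, which is an assumption and not necessarily what your own adder does.

```cpp
#include <cstdint>

using byte = std::uint8_t;

// Hypothetical op code numbers, chosen only for this sketch.
constexpr byte OP_SIGNED_ON  = 26;
constexpr byte OP_SIGNED_OFF = 27;

struct MiniCpu
{
    bool signedMode = false; // defaults to "off", as described above
    bool overflow   = false;

    // Handle the two mode-switching op codes.  Returns false for
    // any op code this sketch does not know about.
    bool Execute(byte opCode)
    {
        switch (opCode)
        {
        case OP_SIGNED_ON:  signedMode = true;  return true;
        case OP_SIGNED_OFF: signedMode = false; return true;
        default:            return false;
        }
    }

    // The simple "if" inside Add: the result bits are the same in
    // both modes, only the overflow rule differs.
    byte Add(byte a, byte b)
    {
        const unsigned wide = static_cast<unsigned>(a) + b;
        const byte result = static_cast<byte>(wide);
        if (signedMode)
        {
            // Assumed two's complement rule: overflow when both inputs
            // have the same sign bit and the result's sign bit differs.
            overflow = ((a ^ result) & (b ^ result) & 0x80) != 0;
        }
        else
        {
            // Unsigned rule: overflow is a carry out of the top bit.
            overflow = wide > 0xFF;
        }
        return result;
    }
};
```

The program running on the CPU still has to remember which mode it asked for; the flag only changes how the CPU treats the bits, not the bits themselves.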


I'll leave you guys to think about how the programmer has to remember which mode they were in, and hence what the bit patterns they have represent.

I'm also going to leave multiplication whilst in Signed Mode as an exercise for you to address yourselves, if you want to mail me your solutions, I'll happily take a look (if I find a minute).

So our op codes now run from zero to twenty-seven.  So with 28 operations what could a machine do?

You might think not very much, but the real 4004 (though not yet the same instruction codes as our virtual code) operated with just 46 instructions total, more than our code at present, but still not a lot.  Intel have kindly published scanned copies of their original datasheets and so we can peek into the depths of their instruction set here:


Numerically we can see almost immediately their instructions 2 and 3 are about Fetching... Fetching Immediate and Fetching Indirect (from ROM)... What is Fetching?  Well, in other machines, many assemblers and in our Virtual CPU the concept of Fetching is called "Loading", and we have Load0 and Load1.  Both those instructions load from memory into the CPU; this is "Immediate", it immediately moves a value from volatile storage into the processor.

Indirect for our CPU would actually be the main program loading a program from a file, the file is our ROM or non-volatile storage and we load it into the RAM to use it.  However, we don't fetch from ROM.

I had been asked to add a ROM to the Virtual CPU, however, all a ROM is is addressable memory which can't be changed, so if you wanted to create a ROM yourself you can, create a byte array in your program, load from a disk file, or just insert data into the array upon construction.

And then add the two new Op Codes you want for fetching from the ROM.  You can then add the ROM to your CPU as a reference...
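
If you want a starting point, something along these lines would do.  This is only a sketch: the `Rom` and `RomCpu` names, the two op code numbers and the register layout are all invented here, so swap in whatever your own CPU uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

using byte = std::uint8_t;

// Read-only memory: the contents are fixed at construction and there
// is deliberately no Write function.
class Rom
{
public:
    explicit Rom(std::vector<byte> contents) : m_Data(std::move(contents)) {}

    byte Read(std::size_t address) const
    {
        if (address >= m_Data.size())
            throw std::out_of_range("ROM address out of range");
        return m_Data[address];
    }

    std::size_t Size() const { return m_Data.size(); }

private:
    const std::vector<byte> m_Data;
};

// Hypothetical op code numbers for the two fetch instructions.
constexpr byte OP_FETCH0_ROM = 28;
constexpr byte OP_FETCH1_ROM = 29;

struct RomCpu
{
    explicit RomCpu(const Rom& rom) : m_Rom(rom) {}

    // Execute one fetch: copy ROM[address] into register 0 or 1.
    // Returns false for op codes this sketch does not handle.
    bool Execute(byte opCode, byte address)
    {
        switch (opCode)
        {
        case OP_FETCH0_ROM: Register0 = m_Rom.Read(address); return true;
        case OP_FETCH1_ROM: Register1 = m_Rom.Read(address); return true;
        default:            return false;
        }
    }

    byte Register0 = 0;
    byte Register1 = 0;

private:
    const Rom& m_Rom; // the CPU only holds a reference to the ROM
};
```

The vector could just as easily be filled from a disk file before it is handed to the `Rom` constructor; the important part is that nothing can write to it afterwards.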

I hope this gives you some ideas and you go ahead and try to write a small ROM.

Next on our agenda will be "Interrupts"... Stay Tuned.

Wednesday, 21 May 2014

Virtual CPU - Signed Addition & Endianness

From yesterday's post then, we should have learned something and perhaps even gone to look for a solution, you may have even coded a solution into the CPU code we're working on...

The solution I'm going with however is a total cheat, I'm going to add the Cout of the last adder back onto the result as a single full-adder...


Essentially we add the carry out into the first bit again.  But we only want to do this when using a Signed value... so we'll cheat and use our "Signed" flag to do this new function or the original function... Let's get on with creating "AddSignedTwoBytes":

void Electronics::AddSignedTwoBytes (
    byte& p_Register0,
    byte& p_Register1,
    byte& p_Result,
    bool& p_Overflow,
    const bool& p_Debug)
{
    bool l_CarryIn = false;
    bool l_CarryOut = false;
    bool l_Sum = false;

    // For each bit we need to mask the
    // right most bit out of the register
    // meaning, we start at 00000001 and
    // for each loop shift the register
    // so the bit we're interested in sits
    // in the right most position.

    // Our mask never changes
    byte l_mask = 0x01;

    // For each bit we run the masking
    // then adder and handle switching
    // the result into the register.
    // You can find more efficient ways!
    for (int i = 0; i < 8; ++i) // 8 bits in a byte
    {
        if ( p_Debug )
        {
            std::cout << "Cycle: " << i << std::endl;
            std::bitset<8> msk { l_mask };
            std::cout << "Mask: " << msk << std::endl;
            std::bitset<8> reg0 { p_Register0 };
            std::bitset<8> reg1 { p_Register1 };
            std::cout << "Register 0 [" << reg0 << "]" << std::endl;
            std::cout << "Register 1 [" << reg1 << "]" << std::endl;
        }

        // Get the A & B bits by shift & masking
        // the register
        bool A = ( ( ( p_Register0 >> i ) & l_mask) == 1);
        bool B = ( ( ( p_Register1 >> i ) & l_mask) == 1);

        // We have the carry in and the A & B now, so
        // we can call the adder
        // Because the Carry out, and the Sum, are separate
        // in our code here, we don't need to alter "reg0" or
        // "reg1", we can just logically add the bits set
        // into the p_Result below!
        Adder(A, B, l_CarryIn, l_CarryOut, l_Sum, p_Debug);

        if ( p_Debug )
        {
            // This should be a value from our Adder trace table!
            std::cout << "Adding: " << A << " " << B << " " << l_CarryIn << " | " << l_CarryOut << " " << l_Sum << std::endl;
        }

        // The carry out simply becomes the carry in
        // I'm sure you can see one way to optimise this already!
        l_CarryIn = l_CarryOut;

        // Now the register change based on sum, but
        // we also output the binary
        if ( p_Debug )
        {
            std::bitset<8> resultBefore { p_Result };
            std::cout << "Result Change: " << resultBefore << " -> ";
        }

        // Now the logic
        // Now instead of pushing the logical
        // summing into "Register0" parameter,
        // we push it into the p_Result parameter!
        if ( l_Sum )
        {
            // Mask is shifted, and always 1 in the i position
            // so we always add a 1 back into the target
            // register in the right location
            p_Result = p_Result | ( l_mask << i);
        }
        else
        {
            // We know the mask is ON, so inversing it and moving it
            // will give us an always off...
            p_Result = p_Result & ~(l_mask << i);
        }

        // The register changed, so finish the debug statements
        if ( p_Debug )
        {
            std::bitset<8> resultAfter { p_Result };
            std::cout << resultAfter << std::endl;
        }
    }

    //======================================
    // Add the carry out to the first bit again
    // Take the first bit of the result
    bool A = ( ( p_Result & 0x01) == 1);
    // Copy the final carry into its own variable, so the
    // Adder is not reading and writing l_CarryOut at once,
    // and feed in a carry in of zero
    bool l_EndCarry = l_CarryOut;
    l_CarryIn = false;
    Adder(A, l_EndCarry, l_CarryIn, l_CarryOut, l_Sum, p_Debug);
    // Now the logic
    // Only the first bit of p_Result is touched here
    if ( l_Sum )
    {
        // Switch the first bit on
        p_Result = p_Result | 0x01;
    }
    else
    {
        // Switch the first bit off, leaving
        // the other seven bits alone
        p_Result = p_Result & ~0x01;
    }
    //======================================

    // The final carry out becomes our
    // over flow
    p_Overflow = l_CarryOut;
}

So with this code, we need to test the function, let's add a new test function:

void Electronics::TestSignedAdd()
{
    // byte is an unsigned char, so -127 is stored as
    // the bit pattern 10000001
    byte l_A = -127;
    byte l_B = 7;
    byte l_result = 0;
    bool l_Overflow = false;

    AddSignedTwoBytes(l_A, l_B, l_result, l_Overflow);

    std::cout << "Testing signed add:" << std::endl;

    std::cout << "(" << (int)l_A << " + " << (int)l_B << ") = " << (int)l_result << std::endl;
}

Now, before we run the code, what do we expect to see?.. Well, we expect the value of -120, which has the binary pattern 10001000.

Lets run the code and see...


What the hell just happened?... 129 + 7... That's not what our code says... and the answer is 136... What is going on!??!?!

Calm down, calm down, everything is fine... The binary pattern of the result is correct... see...


So what is with the values we see on the screen, if our register holds the binary pattern for -120, our result...!?!?!

Well, the signed binary for -120 is the same pattern as the unsigned value 136!  It's as simple as that, our CPU is working, it's our C++ which has thrown us a curve ball.

The cout stream took the byte we sent and converted it for display, but the byte itself has no knowledge of signing, it is in fact an unsigned char when we defined it as a type.  So the binary might be perfectly fine, but the interpretation of that binary is wrong.

This is a case of being careful with how you test your code, and is an example where at least two tests are needed to confirm a result, never take the result of just one test as canonical.  Always try to find some other way to test a value you calculate, or validate your input, or put a bounds around your system.  Because when something goes wrong you always need to second check yourself before complaining to others.

In the case of this code cout is showing the unsigned values, and that's fine, we can ignore it because to get the true binary we can just use bitset...

#include <bitset>
std::bitset<8> l_binary (p_Result);
std::cout << l_binary << std::endl;

This is a lesson to learn in itself, always to check & recheck your code.
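
If you want to see both readings of the same byte side by side, a few small helpers like these would do it.  The function names are made up for this sketch, and the cast to `std::int8_t` assumes two's complement wrapping, which C++20 guarantees and which mainstream compilers have done for years (before C++20 the result was implementation-defined).

```cpp
#include <bitset>
#include <cstdint>
#include <string>

using byte = std::uint8_t;

// The unsigned reading of the byte, which is what (int)l_result shows.
int AsUnsigned(byte value)
{
    return static_cast<int>(value);
}

// The signed reading: reinterpret the same eight bits as a signed
// 8 bit value before widening it to int for display.
int AsSigned(byte value)
{
    return static_cast<int>(static_cast<std::int8_t>(value));
}

// The raw bit pattern as text, independent of any interpretation.
std::string AsBinary(byte value)
{
    return std::bitset<8>(value).to_string();
}
```

For the result byte from the test above, `AsBinary` gives 10001000, `AsUnsigned` gives 136 and `AsSigned` gives -120: one pattern, two readings.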

But now we have to think about the repercussions of this in our CPU, even if we've set the signed flag the data are being stored unsigned, the memory is storing the values just as patterns of binary... And this is an important thing to keep in mind when you're programming, when you are working with a CPU, how the value is expressed is more important than how it is stored, because (hopefully) the binary representation is going to be the same all the time...

OR IS IT?

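Unfortunately not, the binary we've dealt with so far is what we could call "Little-Endian", that is the lowest value assigned to a bit in the byte starts on the right, and we read the byte from right to left.  Essentially the opposite way we would read this very English text...  (Strictly speaking, endianness usually describes the order of the bytes within a multi-byte value; here we're borrowing the term for the order of the bits within a single byte.)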


If we read the byte the opposite way around then the values would be reversed:


This is called Big-Endian.
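
To make the reversal concrete, here is a small sketch which flips the order of the bits in a byte, so you can see what value the same pattern takes on when it is read from the other end.  `ReverseBits` is a name invented for this example; it is not part of the CPU code.

```cpp
#include <cstdint>

using byte = std::uint8_t;

// Swap bit 0 with bit 7, bit 1 with bit 6 and so on, giving the value
// the same pattern would have if it were read from the other end.
byte ReverseBits(byte value)
{
    byte reversed = 0;
    for (int i = 0; i < 8; ++i)
    {
        // Make room on the right, then append bit i of the input
        reversed = static_cast<byte>(reversed << 1);
        reversed = static_cast<byte>(reversed | ((value >> i) & 0x01));
    }
    return reversed;
}
```

So the pattern 10001000, which we read as 136, becomes 00010001, which is 17, when the bit order is flipped.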

Intel processors have pretty much always been little-endian, whilst other firms have used big-endian; notable big-endian processors are the Motorola 680x0 family of CPUs.  Yes, those of the Atari STs and Amigas, the original Mac... They all had big-endian CPUs.

Some said this set a gulf between the two, and emulating between the two systems is very time consuming, because to emulate a big-endian processor on a little-endian machine used to mean a lot of overhead in converting between the binary representations.

Our CPU is going to suffer from this problem, because we've built it, and its adder, on little-endian principles, e.g. we run the adder loop from bit 0 up to n-1, treating bit 0 as the least significant; with the opposite bit numbering the least significant bit would sit at n-1, so we'd want to start the adder loop at n-1 and go down to 0 to complete an addition.

A challenge would be to go back over our whole CPU and convert it for Endianness, making it a generic, configurable hardware implementation of a generic 8bit logical operating unit... I'm not going to do it, I'm just here to guide your experience.

Tuesday, 20 May 2014

Virtual CPU - Adders & Signing Discussed

Let us review the physical electronics our code emulated for addition, we created the function "AddTwoBytes" which in turn used "Adder", the adder being the code:

Sum = Cin ^ (A ^ B);
Cout = (A & B) | (Cin & (A ^ B));

This is of course the code representation of electronic logic gates "AND" "OR" and "XOR", we could have gone further and rather than use "XOR" as a single operation we could have broken it down into separate "AND", "OR" and "NOT" gates itself.  This is how the first computers worked after all.  A good electronics primer is a good place to start looking at logic gates in more detail.

But the use of "XOR" as a single unit, rather than a more complex set of other logic gates is what we programmers would call encapsulation.
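
To show what that breaking down might look like, here is a minimal sketch with the XOR rebuilt from AND, OR and NOT, and the adder then written in terms of it.  The names `Xor`, `FullAdder` and `AdderResult` are only for this example; the logic is the same Sum and Cout as the two lines above.

```cpp
// XOR built from simpler gates: true when at least one input is on
// (OR) but not both of them (NOT AND).
bool Xor(bool a, bool b)
{
    return (a || b) && !(a && b);
}

struct AdderResult
{
    bool sum;
    bool carryOut;
};

// One full adder: adds two bits plus a carry in.
AdderResult FullAdder(bool a, bool b, bool carryIn)
{
    const bool partial = Xor(a, b);
    return { Xor(carryIn, partial),             // Sum  = Cin ^ (A ^ B)
             (a && b) || (carryIn && partial) }; // Cout = (A & B) | (Cin & (A ^ B))
}
```

Anyone calling `FullAdder` neither knows nor cares that `Xor` is really three gates underneath, and that is the encapsulation at work.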

We Encapsulated this whole logic into a function called "Adder" and we encapsulated its use to add each bit of two bytes into yet another function.

Luckily electronics engineers have been at this encapsulation lark as long as us programmers, and so instead of representing the adder logic like this:


They've gone ahead and made the Adder look like this:

And then if we think about wiring each bit of two bytes through adders, each adder passing its carry out to the carry in of the next we get this wonderfully complex diagram:


Do not be scared by this, yes I drew it by hand so there may be bugs, but all I want you to glean from this is how complex the electronics are, because remember this mass of wires and adders, and inside each adder the logic gates, equates to the simple loop in the "AddTwoBytes" code!
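
For comparison with the diagram, here is a stripped down, self-contained sketch of that loop.  It is not the "AddTwoBytes" from the series (no debug output, and the adder logic is written inline); `RippleAdd` is just a name for this example.

```cpp
#include <cstdint>

using byte = std::uint8_t;

// Add two bytes one bit at a time, passing the carry out of each
// bit position into the next, exactly as the chain of adders does.
byte RippleAdd(byte a, byte b, bool& overflow)
{
    byte result = 0;
    bool carry = false;
    for (int i = 0; i < 8; ++i)
    {
        const bool bitA = ((a >> i) & 0x01) != 0;
        const bool bitB = ((b >> i) & 0x01) != 0;

        // One full adder: the sum uses the incoming carry,
        // then the carry is replaced by this adder's carry out.
        const bool sum = carry ^ (bitA ^ bitB);
        carry = (bitA && bitB) || (carry && (bitA ^ bitB));

        if (sum)
        {
            result = static_cast<byte>(result | (0x01 << i));
        }
    }
    // The carry out of the last adder is the overflow
    overflow = carry;
    return result;
}
```

Eight adders and all their wiring collapse into one loop of a dozen lines, which is rather the point.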

This is one of the reasons many people emulate electronics or CPU's or just this kind of gated logic in software, because creating the hardware can be so much more complex, costly and hard to get right first time, but code we can change with a wave of the hand.

This representation, wiring each bit from the registers, is purposefully complex however, and there are other adders you can look at.

So how does all this get us signed numbers in our CPU?

When we enter signed mode in the CPU we want to consider the top most bit of our bytes as a sign, when this bit is off (0) then the number is a positive, when the bit is on (1) then the number is negative.

The flag doesn't really mean "and now this is negative", it is actually that we change the meaning of the value, so the top most bit changes from having a value of +128, to having a value of -128.

Hence the binary, 00000111 is still 7... but 10000111 is now ( -128 + 4 + 2 + 1 ) holding the value -121.
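
You can check this sort of sum with a few lines of code.  The sketch below simply adds up the value of every bit which is switched on, giving the top bit a value of +128 or -128 depending on the mode; `ValueOf` is a made-up helper, not part of the CPU.

```cpp
#include <cstdint>

using byte = std::uint8_t;

// Sum the weights of the bits that are on.  Bits 0 to 6 are worth
// 1, 2, 4 ... 64 in both modes; only the top bit changes its value.
int ValueOf(byte pattern, bool signedMode)
{
    int total = 0;
    for (int i = 0; i < 7; ++i)
    {
        if ((pattern >> i) & 0x01)
        {
            total += 1 << i;
        }
    }
    if ((pattern >> 7) & 0x01)
    {
        total += signedMode ? -128 : 128;
    }
    return total;
}
```

With the pattern 10000111 this gives 135 in unsigned mode and -121 in signed mode, whilst 00000111 is 7 either way.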

Back to our electronics then, if you have the top most bit set and you over flow, do we want to just throw it away as an error?... Well no, because it may be that the number has just gone from being a negative to a positive, or it's gone from a positive to a negative... So we want to carry the overflow back to the first adder...


But, of course this won't work, because you can't have an input to a process which is that very same process's output...

So what do those sneaky electronics chaps do?

Well... I'm not going to tell you... Go find out...

Wednesday, 30 April 2014

Writing a Virtual CPU in C++ (Accumulators & Multiplication)

This post forms part of a series, you can start at Part One here.

First things first, armed with our Electronics code, containing our Binary Adder in C++ we can go to our CPU and change the arithmetic logic add function for our new "AddTwoBytes" electronics function...

We can also, for the first time set the Overflow flag!



We saw our first program run once again, try out some more yourself...

However, we're still very much limiting our CPU, and before we can go much further we need to introduce the idea that the CPU, rather than using a reserved byte in memory or the two registers we already have, performs arithmetic into an intermediary location, a temporary store if you will.  This was a natural evolution of real processors: they moved from having very few registers and using the memory to store intermediate values to using new internal registers.  As the electronics got cheaper and the processes for creating chips were streamlined, one of the first things added was this temporary location.

So in our code where we see:

Register0 = Register0 + Register1;

Really the register has no idea about overflows, it just holds a binary number; now we've included our Binary Adder we get the overflow, so really storing the resulting value back to register 0 is a bit of a fudge.

The real location it should go to is called the Accumulator, in fact when the Accumulator was added to processors it gave rise to a whole new way of thinking in computing, and some even call it "Accumulator Based Computing" or ABC.  I was very briefly introduced to this during my A-Level Computing course in 1994, however 20 years on I have yet to meet a recent software graduate who realises there was a time when we didn't store intermediates in the processor.

For using an Accumulator has become so ubiquitous as to completely block out the idea of laboriously saving values back and forth with the memory as we have been doing.  Indeed, my wish to highlight that using an Accumulator is not the same as storing to a register and then saving to memory has been the reason behind our labouring over our registers and reserved memory!

To add an accumulator however we need a new specification for our CPU...
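
In code terms the change is small.  The sketch below is an assumption about how it could look rather than the CPU class from the series: the struct and member names are invented, and plain integer arithmetic stands in for the Binary Adder.

```cpp
#include <cstdint>

using byte = std::uint8_t;

struct AccumulatorCpu
{
    byte Register0 = 0;
    byte Register1 = 0;
    byte Accumulator = 0; // the new temporary store
    bool Overflow = false;

    // Add the two registers.  The sum lands in the accumulator and
    // the registers themselves are left exactly as they were.
    void Add()
    {
        const unsigned wide = static_cast<unsigned>(Register0) + Register1;
        Accumulator = static_cast<byte>(wide);
        Overflow = wide > 0xFF;
    }
};
```

Compare that with `Register0 = Register0 + Register1;`, where the first operand is destroyed by the act of adding.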


You can read more about accumulators and their history here.

So now we include the Accumulator, and fix the bugs I made by making a mistake in the last video :)


Armed with an accumulator, and the Overflow flag working, we can now implement our Multiply differently...

Clear Accumulator to zero
for each time add value to accumulator
if overflow halt

This is not an unreasonable implementation, and it is already a lot shorter.  You see, we're still building more and more into the CPU, and because this is something which could be done by a program, adding it to the CPU makes it a complex operation, so our CPU is a CISC architecture.
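
Written out as C++, those three lines of pseudocode might look something like this.  It is a sketch only: `Multiply` and `MultiplyResult` are invented names, the loop count lives in an ordinary local variable, and "halt" is modelled by simply stopping the loop.

```cpp
#include <cstdint>

using byte = std::uint8_t;

struct MultiplyResult
{
    byte accumulator;
    bool overflow;
};

// Multiply by repeated addition into an accumulator.
MultiplyResult Multiply(byte value, byte times)
{
    MultiplyResult result{0, false}; // clear accumulator to zero

    // for each time, add value to accumulator
    for (byte count = 0; count < times; ++count)
    {
        const unsigned wide = static_cast<unsigned>(result.accumulator) + value;
        result.accumulator = static_cast<byte>(wide);
        if (wide > 0xFF)
        {
            // if overflow, halt
            result.overflow = true;
            break;
        }
    }
    return result;
}
```

So `Multiply(3, 5)` leaves 15 in the accumulator, whilst `Multiply(100, 3)` stops on the third addition with the overflow flag set.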

However, we still need to store the count of times through the loop in memory and swap it back and forth, so we clearly need a new register in our CPU another accumulator, but one for counting iterative processes... How does a real processor handle this?

Well, the processor contains a Complex of arithmetic operation modules, and many of them operate in different ways, if it were to implement the Multiply program in the CPU itself, storing the bytes of instructions to carry out this would be code within the processor itself.  Code inside a processor like this is often called "Microcode", however, what we're after is not microcode because we know it's slower, what we want is to use the accumulator for the cumulative addition of the multiply, and then we want to count how many multiplies into another register...

This use of iterating (repeating) additions to achieve a multiplication is quite old, but we're all about building up knowledge... so on our processor we need a new counter... And that is exactly what I'm going to call it.


Now we can add some new op codes, lets add them to clear the accumulator, load the next value in memory into the accumulator and clear the counter...


With these our multiply is going to contain microcode to clear the counter, load the two parameters into registers 0 and 1 then perform a loop.  The loop we're going to perform is summing into the accumulator, taking it as a parameter itself, and we're going to simplify that the logic controlling the counter is included in the cpu for us... Yes we're going to cheat.


Our machine code program just then was:

// Load 0 from 100
1
100
// Load 1 from 101
2
101
// Multiply
14
// Store accumulator to 0
22
// Store 0 to 22
5
22
// Print from 22
6
22

As you saw, we got our answer 15... You try some other multiplications... And of course see what happens when we overflow... think about adding op codes to report to your program whether it has an overflow, how would we check?  Would we load overflow into a register and then compare it?  Should we Halt the whole processor?
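
If you don't have the full CPU code to hand, a cut down interpreter is enough to experiment with that program.  Everything here is a sketch under assumptions: `TinyCpu` is not the class from the series, it understands only the six op codes in the listing, it runs to the end of the program because the listing has no halt instruction, and it answers the "should we halt?" question in the simplest way, by stopping on overflow.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using byte = std::uint8_t;

struct TinyCpu
{
    std::array<byte, 256> Memory{};
    byte Register0 = 0;
    byte Register1 = 0;
    byte Accumulator = 0;
    bool Overflow = false;
    std::string Output; // what "Print" would have shown

    // Repeated addition of Register0, Register1 times
    void Multiply()
    {
        Accumulator = 0;
        Overflow = false;
        for (byte count = 0; count < Register1 && !Overflow; ++count)
        {
            const unsigned wide = static_cast<unsigned>(Accumulator) + Register0;
            Accumulator = static_cast<byte>(wide);
            Overflow = wide > 0xFF;
        }
    }

    // Copy the program to address 0 and run it to its last byte.
    // Returns false on an unknown op code or on overflow.
    bool Run(const std::vector<byte>& program)
    {
        if (program.size() >= Memory.size())
            return false;
        for (std::size_t i = 0; i < program.size(); ++i)
            Memory[i] = program[i];

        std::size_t pc = 0;
        while (pc < program.size())
        {
            const byte op = Memory[pc++];
            switch (op)
            {
            case 1:  Register0 = Memory[Memory[pc++]]; break; // Load 0
            case 2:  Register1 = Memory[Memory[pc++]]; break; // Load 1
            case 5:  Memory[Memory[pc++]] = Register0; break; // Store 0
            case 6:  // Print
                Output += std::to_string(static_cast<int>(Memory[Memory[pc++]])) + "\n";
                break;
            case 14: // Multiply
                Multiply();
                if (Overflow)
                    return false;
                break;
            case 22: Register0 = Accumulator; break; // Accumulator to 0
            default: return false;
            }
        }
        return true;
    }
};
```

Put 3 and 5 (my choice of inputs, any pair whose product is 15 would do) into memory addresses 100 and 101, run the ten byte program above, and address 22 ends up holding 15.  Change the inputs to 100 and 3 and `Run` returns false, because the multiply overflowed.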