The Megalomaniac Bore: Write your own Virtual CPU in C++ (4001 CPU)

I've previously covered writing my own virtual CPU, but I never really fleshed it out and it was never a real thing... For a while, well over a year, I've thought about returning to that same topic... Today, this is the first post in that return.

The 4001 Processor... I'm going to borrow here from a lecture I watched by the excellently entertaining Dr Richard Buckland of the University of New South Wales, check him out on YouTube if you want, in one of his lectures he discusses a simple processor within his lecture, this first processor he creates on the board is the same processor, or very similar, I'm going to work us through here in this post... Richard called it the 4001...

If you want to know how to code this up and built it on Linux, then here's a quick intro to start the series of videos.

This first processor is very simplistic, it can do just one logical thing... Addition, it can add two numbers together, multiplication it can do, because multiplying is just repeatedly adding... It can subtract, because subtraction is just adding a negative number... And you can divide by repeatedly subtracting... I know this because I once got told off for designing code with my 486SX2 processor - which could only add and subtract in hardware - in mind, and so it was very inefficient on later Pentium chips... It really got me into trouble, but that's a story for another day.

So, we can add, what do we require from our functional chip in order to add?

Well, we need to have two registers in the chip, the register being where the two sides of our addition are placed... What else... Well, we need a way to load a value into these to registers.

So, in this most basic processor we're not looking to the future, we will add two separate instructions to load a value into each register. Later we will discuss changing this.

When we've got a result back into one of our registers we need to be able to place our result somewhere else, so we need to save from one or both registers? Well, for the sake of balancing things I'll say we'll save from both registers.

I also want to know the values we've just saved, so I need a way to print from our storage area... We'll therefore need an instruction to print out the value of a given location...

And I've mentioned the "storage area", this is the RAM the random access memory, which is the pool of places we can pull values into our registers from and push values back to, so we'll say this area is needed... but is it part of our processor?... No, but the processor needs to know where it is, it needs to know a base address to look for them, and so when we create our code we'll need to create an area of memory and tell the processor code where that memory is.

Finally, I want the processor to be able to beep... armed with this information we can draw our processor internal structure...

Of course two registers sat in a box don't constitute a processor, so we need to add some other parts... We need another register which tells us how far through our program we have gotten, this is the "Program Counter", and not only is it a register but it has a known behaviour, "When we power on the chip is it zero"... So it points at location zero in our memory. And it has an active behaviour, the program counter reads the current position (or instruction) it can be incremented by one.

Armed with a program counter our processor can now start running through the instructions in a program...

And this is fine, technically we could run a program adding numbers through this now... But what about the numbers, the things we're storing, well each box in the memory grid we've already shown is a byte, that is 8 binary bits... 0 0 0 0 0 0 0 0... Each of these bits is given a value, known as its weight... Increasing from the right to the left. The right most is known as the least significant bit, the right most the most significant bit.

Here we can see a single byte, the top green numbers are the position from left to right of the bits in the byte, and the red is the value we assign to each bit... Each bit then acts like a switch, if we flip the right hand most bit...

00000001

Then we look at the values, we have no values until the right hand position, its value is 1, so our number is now 1...

If we flip the left hand most switch as well...

10000001

We have the value 128 and the value 1 turned on... 128 + 1 = 129... So our binary byte now holds the value 129...

If we flip all the switches on...

11111111

We have 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1... which equals 255, so our binary byte holds 255...

And hopefully you've just spotted the problem our processor faces... If we exceed the maximum value a byte can hold, so we add two bytes which are 255 and 1... What happens? A byte can't hold this value...

Well what happens in most hardware systems is that the binary rolls over, like an odometer in a car does as it counts up the miles, it will eventually get to the maximum and roll over to zero again... our byte does the same thing, from 11111111 all our bits get 1 added, and so they switch over to 00000000 and our mathematical add operation was not successful, because 255 + 1 does not equal the value we have of zero.

So how do we know this happened in our processor? Well we need some indication, a flag, to be waved by our processor which says "I just overflowed", what's what this action is called, its called overflowing, we exceeded the number of bits we could hold and over flowed.

Is there anything else we need? Well, we already said subtraction is just adding a negative number, so how do we represent a negative number?

Well, having just told you about binary we need to change it... Sorry...

We've changed the value assigned to the left most, or highest, bit.. It now represents minus 128, what does this do to our binary?

00000001

This is nothing nothing nothing... until the right most bit, which has a value of 1... So this still represents 1....

How about...

01010101

Well lets follow... 64 + 16 + 4 + 1 = 85

How about...

10000001

Well this is -128 + 1 = -127, the left most bit, the highest bit, being set on totally negates the whole value... Lets look at a higher number...

11010101

This is the 85 pattern, but with the left most bit... so its -128 + 64 + 16 + 4 + 1 = -43

From 85 to minus 43, just through one bit switching, that's a powerful change... So there must be a draw back?

Of course there is, when using a negative number in this fashion is uses up the highest value bit we have... So now we can't hold as high a value in the positive, because the numbers are used up... This limits our processor, so how about we invent a new instruction which switches into and out of, or toggles, between treating the registers as signed and unsigned? So an instruction, to change a mode in the processor, how do we hold the status of this mode? Yes we need another flag, and we need to say this flag is initialised as zero, or off, so by default our processor will start with unsigned number processing, or positive numbers only.

So when we're working with our processor later we need to test this flag out by giving it a high unsigned value and then toggling the flag and reading back the value, we should suddenly see that high signed value become a very negative number.

That is a lot of text, and I'm sure we want to dash off to create some code to represent this simple processor... But is there anything else we've not thought about?

Well, we considered overflow, what about the opposite? What about if we subtract a number from a value which results in a number less than zero? Well, we have our signing status flag so we can say our processor automatically switches into using signs, so activates using negatives, and it can write out a possible negative number... But this isn't enough... what if the number being subtracted was so large that it will be a lower negative result than the lowest negative number we can hold?

The lowest negative number we can hold is just the negative bit set on and all the other bits set off...

10000000

So this is -128 + 0 + 0 + 0 + 0 + 0 + 0 + 0 = -128... How do we store -129?

Well, we don't, what we get is an underflow. Like an overflow, an underflow means the value is smaller than we can store, so we can't store it, we can't trust the result we see in the register... So what we need do is set another flag explaining to the world outside the processor that it could not handle the subtraction... You guessed it, we add an underflow flag...

Is this enough to handle our very simple processor? Yes, I think we're there, at least for now.

There is an awful lot of computer theory introduced here, and I can only apologise at how poorly I'm introducing it to you, trust me though this is better than I was introduced to most of these ideas way back in ye olden days.

Right, Code time...

The Memory Class

Since we're writing C++ we'll put our memory into a class, what do we need?... Well we need an area of memory (bytes) we can access... As need to access them to read the values, and write new values into the bytes... it maybe nice to also clear the memory... And know how big the memory is...

Lets dive in...

The code will be at the bottom, complete...

And the g++ compile command used is:

g++ -std=c++11 Memory.cpp main.cpp -o cpu.o

What this tells us is we're calling the G++ compiler (the GNU compiler for C++) and we want it to conform to the latest C++ language standard, and the first file we're passing is the memory class, and then we're passing the second file; the main file; which uses the memory, we have to pass them in this order because we want to compile the Memory and then use it. We can't compile the main which uses the Memory class without knowing what that class is can we?... No. And then the final item is to output the binary as "cpu.o".

The CPU Class

Now we want to define our CPU class...

The new g++ compile command used is:

g++ -std=c++11 Memory.cpp CPU.cpp main.cpp -o cpu.o

As before this builds out system, but now the CPU has to use the memory and is used by the main, so its code is built after the memory but before the main program.

Now we have our basic CPU, what we'd want to do is call "Run" so the CPU we've written starts to run through its program... But we've not defined the behaviour we want... What will our virtual CPU do?

Well, we said it would run sequentially through a series of instructions... So we're is it going to get these instructions from?

There are two options here, we could put them into some storage, some memory, on the CPU itself... This would be called a cache, in truth on real processors this is called the L1 and L2 cache inside the chip itself. But that does need somewhere else to have pulled the instructions from in the first place... So it itself read from general memory before the instructions in the cache could be executed.

The other option therefore is just to put the instructions in the main memory, well we have a memory class. And putting the instructions into it lets us leave our CPU definition at Revision 0.5 and not rework our code... So, what we can do is say add the instructions in order from memory address zero upwards.

Now we know where the instructions come from we can say that the Program Counter is correct we initialised it in our code at position zero, and we reset it to zero! Historically CPU's could pick anywhere to start from, some used to start around the 1024 mark, so they would skip over the first memory chip (which were usually 1024 bytes in size) because the first memory chip was the first powered up and most often was the first memory chip to go bad! That's an off shoot, but it is something real engineers had to think about, and many of these little quirks are still hanging over us today.

Anyway, we've said our instructions are in memory and start at position zero in memory, what are these instructions? Well, we're going to have actual instructions for the machine these are called "Operation Codes" or "op codes" and we're going to have data... You've already had a hint about this from the Memory class and its Write function, which had an address and a value.

Op Codes

Operation Codes are the heating heart of the processor, so we can start to define ours...

When the CPU reads an instruction of value zero from the Program Counter position.. It does nothing... zero... Null... Nothing... Nada... It does not even move to the next program counter location, it just stops... Zero is what we call the HALT instruction, we can halt our CPU.

What does this do to our Code?

Fetch & Decode

The run function gets a loop, moving through the instructions in memory, until there's a zero... That's fine, but its not really what a CPU does, a real processor actually does two things, firstly it fetches the instruction "Fetch" is the keyword here... That's because as we've already mentioned it might be fetching from a local cache of instructions, or having to move a chunk of instructions from the main memory into the local cache, or just executing as we are straight from the memory... "Fetch" can handle all this for us later on, so we can put that function in now and only make it fetch instructions from the main memory, but later we can expand "Fetch" without affecting our code.

But Fetch alone is not enough, once we have an instruction we need to decode its meaning... so "Decode" comes into play, this function puts some meaning in to the instruction for us... Now you might be watching this blog on your PC at home, if you are you're most likely running an Intel or an AMD brand processor, your processor executes in a set of op codes for those kinds of CPU... Your Apple or Android phone most likely uses a processor from a company using the ARM type chip, the ARM instructions are different to those on the Intel/AMD platform. But the instructions are still bytes (numbers) underneath, but they mean different things because when the different brand processors come to "Decode" the number, they can mean different things...

In our CPU we've got one number instruction... Zero, which means HALT. So we can write our Fetch and our Decode functions and make HALT do something.

We've invented our two functions, and we've changed the Run loop, it itself just keeps going and going... This is because we want to HALT the processor, but what do we mean by HALT?

Halting the Processor

So when we halt what we want is to stop exactly as it is, the running loop should stop, but the memory outside the CPU should stay unchanged, and the registers and status flags should not change inside the CPU. In fact the ONLY thing we want to change is a new flag... A new flag which says "I'm halted" to the outside world...

This would let the system monitor the processor, think about your multi-core machines today, in laptops and mobiles you can have lots and lots of processor cores, but when not being used they can be put to sleep, we want our halt to work a little like that, except we're not halting to save power, we're halting to allow us to inspect the memory and status of the processor...

What does this do to our code?

Well, first of all you'll see we fix our access to the memory, to "Read" - my mistake - and then what we do is add the new Halt flag, default it to false when we construct the CPU and then we change the run function to fetch the next instruction into the decode function...

There are a couple of problems here however, firstly we're fetching directly into the decode function...second, the fetch function is not just fetching its incrementing the program counter for us, perhaps incrementing the program counter should be done with some logic so that we could alter how we move to the next op code later... What does this mean? Because we're loading the fetch directly into the decode we don't have any way to inspect the current op code, we need some intermediary...

How about we fetch into one of the two registers in our CPU class?

So now we could halt and know the op code?... Right?... RIGHT?.... Nope.

Improving Fetch

We need to think a head to what other codes we'll have, well one of them is "Add"... Add takes two inputs, two numbers, to add them together... We only have two registers so as we start to execute an Add we'll put one of the two numbers we're adding into register 0 and so overwrite the op code we just loaded... If the Add overflowed and the CPU halted, when we looked back the status would tell us the last op code is not add, it will be the last value put into register 0 by the add... This is called register stomping.

In our very simple system we really don't want to add more and more registers to the CPU... And back in the olden days adding registers to CPUs was very costly, adding a whole register just to hold the last op code could make the chip cost prohibitively more to produce...

So, in our system here how are we going to get around this problem? We don't want to add another register... We need some other storage... We only have one other place data can be, the Memory... Now, I've already said some processors used to skip over the first kilobyte of memory to start execution, well it seems if we want to use a piece of memory as the "last op code" then we need to make our programs start from a different location in memory...

Why make the program move? Why not just put the op code at the end? Well, we don't know how long a program will be, nor do we know how much data the program might want to handle, it might fill all the memory in the Memory class... So if we say we want to reserve the first byte of memory for the last op code, then we know that address zero is reserved, and not for use by the program instructions or data, and we say that user provided program instructions can't access memory address zero... This is a new rule... we can code this into our CPU construction & reset functions, and later when we come to add the other op codes enforce this rule.

Yes, now we could halt our CPU class and query the Memory class and know the op code which caused the problem, this is the first step towards allow us to Debug our CPU code.

More Op Codes

What other Op codes do we need? Well the next is to Load a value into Register 0... "LOAD0", and it has a following byte which is the value... so this op code when in memory has its code value and is followed by the actual value to load... We'll give it the op code 1, so in our memory it we were going to load 67, we would find the byte 1 and then the byte 67. This makes up the instruction "LOAD0 67".

That was easy, how about another, Load a value into Register 1?... "Load1"... we'll give it the op code 2... and again it is followed by a value.

And then the op code for Add... How about 3... And this puts a result somewhere, well we'll put that result into Register 0.

The loading functions are fine, we're moving a byte from memory storage to register storage, they're the same size because we know the architecture of our processor and code... So we don't need any checking...? Well, what if we're at byte 255, the highest possible byte in memory, and we see either load instruction? The whole emulated CPU would crash, because we're reading beyond the range of the memory.

Memory Address Range

The load function incrementing the program counter to pass by the data, or even trying to read data which is already beyond the range, or the fetch instruction trying to increment the program counter beyond the end of memory is bad, it violates the whole idea of the memory space. So where in our CPU code so we handle this? Well, a real processor handles this by limiting the area the processor can address from user operation codes. You've seen this already with the new reserved address to skip the byte we're using as a temporary storage of the op code. This reserved address is the lower bounds of any data, be that an op code location (pointed to by the program counter) or values following an op code. And the upper bound is the max-address...

How might we want to handle this? Do we want to limit the program counter in the Fetch function to be less than the maximum address minus the longest instruction length? So our longest instruction is Load0 and Load1, they are one byte plus one byte of data... the maximum address in Memory is 255... so 255 - 2 is 253, so we limit the Program counter to always be below 253?

The answer is possibly, we're making this up as we go along after all, what cons are there to that approach? Well, we loose 2 possible bytes from the program if the following instructions are a single byte... And we're still not going to run the program correctly, because we need to halt... a program must end and the CPU stop... to the last instruction has to be zero?...

Do we assume that the program counter going out of memory range is automatically a HALT?

You can try other options in your code, I'm however going to assume an out of range is a HALT... But what will be my range?... I'm going to avoid complex "checking" of the length of the operation, known as look-a-head, and I'm going to limit the program counter to never increment beyond the maximum address less the longest instruction length, and I'm going to call this the address ceiling. Lets jump to our code.

There are a couple of fixes in the code - to reset - and we split the Add and Halt functionality into functions of their own, so we may call them from different places. But mainly we're adding the new address ceiling and capping the Fetch to call Halt upon reaching the cap...

Other Instructions

The other instructions we wanted were to beep... Well, we'll just add a single op code 4 for that...

What about saving the result of an addition back to the memory? Well, this is more complex than previous operations, out only operation is add, and the result is placed into register 0, so we want to store the value in register 0 to a memory address, but we can't directly operate on a value in the Memory.. First we must load the target address from the program in memory into Register 1... Then write from the memory the value of register 0 into the address held in register 1... We can not in this model processor store from register 1, because we need at least one other register to store the target address to. We'll give this save operation the op code 5 and it will take a second byte the address.

And finally, we want to print a value from a memory location to screen... This is going to be a two byte instruction, the op code and an address in memory. We'll give this last one the op code 6.

Importantly this last instruction can print any value from memory, an op code or data... Do we also give it access to the reserved address? I would say Yes, because we want to debug the last op code if something goes wrong. However, this function does not directly output a register, why is this? Well, lets think about how we print from a memory location, can the Memory directly write to the screen? In a real system no... First to write to the screen you have to load the memory location into a register, so this op code 7 printing function uses the register, it actually performs a load0 and then prints. This instruction is another instruction which is more complex than previous instructions, because it contains another simpler operation we already do, a load.

Let us bring our code up to this level before we think further...

Inputting a Program

I mentioned in a previous blog entry I had been watching lots of videos about the Altair 8800 computer, well we're going to pay our homage here, because we now need to put some instructions into memory and then run them...

On the Altair one does this by setting the location in memory to point to and then inserting a number into it. We're going to do something similar, just starting from the base address of 1...

What will our program be? How about One plus two and then save the value and print that stored value followed by a beep! That's pretty much going to exercise our whole CPU and tell me whether to this point any of the code we've written actually works... Lets just pseudo code our program though...

Load0 1

Load1 2

Add

Store

Beep

HALT

Where do we save to and hence print from?... We can see the program is quite short, so we could put the data at location 100 and know we're not going to trip over it...

But that puts some data in a strange location way out in memory, how about then we count the number of bytes we use and determine where we can safely put our value?

Load0 (1) plus data (1) = Total Bytes 2

Load0 (2) plus data (1) = Total Bytes 4

Add (3) = Total Bytes 5

Store plus data (?) = Total Bytes 5/6... So can we add our data at 6?...

Lets look at the code:

void CPU::Store()

{

// Load the target address into register 1

m_Register1 = m_TheMemory->Read(m_ProgramCounter);

++m_ProgramCounter; // Skip the memory location data

// Write the register 0 value to this address

m_TheMemory->Write(m_Register1, m_Register0);

// Remember the order of our parameters

// was ADDRESS then VALUE!

}

No, because after we write the program counter is not on the next byte, its actually pointing to that byte we just saved at... What then happens to our program?

The program would end up with the value 3 being stored in byte 6, which is then Fetched as the next op code, this results in register0 and 1 being added together again...

The net result is the program would do two adds, but the second add would do nothing of consequence except slow down our program moving onto the Print function... The program would still work... By sheer blind luck... Change the values of their of the two loads the it will go crazy, try it out later.

What do we call this? Well, this is runtime corruption of our program.

Do we want our program to act in this way? No. Do we want to place the data somewhere randomly in memory? No... What might be a good solution then? Avoiding randomness and avoiding program corruption?

How about we add data to the first byte after the program instead of somewhere random? Yeah, I think that's a good idea, so how big is the whole program?

2 Bytes

1 Byte

2 Bytes

1 Byte

That's a total of...11 bytes, so the first address for data is 12, after the end of our program. This is important later!

Machine Code

Now armed with our op codes, what are the bytes we're going to enter into memory, our first machine code program, for our virtual CPU?

1 1

2 2

5 12

6 12

Adding this to the code then...

So we insert the program, and the first time we run it, we can check the bytes in memory, and see the output... 12...? 1 + 2 = 12... Nope...

What have we done wrong (alright, what have I done wrong?)... yes, I've done the print function wrong, it loads the next byte of instruction into register 1, then uses that as the address to load into register 0 and prints that to the screen.

And when its re-run it works fine and we see 3 shown...

This completes this large and heavy introduction... In the next posts we'll tackle the error or underflow and overflow we've set up flags for. We'll do multiplication, subtraction and division, and we'll also update the main program to include a serial terminal to let us input programs dynamically - like you would on the front of the Altair...

Part 3 will be related to writing an assembler for our virtual CPU, so stick around.

Source Code
The full source codes to this point are published on pages with this blog here:

Memory Header
Memory Implementation
CPU Header
CPU Implementation
Main Program

Part 2 here.

Thanks to David for pointing out the above link was wrong, for years... Why did no-one else notice?!?!?! And does this explain why there have been 10K views to part 1, and 1K views to part 2???!?!... Maybe :)

The Megalomaniac Bore

Wednesday, 16 April 2014

Write your own Virtual CPU in C++ (4001 CPU)

7 comments: