Wednesday, 23 April 2014

Write your own Virtual CPU in C++ (4001 CPU) Pt Two

Part One can be read here.

Now on with the show, I plan to post new editions of this CPU series every Wednesday around 18:30 UK time.

PLEASE NOTE: During this tutorial I change the nature of "Load0" and "Load1" from the prior post, I do this later on as I find problems in our implementation, so pay attention, but following the post in order will give you the best idea of my thought process in debugging the CPU.

When we last left our Virtual CPU work we needed some new features in our main program, so lets add them with this code:


In this the new source code we see a menu we can pass through, we can load the default program, clear the memory, list the memory and run the loaded program.

We added all these code features as C style function headers (function prototypes) at the top, and then wired them into a basic "do-while" loop in the body of "main".

We have a single Memory instance, and when we send the memory down to the CPU we create a new CPU and pass the memory through it, this is like we've hard reset our virtual CPU each time, but really we've just created a new instance, so our CPU constructor is the focus of how our machine starts to now run for us.

So, we have a resettable programming environment?... Well, not quite... I said we'd add the features to load programs into the memory, and we'd then expand on things, so lets look at that code, I'm going to live-record creating the program entering code...


It's not very useful, but it is enough for now...

So, what might we want to do with our CPU class?  Well, adding is fine, and we've already made provision to add a negative number later, but for now how about Multiply?

Multiply is just repeatedly adding right?... So, what might our program look like?

Well, to carry out a loop, we've got to use memory to set up a point to count into, a temporary location, the question is do we put this into our program?  Well we don't yet have an OP Code for Multiply, so we have to write code, how about we do 5 x 3?... And in this section where I put "//" they're going to be considered comments just like in our C++ code.

How about this program... 5 x 3 = 15.. Whatever we do we want 15... All right... lets make 15 come out...

// Load value 5 into register 0
1
5
// Load value 5 into register 1
2
5
// Add (5 + 5)
3
// Add (10 + 5)
3
// Store the value in register 0 to location 100
5
100
// Print memory location 100
6
100
// halt
0

That worked, we got 15 out.. right?... Right?... RIGHT?... No!

Of course it didn't work, we didn't multiply really, we set that all up, what we need is to just had a single add instruction which we go through 3 times, not just blindly add...

So, lets say we start with

// Load value 5 into register 0
1
5
// Load value 3 into register 1
2
3

We want to add register 0 to itself for the number of counts in register 1, this is how we really want multiply to work.

In our code we need to go in a loop... Hmmm, a loop what might a processor be doing to loop through something?  Well the first thing it is probably doing is changing the program counter to go from where it is now to somewhere else, what might we call this?  Well traditionally changing the program counter like this is called a jump.

real processors do this with a series of different logic instructions which go to different places in your program depending on a value... the most common is "JUMP IF EQUAL"... In our CPU we're going to say the Program Counter "jumps" to a given memory address if registers 0 and 1 are the same... What memory address though?

Yes, you guessed it, we reserve a new byte, this is our jump to byte...


So our code hopefully, loads the next program byte into the jump to address location, then reads this same byte back to the program counter... Now we could clobber our program counter here, so be careful!

And next we need the same jump instruction, but with our register check...

Note: The jump to in the video is wrong:

void CPU::JumpTo ()
{
// Read the program counter location
// into the jump to address for use
m_TheMemory->Write(
c_JumpToAddress,
m_TheMemory->Read(m_ProgramCounter));
++m_ProgramCounter;

// Read back the jump to address
// as the program counter
m_ProgramCounter = m_TheMemory->Read(c_JumpToAddress);
}

Lets just run a program with a jump, and then a jump equal...

// Load value 1 into register 0
1
1
// Jump to memory location 13
9
13
// Add
3
// Store to memory Location
5
2
// Print from memory location 2
6
2
// Halt
0
// Total 9... Now pad to location 13...
0
// Load 41 into 1
2
41
// Jump back to memory location 5
9
6

Why do we jump to 13 and not 11?... Because we know our machine has two reserved bytes!

What we end up with is something we might think of as a rudimentary function here, we load a value, then jump into a function which loads another value and jumps back to the add...

Here's the code updated so far, and lets see this program be entered...


You may see in the code the Jump Equal function, we know from this function:

void CPU::JumpEqual()
{
if ( m_Register0 == m_Register1 )
{
JumpTo();
}
}

If the two registers are equal at this point the program will jump, this is logical, and the previous program tested our "JumpTo"...

But what about when the registers are not equal?  What will it do?...

Well, it'll try to execute the next op code it fetches... Which is not an op code, it will be the address for the jump to target... What we want to do is actually, when they're not equal just carry on.

We can make it do this, by skipping the program counter over the target jump address, to pick up the next op code...


NOTE this needs a fix, see second video.


// Load 1 into register 0
1
1
// Load 2 into register 1
2
2
// Jump if equal to 9
10
9
// Otherwise just stop 
0
// At memory location 10
// add
3
// Store to 100
5
100
// Print from 100
6
100
// Halt
0

So we have two halts, we can either halt without printing, or add and print... Now we've stacked this program so we know this time it will fall through... But we will use the editor to change the 4th byte to contain a 1 and therefore be able to then re-run the program without re-entering it all...



We've seen therefore that the value was not equal and the program just halted, and then we've seen that the program had equal values and did jump, hence it added and printed the result.

But does this let us now create a version of multiply?

Lets look at a program we could write:

// So start our variables in our register...
1
5
2
3
// First things first we need a location for our cumulative addition, I'm going to pick a byte I know I won't use... byte 250.

// Store register 0 to 250
5
250

But now I want to store the value from register 1... We don't have a store operation for register one, and it maybe very technically challenging to add such a linkage... What other operate could I add to the CPU to quickly let me operate on the value of one register in the other?...

Well, what about copying register 0 to register 1, or register 1 to register 0?... Well this would overwrite the valuer in the register, but it would let us be more flexible... So copying from register 0 to 1 would leave the same value in both... Useful for checking with a Jump Equal wouldn't you say?


Now we can copy register 1 into 0...

// Copy 1
12
// Store register 0 to 251
5
251

Now we want to reload values into our registers from memory, but there's mess in them, so what could we use?

Well, I've already added them to op codes 7 and 8, they're calls to clear the registers individually... Add two new Op Codes numbers 7 and 8 to zero off the registers for us....


And then another, right now number 13, which will clear both...


Our whole program so far reads:

// Load 5 into reg 0
1
5
// Load 3 into reg 1
2
3
// Store register 0 to 250
5
250
// Copy register 1 into register 0
12
// Store register 0 into 251
5
251
// Zero off a location in memory for our cumulative value
// Clear Register 0 to value zero
7
// Store into 252
5
252

This is just the set up for our loop of course, lets introduce something to help us remember what our program has in memory.  This is a trace table, what a trace table is can vary from person to person, programmer to programmer, however, what I like to use it for is holding the value of variables over time, so each time a value changes I can scrub out the value held and change it.

Some write out the whole table, some people fire up a spreadsheet and use the cells to track the values and add notes.

Some carry a notebook, I am a notebook carrier, and did get a few stares when I first appeared with reems of notes about a system in five thumb depth journals, but I'm the one left with a job and everyone else has been sent on their merry way, so trust me its worth keeping notes and proving your code!

Lets create a trace table for the variables we're storing in the upper part of our memory...


So now we can see our memory, and we have the values now lets think about a loop, we know the number of times to go through the loop, what else do we need?  How about a counter for the number of passes we've completed?

// Clear Register 0 to zero
7
// Store into 253
5
253

Now I think we can start our loop... Lets recap the whole program to this point...

// Load 5 into reg 0
1
5
// Load 3 into reg 1
2
3
// Store register 0 to 250
5
250
// Copy register 1 into register 0
12
// Store register 0 into 251
5
251
// Zero off a location in memory for
// our cumulative value
// Clear Register 0 to value zero
7
// Store into 252
5
252
// Zero a memory location for our
// count of passed through the loop
// Clear Register 0 to zero
7
// Store into 253
5
253

Notice we're saving bytes with our "Clear" operations, rather then two bytes to load register 0 with a zero, we clear the register with a single byte... This is our first optimisation instruction.  We could have done the same thing with a two bytes, but we make the CPU do it for us with one op code.

Let us start our loop code then... We are at by 17 in memory, so 17 is the address of our first loop instruction

// Clear all registers at the start of our loop
13
// Check whether we have done enough loops

// Load 251 into register 0
1
251
// Load 253 into register 1
2
253
// Jump if equal to the end of the loop..

We don't know how long our program is right now, or where the end of the loop is so we'll just put 'END'

10
END
// Otherwise we carry on
// Load cumulative value into register 0
1
252

We load the value we're adding to into the first register becuase the order of "Add" puts the result into register 0 and we Add to and then store the make our multiply...

// Load the value we're adding into register 1
2
250
// Add
3
// Store the result to 252
5
252

Now we've just done a store, so our trace table changed, what do we think we did?  We added 5 to 252...


Now increment our loop count...

// Load register 0 with 253
1
253
// Load register 1 with 1 - the amount we're incrementing by
2
1
// Add them
3
// Store the updated count
5
253

Update our trace table...

So we've added a 5 to our cumulative, we've added one to our count of loop.. is this the end of our loop?... Yes we need it to now jump back to the top no matter what, the jump to address at the top was 17... So our next instruction always takes us back to the clear at 17.

// Jump to top of loop
9
17

Use your trace table to go through the loop yourself and check it works!

And we can now know the address of the value after this loop jump and so set the value of "END" in our program above... Counting the bytes up, we should be at address 40.

So, at address 40 we add our print instruction...

// Now print
6
252
// Halt
0

Lets recap the whole sequence of bytes then....

// Load 5 into reg 0
1
5
// Load 3 into reg 1
2
3
// Store register 0 to 250
5
250
// Copy register 1 into register 0
12
// Store register 0 into 251
5
251
// Zero off a location in memory for
// our cumulative value
// Clear Register 0 to value zero
7
// Store into 252
5
252
// Zero a memory location for out
// count of passes through the loop
// Clear Register 0 to zero
7
// Store into 253
5
253
// Clear all registers at the start of our loop
13
// Check whether we have done enough loops
// Load 251 into register 0
1
251
// Load 253 into register 1
2
253
// Jump if equal to the end of the loop..
10
40
// Otherwise we carry on
// Load cumulative value into register 0
1
252
// Load the value we're adding into register 1
2
250
// Add
3
// Store the result to 252
5
252
// Load register 0 with 253
1
253
// Load register 1 with 1 - the amount we're
// incrementing by
2
1 <- Remember we changed loads... DOH!
// Add them
3
// Store the updated count
5
253
// Jump to top of loop
9
17
// Now print
6
252
// Halt
0

I don't know if this is right at this point, we need to run it and debug the program, so lets watch a video of me doing this live... wish me luck!


You should come back here only after watching the video... Because we found a problem... I'd like to say it was a bug in the program, but its not it's a bug in this programmer... I went a head and listed machine code here referring to a value, not a memory address, in our system if we want a value we need that value to be in memory, and we need to load it... So for 1 + 1 = 2, what we need is

Load memory address with 1
Load Memory address with 1
add
Store to memory address...

We don't want to load the value itself any more, we want to give load a memory address for it to retrieve the value from that location.  We then place our values at the end of our program way out in memory, we have touched on this before, but never changed our Load calls, so now we will add them to our code.

void CPU::Load0()
{
   // Move the memory address to read into the register
   m_Register0 = m_TheMemory->Read(m_ProgramCounter);
   // Move past the memory address in data
   ++m_ProgramCounter;
   // Read from the memory location into the register
   m_Register0 = m_TheMemory->Read(m_Register0);   
}

You need to implement the same for Load1.

This kind of instruction is one we might have to have worked around, we might have had to allocate yet more reserved memory loaded the memory address into the register, saved to the reserved memory, then loaded back into the register from the reserved memory.  You can go a head and do that to keep your implementation pure.  But I am going to say this is a complex instruction set computer (see bottom of the post), and as such I'm going to allow this instruction to reference itself.

The knock on effect of this is that we no-longer load values directly from our code into RAM, so our first two bytes:

// Load 5 into reg 0
1
5

Are not load 5 into reg 0, they're now meaning load from memory address 5 into register 0, this means we need to move the data we load off into memory somewhere outside out program.

Going a head here, we need to put 5 and 3 somewhere for our initial loads, I'm going to put them at 100 and 101, well beyond the end of our program... And then I need a constant which is the number we're going to increment our loop count by each pass... So 1... I'm going to put this at 102.

So we have our program, we have our variables for the loop at 250, 251, 252 and 253, and then we have the loop increment amount at 102 and our parameters at 100 and 101... How does this change our code?...

// Load 5 into reg 0
1
100
// Load 3 into reg 1
2
101
// Store register 0 to 250
5
250
// Copy register 1 into register 0
12
// Store register 0 into 251
5
251
// Zero off a location in memory for
// our cumulative value
// Clear Register 0 to value zero
7
// Store into 252
5
252
// Zero a memory location for out
// count of passes through the loop
// Clear Register 0 to zero
7
// Store into 253
5
253
// Clear all registers at the start of our loop
13
// Check whether we have done enough loops
// Load 251 into register 0
1
251
// Load 253 into register 1
2
253
// Jump if equal to the end of the loop..
10
40
// Otherwise we carry on
// Load cumulative value into register 0
1
252
// Load the value we're adding into register 1
2
250
// Add
3
// Store the result to 252
5
252
// Load register 0 with 253
1
253
// Load register 1 with 1 - the amount we're incrementing by
2
102  <- That's better
// Add them
3
// Store the updated count
5
253
// Jump to top of loop
9
17
// Now print
6
252
// Halt
0

How about that?  Lets go back to the CPU and see...


We're back again, and we've got another bug, this time it is a true program bug where we've not thought about how STORE works, it works by using register 1 to take the target address for the storage.

This use of register 1 clobbers any value within register 1, where as we've assumed our value loaded into it (from address 101) is still in there.  That assumption is wrong, our instructions are not going to work, so we need change our code.

We remove the copy 1 to 0, as we don't need it.  And we move the load address 101 to after the first store of register 0, and change that load to also load into register 0 directly.

1
100
5
250
1
101
5
251
7
5
252
7
5
253
13
1
251
2
253
10
39
1
252
2
250
3
5
252
1
253
2
102
3
5
253
9
16
6
252

Address 100 = 5
Address 101 = 3
Address 102 = 1

Let us try this code one last time...


Hopefully, after any fiddling you've now seen the program run!

RISC or CISC
What have we achieved though?  Well, we've managed to write a whole program using just "Add" to multiply two numbers together, why is this important?

Well, in computing we have two broad genre of processor type the original type is called "CISC" which stands for "Complex Instruction Set Computing", the processor contains many many op codes, if you want your CISC processor to fry bacon you add it as an op code, they have many many many operations and though this makes the software for them very easy to write - if you want to multiply just load your registers and call multiply - they make the processor big and heavy and use lots of power.

Intel and AMD are typically CISC type processors.

The other type, like our CPU, are "RISC" this is "Reduced Instruction Set Computing".  Which means they try to contain as few instructions as possible, if they can multiply by adding, why add the multiply operations into the hardware, just do it in loops like our code.

This makes them very small very light and very power efficient processors, ARM type processors are all RISC, the old BBC Archimedes computers used these chips, traditionally they run slower, because they build up complex operations from many smaller instruction operations.

This is important in things like your mobile phone, where a RISC processor is most certainly present, because their type makes them so much more power efficient.

But as we've seen it can lead to making the software more complex...

LOAD Register 0
LOAD Register 1
Multiply
STORE Register 0

Would be a much much shorter program for our CPU, but it adds a new instruction.  Typically today even RISC processors do have multiply, we're just demonstrating with a working example, but hopefully this has you thinking.

Which type is out CPU class?  Well, with the Load0 and Load1 functions you've seen the complex re-use of the register already, so our CPU is a CISC machine... So we'll add the multiply op code as 14, and add the multiply function...

void CPU::Multiply()
{
m_Register0 = m_Register0 * m_Register1;
}

So with our original sum... 5 * 3... Our program becomes a very boring:

1
20
2
21
14
5
22
6
22
0

Where we have Address 20 set to 5, Address 21 set to 3, and the result is put into address 22.

Exercise
I'd like to say this debugging was easy, but you should see its quite hard, every time we locked up our machine the only way to recover was to kill the program.  Originally this would have meant powering off our whole machine and re-entering it.  Certainly this was the case with hand-coding many machines with their machine code, and it of course spurred people onto bigger and better things.

It allowed them to think about rather than loosing the whole program, saving it before running it, to think about adding methods of loading whole programs in from ticket tapes or reels or magnetic storage which unlike fickle RAM would not loose it's contents on power loss.

Later it lead to people adding to their CPU new features, things like interrupted, all of which we'll cover in coming posts.

But for now, do some research into C++, we're using "std::cin" to read our option from the menu and then we start our CPU running... Can you think of a way, or find a way, to add to the "CPU::Run" function before each "Fetch" a way to see if a key has been pressed?

And then if a key has been pressed to read the key and if it is the "B" for break key (so capital B, shift and B - so its not too easy to stop a program by brushing your keyboard accidentally) and if it is "B" exit the run function and leave the memory as it was?

If you can do this, you improve your debugging a lot, because you can now break out of your running program, inspect it, reset values in memory and start the program again!

(I will add this into a new post on its own and schedule it for a few days time).

Part Three Here.

3 comments:

  1. Another great post.

    I've only scanned through it as I'm eating and I soon have to feed the dog and take him for a walk ( Simple life huh? ).

    Looks like the code is getting bulky. Liking the menu and debug features.

    Think it's possible to include A small 512 byte ROM and code a BIOS to tie it all together? Sorry if you have already done this I haven't had chance to watch the videos :)

    ReplyDelete
    Replies
    1. This post is not going to have a BIOS, nor a ROM, its still detailing OP codes and leverage's multiply as the example of a complex program you can make on a RISC platform, or as an instruction on a CISC platform.

      Its all still working in a basic Machine Code too, but watch out for Friday's post which is scheduled to do some house keeping moving this project forward a step with regards Debugging.

      Delete
    2. New parts coming Friday & Sunday, covering Loading & Saving programs from Memory as well as a fundamental look at Addition with correct Binary Logic.

      Stay Tuned.

      Delete