Wednesday, 23 May 2012

Write your own Virtual CPU (in C++ Code)


In a recent edition of Linux Format there was, what I call, a miss leading front cover bullet point - "Build your own Virtual CPU" - This was linked inside to an article describing using logic gates and other electronic components to build a simple adding unit.  The "virtual" side being that they were doing this in an application, not with physical wires.

Anyone whom has done Computer Technology style courses, or whom is an electronics engineer, will be familiar with such software.  And good on Linux Format for exemplifying the use of such software.

However, it wasn't quite what I expected... For some reason my brain was informing me this article would be about writing code to create a virtual CPU.  No mean feat.  Indeed, this month there is a further article in LXF about emulating other hardware on Linux - the exemplify using Amiga, ST and other emulators.

But, what about those crazed fools among us whom want to think about emulating a CPU in software?  Truely virtualizing what the processor is doing?  Well, we'd need some code.

Emulation of another CPU in code on your current computer is not unheard of, it is infact one of the first things large scale programmable mainframe computers were asked to do.  I was even taught during my degree the key computing tennant that "Any computer can be programmed to emulate any other computer".  This is as true today as it was all those years ago.  It makes no difference about the endianess of your CPU or the specifications of your target platform you want to emulate, so long as you can quantify what the end result needs to do, you can emulate it in software.

That's not to mean its plain sailing, indeed as LXF point out, there are all manner of undocumented effects and slightly off specification behaviours out there which an emulator writer has to get to grips with when they come to write a software version of another CPU or chip.

But first we need to know what a CPU is, how it works, and maybe implement a simple one for ourselves.

A CPU, or central processing unit, is the core element of the modern computer.  Coupled with a set of memory and given some instructions is will carry out operations (such as mathematics, comparison and masking) on numbers the effect of which is to give results.  These results could be the text you're reading right now represented in screen memory, or on the printed page.  Or they could be your taxes in a spreadsheet.

Between the CPU doing this very low level number crunching are many many layers of other software (all running through the CPU) and each layer does a different job.  For example, on the Ubuntu machine I'm typing this very article on there is the Linux Kernel running on top of the machine BIOS, then there is X running to show me the screen, with gnome within that... Together this pile of layers all make up the modern computing environment we think about.

But when we strip back the layers we always find the CPU.

Back when I learned to program there was only one CPU in a machine, now-a-days multiple core machines are common, this means there can be more than one set of instructions (more than one program) flowing through the CPU being processed at any one time.  This helps the multiple layers (or threads) of the system operate more smoothly.

Of course you can still have just one CPU, and each layer has to share time on the core, just as on a very busy system each layer would share time on a multiple core machine.

Emulating such a complex system might be a little too much for this introduction, so we're going to keep it simple.  We'll actually implement a simple CPU (which  has some short cuts) that can load some numbers, add or subtract them from one another and store the result somewhere.  We'll make this CPU work in a linear sequence going from the first instruction through to the last given and we'll make it report any errors in the instructions.

I'm going to write this in C++, because I want to encapsulate the CPU emulator code as a single class or object, which we can use and reuse in other programs.  In essence writing an emulator for a very simple language of instructions which we'll define as we go.

(Please forgive some of the poor C++ syntax and style I will be using, we're going for clarity over function - and we are cutting some corners to keep things simple).

 We must just define what a CPU is inside, a CPU is a piece of electronics which has a set of registers inside.  Just like on a till at the supermarket checkout, the registers hold values of numbers.  There is also a register to store temporary values, the current position in the program we're running and some flags to tell us the status of the machine.

So, first things first we need to define the internals of our basic chip... I want two registers, which we can add together or take away from one another, and then the other supporting items.  So this gives me the code:

byte m_Register0;
byte m_Register1;

bool m_Status;
bool m_Overflow;
bool m_Underflow;

byte m_ProgramCounter;
byte m_InstructionRegister;

int m_Temp;

I've decided that the CPU we're going to write is 8 bits in width, that is the main registers for arithmatic are 8 bits in size.  8 bits being a byte, I have taken the liberty of defining Register0 and Register1.

The status flags then follow, to tell us whether an operation has failed, or an add command has overflowed 8 bits, or whether we've underflowed - our cpu here can only operate on positive numbers in the range 0 to 255, so subtracting a larger value will result in a negative which we can't handle - instead of crashing we'll just set the underflow flag!

The same with addition, adding two numbers which come to more than 255 will overflow, so instead of crashing we set the overflow flag.

The temporary register is a hack, this is a simple register, which we are going to make artificially bigger than the other registers, this is so that we can operate on numbers bigger than 255, or smaller than 0, and detect the overflow and underflow situations without overwhelming ourselves with detail too early.

Given all this, we can start to define some functionality for our processor.

● We may need to reset the CPU at times, this will clear down all the values in the registers and reset all the flags to a valid state, this code would look like this:


void ResetCPU ()
{
m_Temp = 0;
m_Register0 = 0;
m_Register1 = 0;
m_Status = true;
m_Overflow = false;
m_Underflow = false;
m_ProgramCounter = 0;
}

● Now we need to tell the virtual CPU what to do when fed a program, to do this we need to define an "Instruction Set", this is the actions to be carried out when certain values are encountered by the virtual CPU registers.

Our instruction set will contain six instructions, to load a value into either register, store the value from either register, add the two values currently in the registers or subtract the two values currently in the two registers.

enum InstructionSet
{
LOAD0 = 0,
LOAD1,
ADD,
SUBTRACT,
STORE0,
STORE1
};

Defining this as an enumeration of values saves us worrying about duplicating a value, it would never do to have "ADD" represented by the bit code 3 and "SUBTRACT" also accidentally represented as 3...

● Finally we need to actually write the processing code, the engine as it were, inside the virtual chip.  Ours is going to be very simple, we're going to feed it a vector (just a list) of bytes, each byte being an instruction, sometimes followed by a parameter (e.g. LOAD0 is the first byte then [7] the value to load is the second byte) sometimes followed by a space (a one byte gap) into which we'll store a result.

So our processing engine code will look something like this:

while ( not at the end of the list )
{
Get Next Instruction
Act On Instruction
}

This is very rudimentary, but effective.  And we use the program counter to move us through the list of instructions.  Incrementing it each time we take an instruction (or parameter) off of the program we're processing through.

while ( m_ProgramCounter < p_Program.size() )
{
m_InstructionRegister = p_Program[m_ProgramCounter];
m_ProgramCounter++;
switch (m_InstructionRegister)
{
case LOAD0:
DoLoad0 (p_Program);
break;
case LOAD1:
DoLoad1 (p_Program);
break;
case ADD:
DoAdd ();
break;
case SUBTRACT:
DoSub();
break;
case STORE0:
DoStore0(p_Program);
break;
case STORE1:
DoStore1(p_Program);
break;
default:
Fault();
return;
}
}

This is the actual code in the example, we take an instruction, move the program counter then act on the instruction we have loaded.  If we don't understand the instruction we fault the processor.

● Faulting is where the CPU has encountered an error, you'll have seen these, you may know them in Windows as "The Blue Screen of Death".  Most all computers have such fault states, and the operating systems will explain that something has gone wrong to you (or in rare cases just shut the machine off).

However, our CPU is code, so we can be more helpful and get the CPU to tell us where the error is:

void Fault ()
{
m_Status = false;
cout << "Instruction Fault at: " << m_ProgramCounter << endl;
DumpRegisters();
}

This tells us the program we fed in has an error at a certain location.

(Note: Here we meet the first quirk of our virtual CPU - we take an instruction and incremented the program counter - then processed the instruction, this means the fault function above is slightly miss-leading, the error actually happened on the previous program counter position.)

 Dumping the registers is also a helpful feature of CPU's.  I'm sure if you've ever crashed your machine you have seen such register dumps, they usually show a lot of values on the screen and then the machine reboots.

Our dump function will avoid such drastic action, it will simply bring the raw registers to the screen:

void DumpRegisters ()
{
   cout << "CPU Registers:" << endl
<< "Register0 [" << m_Register0 << "]" << endl
<< "Register1 [" << m_Register1 << "]" << endl
<< "Status [" << m_Status << "]" << endl
<< "Overflow [" << m_Overflow << "]" << endl
<< "Underflow [" << m_Underflow << "]" << endl
<< "Program Counter [" << m_ProgramCounter << "]" << endl
<< "Instruction Reg [" << m_InstructionRegister << "]" << endl
<< "Temp [" << m_Temp << "]" << endl;
}

● Thats the mechanics of the processing engine dealt with.  What we need do now is actually perform some actions when we interpret the instructions given.  In the example code this is the set of "Do" functions.  We'll define a couple here just so you get the gist:

void DoLoad0 (const vector<byte>& p_Program)
{
m_Register0 = p_Program[m_ProgramCounter];
m_ProgramCounter++;
}

The Load0 instruction means "load the next byte in the program, into register 0".  So we do just that, assign the program listing position to the register (remember we already incremented the program counter).  And then we increment the program counter to move beyond the byte we've loaded.

DoLoad1 is the same, except it targets register1.  Assume both have been loaded with a value, we can now define how we might add them together:

void DoAdd ()
{
m_Temp = m_Register0 + m_Register1;
if ( m_Temp > MAX )
{
m_Overflow = true;
m_Temp = MAX;
}
m_Register0 = m_Temp;
}

Notice we're performing the mathematical operation into the temporary register.  And remember that the temporary register is larger (more precise) than register0 and register1.  The result is held temporarily and tested against MAX (which is 255) the highest value our CPU registers can hold.

If the result is more than maximum then we set the overflow flag and reduce the temporary value to the possible maximum.

Once done we set the temporary value into register0.  This means all the results of Add commands go into register0 and are of a valid bounds, or are MAX and set "Overflow".

The same happens with subtract, we process into the temporary register.  If results in below 0 (the minimum value a register can hold) then we set underflow and set the register to hold zero.

The complete CPU code

#ifndef CPU_HEADER
#define CPU_HEADER

#include <iostream>
#include <vector>

namespace Emulator
{
using namespace std;

typedef unsigned char byte;

class CPU
{
public:

static const byte MAX = 255;

enum InstructionSet
{
LOAD0 = 0,
LOAD1,
ADD,
SUBTRACT,
STORE0,
STORE1
};

private:

byte m_Register0;
byte m_Register1;

bool m_Status;
bool m_Overflow;
bool m_Underflow;

byte m_ProgramCounter;
byte m_InstructionRegister;

int m_Temp;

CPU (const CPU&) {}

void ResetCPU ()
{
m_Temp = 0;
m_Register0 = 0;
m_Register1 = 0;
m_Status = true;
m_Overflow = false;
m_Underflow = false;
m_ProgramCounter = 0;
}

void Fault ()
{
m_Status = false;
cout << "Instruction Fault at: " << m_ProgramCounter << endl;
DumpRegisters();
}

void DumpRegisters ()
{
cout << "CPU Registers:" << endl
<< "Register0 [" << m_Register0 << "]" << endl
<< "Register1 [" << m_Register1 << "]" << endl
<< "Status [" << m_Status << "]" << endl
<< "Overflow [" << m_Overflow << "]" << endl
<< "Underflow [" << m_Underflow << "]" << endl
<< "Program Counter [" << m_ProgramCounter << "]" << endl
<< "Instruction Register [" << m_InstructionRegister << "]" << endl
<< "Temp [" << m_Temp << "]" << endl;
}

void DoLoad0 (const vector<byte>& p_Program)
{
m_Register0 = p_Program[m_ProgramCounter];
m_ProgramCounter++;
}

void DoLoad1 (const vector<byte>& p_Program)
{
m_Register1 = p_Program[m_ProgramCounter];
m_ProgramCounter++;
}

void DoAdd ()
{
m_Temp = m_Register0 + m_Register1;
if ( m_Temp > MAX )
{
m_Overflow = true;
m_Temp = MAX;
}
m_Register0 = m_Temp;
}

void DoSub ()
{
m_Temp = m_Register0 - m_Register1;
if ( m_Temp < 0 )
{
m_Underflow = true;
m_Temp = 0;
}
m_Register0 = m_Temp;
}

void DoStore0 (vector<byte>& p_Program)
{
p_Program[m_ProgramCounter] = m_Register0;
m_ProgramCounter++;
}

void DoStore1 (vector<byte>& p_Program)
{
p_Program[m_ProgramCounter] = m_Register1;
m_ProgramCounter++;
}

public:

CPU ()
{
ResetCPU();
}

~CPU ()
{
}

void Execute (vector<byte>& p_Program, const bool& p_HaltOnOverflow = true, const bool& p_HaltOnUnderflow = true)
{
ResetCPU();
if ( p_Program.size() > MAX )
{
cout << "Error: Unable to process program, more than 255 instructions" << endl
<< "this is more than the Program Counter can handle" << endl;
}
else
{
while ( m_ProgramCounter < p_Program.size() )
{
m_InstructionRegister = p_Program[m_ProgramCounter];
m_ProgramCounter++;
switch (m_InstructionRegister)
{
case LOAD0:
DoLoad0 (p_Program);
break;
case LOAD1:
DoLoad1 (p_Program);
break;
case ADD:
DoAdd ();
break;
case SUBTRACT:
DoSub();
break;
case STORE0:
DoStore0(p_Program);
break;
case STORE1:
DoStore1(p_Program);
break;
default:
Fault();
return;
}

if ( m_Overflow && p_HaltOnOverflow )
{
cout << "Overflow - Halt" << endl;
return;
}

if ( m_Underflow && p_HaltOnUnderflow )
{
cout << "Underflow - Halt" << endl;
return;
}
}
}
}

};

}

#endif

Compiling the Code

You can test compile your CPU on the Linux command line, simply by saving the above code as "cpu.hpp" somewhere - its an HPP because we're writing C++ but not as separate header and implementation files - we're using a shortcut to code them together for ease - Don't do this in real code!

The instruction to compile the code is;

g++ -std=c++0x ./cpu.hpp -o cpu.o

(Note: You can install g++ with the command "sudo apt-get install g++" on Ubuntu derivatives, or find it in your distributions package manager).

(Note: At this point you can try to just use "gcc" in place of "g++").

This will compile the code, you can fix any transcription errors you have.  Once it compiles without complaint, well done, you've written your own virtual CPU... But what can we actually do with it?

Well, we need some other code to hold the a copy of our CPU for us and to feed it instructions... I shan't go into any detail here, I'll just write out the code and you can review it:

#include <iostream>
#include <vector>
#include "cpu.hpp"

using namespace std;

using namespace Emulator;

int main ()
{
CPU* cpu = new CPU ();

vector<byte> inst;
inst.push_back(CPU::LOAD0);
inst.push_back(0);
inst.push_back(CPU::LOAD1);
inst.push_back(7);
inst.push_back(CPU::ADD);
inst.push_back(CPU::STORE0);
inst.push_back(255);

cout << "CPU & Instructions Ready" << endl;
cpu->Execute(inst);

cout << "Execution Complete" << endl;

vector<byte>::const_iterator itr = inst.begin();
for ( ; itr != inst.end(); itr++)
{
cout << "[" << static_cast<int>((*itr)) << "]" << endl;
}
cout << "Complete" << endl;

delete cpu;

return 0;
}

Again, you can compile this on the command line, save the code as "RunCPU.cpp" and perform this command line:

g++ -std=c++0x -I/path/to/cpu/header ./RunCPU.cpp -o CPU.o

Because the RunCPU code uses the CPU hpp file we defined we need to include the path to it for the compiler, the -I parameter does this.  In my trial of this code I have placed both into my home folder, therefore my commandline is exactly this:

g++ -std=c++0x -I~/ ~/RunCPU.cpp -o ~/cpu.o

From the command line I can now run my CPU program with the command:

~/cpu.o

Program Comprehension

Lets look at what our input instructions were, and what our output result is:

vector<byte> inst;
inst.push_back(CPU::LOAD0);
inst.push_back(0);
inst.push_back(CPU::LOAD1);
inst.push_back(7);
inst.push_back(CPU::ADD);
inst.push_back(CPU::STORE0);
inst.push_back(255);

Our first instruction (byte 0) was to "load the next byte into register 0", the next byte being a zero.

Our second instruction (byte 2) was to "load the next byte into register 1", the next byte being a seven.

Our third instruction (byte 4) was to "add registers together" the result of this will then be in register 0.

Our fourth instruction (byte 5) was to "store register 0 into the next byte of the program storage", and the next byte was a value of 255.

... What was our output? ...

CPU & Instructions Ready
Execution Complete
[0]
[0]
[1]
[7]
[2]
[4]
[7]
Complete

Is what I received, so our instructions are all laid out, the important part being the last byte, this was set to 255, by executing the program we have carried out the sum (0 + 7) and stored it back into the program at that position, the result is therefore 7.

Our CPU works!!!!

Let us change the instructions a little...

vector<byte> inst;
inst.push_back(CPU::LOAD0);
inst.push_back(1);
inst.push_back(CPU::LOAD1);
inst.push_back(7);
inst.push_back(CPU::ADD);
inst.push_back(CPU::STORE0);
inst.push_back(255);

Now we're going to add 7 to 1, we should see an output of 8...

CPU & Instructions Ready
Execution Complete
[0]
[1]
[1]
[7]
[2]
[4]
[8]
Complete

And we do indeed get 8 as our result.

Assembly Language
This kind of instruction list we're passing the CPU is called "Assembly" it is the lowest level of programming we physically perform on the CPU.  But its not perfect, in our little example for instance, there are no checks or bounds on the values being placed into the instructions...

If we tried this instruction:

inst.push_back(128);

We'd fault our CPU, as it has no idea what that instruction means...

CPU & Instructions Ready
Instruction Fault at: 1
CPU Registers:
Register0 []
Register1 []
Status [0]
Overflow [0]
Underflow [0]
Program Counter [1]
Instruction Register [128]
Temp [0]
Execution Complete
[128]
Complete

Or, if we tried to pass a parameter which was an invalid number (in practice our C++ code protects us from this, as we can only add bytes to the instruction vector - but if it were pure binary we may get things wrong).

In such instances we need some software to help us turn our code for our program into valid CPU instructions.  The instructions being Assembly for the CPU this piece of helper software is called the "Assembler".

We could therefore write an assembly which took the text:

LOAD0 1
LOAD1 7
ADD
STORE0

You'll all agree these four lines of text are easier to understand than the seven we wrote in C++.  The software would read through the text given and produced the lexical equivalent of

inst.push_back(CPU::LOAD0);
inst.push_back(1);
inst.push_back(CPU::LOAD1);
inst.push_back(7);
inst.push_back(CPU::ADD);
inst.push_back(CPU::STORE0);
inst.push_back(255);

A lot of time is usually spent writing such assemblers, and then writing other programs (called compilers) which take a higher level language (like Pascal, C or C++) and generate assembler - or the instructions - so we can write bigger and more complex programs more quickly and more easily.  But always pass the CPU a list of instructions it knows about.

LXF Assembly Tutorial Feedback
This leads nicely into the LXF Assembly tutorials which have featured over the last quarter - features that some readers wrote into LXF towers to complain about, to complain and bemoan that such tutorials are not useful.

They are of course extremely useful, they instruct people how their computer works, they help them know what their higher level language compiles down to, they help them understand what their higher level C or C++ code "x = x + y" turns into at the lowest level.  It makes a better programmer of you to know just a little about what goes on below the layers of graphics and touch screens even today, just as it was part of my degree all those years ago.

12 comments:

  1. I found the address for this in Linux Format, and I'm impressed. But, as it happens, I have a good one to brag about as well...

    A little over 15 years ago, while failing an MSc in Computation at Oxford, I made a fairly complete Z80 processor in a language called Handel, which compiled down to a layout for an FPGA, so that the FPGA would in effect become a Z80 until it was re-programmed. The actual hardware was in constant use by all the pushy profs, so instead of running an FPGA, I had to emulate it on a Sun workstation. To my deep satisfaction, it ran slightly faster than a real Z80, and if I'd been able to get a go with the FPGA cards, it would have been a hell of a lot faster.

    Kids of today, they know nothing...

    ReplyDelete
  2. could I make this run for my pc's functions somehow? i can't afford upgrading my hardware so I've been looking for Virtual replacements ....

    ReplyDelete
    Replies
    1. A processor in emulation cannot run faster than the emulator. You can run programs and OSs of a different architecture(ARM on x86,x86 on ARM,MIPS on x86 on ARM...),but the CPU speed is always slower.

      Delete
  3. How do put this in the system so the CPU’s autocratically load on boot-up?

    ReplyDelete
  4. Have you written a tutorial on how to load from an external file in such a way as to make it actually read ASM?

    ReplyDelete
    Replies
    1. "Actually read ASM"... Which dialect of Assembler?... You see just as there are different dialects of C and Pascal there are very many different dialects of Assembler, quite possibly hundreds... So one could ask, do you want to load ARM assembly instructions, or IBM style, or x86... etc etc etc...

      Delete
  5. I need for one job to implement an 8-bit, non-plumbed processor with a reduced ISA, to act
    in "basic" process control. Oqui should have sufficient tertiary
    to allow the input of external data, perform tests (comparisons) on the data
    externalities, to allow the output of information, data movements between
    memory ↔ registers and between register ↔ register.
    Note: You should create a test code for the built-in processor test.

    Where do I find this code?

    ReplyDelete
    Replies
    1. I suggest you go look on Freelancer and post a job for someone to fulfill your desires!

      Delete