
Tuesday, 31 January 2017

Development : Simple Versus Complex

I have been reading some of Alan Turing's works, specifically some of his work for the National Physical Laboratory (NPL) in the late '40s and early '50s, before he went to Manchester in frustration at the NPL's lack of progress with his ideas.

One of the ideas Turing was pressing at the time was to do as much as he could in software rather than jumping to hardware solutions, which at the time fixed the functionality of the machine to a single purpose; instead he wanted a general computing device he could put to any use he desired.

Today the average programmer, hobbyist or employed software worker such as myself, doesn't have the opportunity to define new hardware for a bespoke task.  Indeed, the machines we are driving are far more powerful than Turing ever imagined, and we can do as Turing suggested: we can write software to get around an issue.

You need encryption?  Why pay to plug in a dongle to do this, when the processor you have is equipped to do the task for you; indeed, some processors even have encryption instructions and powerful parallelism built in to assist with such work!

So today's lesson is: take on board Turing's ideas and do in software as much as you can; even if you want to move to real hardware later, emulate your idea now.  Here's a very good example:


nVidia's emulation lab is an extreme example; however, it is indicative of the kind of problems we are meeting today, as older engineers come away from dedicated hardware devices (built because their machines were too slow in the 1990s) and we have today's extremely powerful machines right there at our fingertips.


Why does this exist?
Last week I set about looking at quite an old system (it was last attended to circa 2008).  It contains a message passing system, which allows passing of messages (events, if you will) between producers in any process and consumers in any other process subscribed to the service.  It did this with a Windows service driver, which called down into a USB dongle to queue the message.

The consumers then polled the USB device (round robin style) to determine if the next message was for that consumer... You might imagine this was extremely slow.

It was also very complex: the code within the USB dongles was not mutable, being set in silicon, and the consumers could end up locked out of the service queue by a message being present for a consumer which was not able to consume it.

This clearly needed replacing... So I had a look at it, for a single afternoon.  It was just extremely complex inside, so one had to take a step back: don't look at the contents of the code, look at the API, the functions being used...

The producer code relied on just three functions, and the consumer relied on a thread-safe queue which was woken from a spin wait to call the registered consuming function... Not rocket science.
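For illustration only, here is a minimal sketch of that consumer shape: a thread-safe queue drained by a spin-waiting worker thread which calls the registered consuming function.  All of the names and types here are hypothetical stand-ins, not the original system's API.

#include <atomic>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

class Consumer
{
public:
    explicit Consumer(std::function<void(const std::string&)> p_OnMessage)
        : m_OnMessage(std::move(p_OnMessage)),
          m_Worker([this] { Run(); })
    {
    }

    ~Consumer()
    {
        m_Running = false;
        m_Worker.join();
    }

    // Called by the message service when something arrives for this consumer
    void Post(const std::string& p_Message)
    {
        std::lock_guard<std::mutex> lock(m_Lock);
        m_Queue.push(p_Message);
    }

private:
    void Run()
    {
        while (m_Running)
        {
            std::string l_Message;
            {
                std::lock_guard<std::mutex> lock(m_Lock);
                if (m_Queue.empty())
                    continue;                 // the "spin wait"
                l_Message = m_Queue.front();
                m_Queue.pop();
            }
            m_OnMessage(l_Message);           // call the registered consuming function
        }
    }

    std::function<void(const std::string&)> m_OnMessage;
    std::mutex m_Lock;
    std::queue<std::string> m_Queue;
    std::atomic<bool> m_Running { true };
    std::thread m_Worker;
};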

I set about writing new code with a very simple UDP server and clients: taking the message in, converting it to XML and posting it to the loopback IP address of the master UDP listening port, which then sent it out to the listening consumers on their own loopback IPs and port numbers.
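To make that concrete, here is a minimal sketch of what the producer side of such a replacement might look like, using plain POSIX UDP sockets.  The master port number (5000) and the XML envelope shown are illustrative assumptions, not the original system's actual schema or code.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <iostream>
#include <string>

int main()
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) { std::cerr << "socket() failed" << std::endl; return 1; }

    sockaddr_in master {};
    master.sin_family = AF_INET;
    master.sin_port = htons(5000);                       // assumed master listening port
    inet_pton(AF_INET, "127.0.0.1", &master.sin_addr);   // loopback address

    // Wrap the event in a simple XML envelope and fire it at the master,
    // which then re-sends it to each subscribed consumer's own loopback port.
    std::string msg = "<event topic=\"temperature\"><value>21.5</value></event>";
    sendto(sock, msg.data(), msg.size(), 0,
           reinterpret_cast<sockaddr*>(&master), sizeof(master));

    close(sock);
    return 0;
}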

Voila: in an afternoon I had replaced five years (ish) of complex, hard-to-maintain code with a working prototype of a solution.  And it dropped straight into place.  The dedicated, slow, old hardware could be unplugged and the system just carried on as it always had.

The customer for this product is delighted to see it running faster, and the overhead of the hardware costs removed.  All by simply doing in software something which was being offloaded to hardware in yesteryear.

Wednesday, 4 June 2014

Virtual CPU - ROM & Interrupts

Interrupts are a way for hardware, and sometimes other pieces of software, to break into the flow of your CPU processing and tell it to do something else.

This is my very friendly way of describing interrupts, unfortunately a lot of sources of good information drop people into the idea of these mystical beasts far too quickly.  I'm going to try not to drop you all in it.

Let's think about our CPU and how we know it operates: our CPU has a Run function, which has a loop within it.  This loop fetches an operation to carry out from the memory pointed to by the program counter and acts on that instruction.

As we've seen from example programs we can also jump to a new memory address.

And if we took a close look at the Intel published datasheet from the prior post we'd also see that there are many other kinds of jump, jumps when things are equal, jumps when they're not, jumps for the sake of jumping...

Well, an interrupt is a special kind of jump instruction which, when the interrupt is signalled (triggered/fired - whatever term you want to use), gets inserted as the next instruction for the program to carry out.

The interrupt instruction is special because it tells the CPU to jump off to some piece of code elsewhere, but also remembers, when that piece of code is complete, to come back and pick up exactly where the CPU left off.

It is like a function call in other programming languages, but at the CPU level.  In our programs, therefore, we now need to think about what state the CPU is in before and after an interrupt is carried out (or handled).

Because the code in the interrupt function might change the state of the CPU, and sometimes we don't want this.

A good example comes to mind from the very first real (i.e. not BASIC) programming language I learned, Pascal.  On Intel PCs running the 16-bit MS-DOS operating system there was a set of standard interrupts; interrupt number 33h (hexadecimal) was for the mouse.  If this interrupt was triggered the mouse had moved, but we didn't want the state of the processor to have changed within our program, so when the interrupt handler was defined in our code we used to have to say:

"PUSH REGISTER STATE"

Before the interrupt code was run, and then when the interrupt processing was complete we would say:

"POP REGISTER STATE"

Let's look at how an interrupt might work in our CPU Run function:

From left to right, we see our program on our CPU... Then we see the interrupt get signalled (somehow) and the program counter is pointed to the new piece of code at "A"... The Run function continues through the interrupt instruction codes and then returns to your program at "B", by setting the program counter back to point at your original program's "next" operation.

The interrupt code could have changed any of the CPU registers or status flags, so it is at point "A" we must be able to Push, or save, the current CPU state.  Then at point "B" we must be able to Pop, or load, the saved CPU state back into the machine proper.

We say "push" and "pop" as the structure we use to save and restore the CPU state is called a "stack".  I'll come back to that later, but you can check out an explanation of stacks here if you're eager to understand what we're talking about.

In our CPU, however, we're not going to add a stack just yet; we're just going to add a set of mirror values for each CPU member variable.  In early silicon this would have been totally impractical, because adding redundant duplicates of everything would have been far too costly... Those early chips might therefore have saved their registers off to memory instead.

Whatever they did, the essence is the same as what we're now going to make our code do... it saves all the values somewhere.
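As a rough sketch of that idea - the member names here, such as m_ProgramCounter and the single m_RegisterA, are hypothetical stand-ins for whatever your own CPU class actually holds - the mirror-value save and restore, and the check in the Run loop, might look like this:

class CPU
{
public:
    void SignalInterrupt(unsigned short p_HandlerAddress)
    {
        m_InterruptPending = true;
        m_InterruptAddress = p_HandlerAddress;
    }

    void Run()
    {
        while (m_Running)
        {
            // Before fetching the next instruction, check for a pending interrupt
            if (m_InterruptPending)
            {
                PushState();                          // Point "A" - save the CPU state
                m_ProgramCounter = m_InterruptAddress;
                m_InterruptPending = false;
            }

            // ... fetch the op code at m_ProgramCounter and act on it ...
            // An "interrupt return" op code would call PopState(), point "B",
            // restoring the program counter to the interrupted program.
        }
    }

private:
    void PushState()
    {
        m_SavedProgramCounter = m_ProgramCounter;
        m_SavedRegisterA = m_RegisterA;
        m_SavedStatusFlags = m_StatusFlags;
    }

    void PopState()
    {
        m_ProgramCounter = m_SavedProgramCounter;
        m_RegisterA = m_SavedRegisterA;
        m_StatusFlags = m_SavedStatusFlags;
    }

    bool m_Running = true;
    bool m_InterruptPending = false;
    unsigned short m_InterruptAddress = 0;
    unsigned short m_ProgramCounter = 0, m_SavedProgramCounter = 0;
    unsigned char m_RegisterA = 0, m_SavedRegisterA = 0;
    unsigned char m_StatusFlags = 0, m_SavedStatusFlags = 0;
};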


So what might cause an interrupt and what might the code in an interrupt actually do?

Well, the most basic interrupt you will probably use today is the keyboard: as you press each key, an interrupt is generated, going off to the processor telling it that a key stroke has arrived.

The processor can then pick up and action the key stroke, adding it to data, or changing status however we instruct it to.

So where does the code for our interrupts come from?  At the most basic level in your machine it comes from the BIOS.  So the BIOS handles the key stroke; some store or queue many key strokes at once, and they are then handled by the processor as and when the operating system software wants to accept them.
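As a rough illustration of that queuing idea - the names and shape here are mine, not the BIOS's actual implementation - a handler that only pushes each key stroke into a buffer, for the operating system to drain later in its own time, might look like this:

#include <queue>

std::queue<unsigned char> g_KeyBuffer;

// Called from the keyboard interrupt handler - keep it short!
void OnKeyInterrupt(unsigned char p_ScanCode)
{
    g_KeyBuffer.push(p_ScanCode);
}

// Called by the operating system whenever it is ready to accept a key
bool NextKey(unsigned char& p_ScanCode)
{
    if (g_KeyBuffer.empty())
        return false;

    p_ScanCode = g_KeyBuffer.front();
    g_KeyBuffer.pop();
    return true;
}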

How might we add interrupt code to our CPU?  Well, in the last post we mentioned ROMs... We're going to write a ROM class and add key handling to it.
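The original embedded listing isn't reproduced here, so as a rough placeholder, a minimal ROM class along the lines being described might look like the sketch below; the member names and constructor shape are illustrative assumptions, not the actual listing.

#include <cstddef>
#include <vector>

class ROM
{
public:
    // "Burn" a fixed program (e.g. the keyboard interrupt handler) into the ROM
    explicit ROM(std::vector<unsigned char> p_Code)
        : m_Code(std::move(p_Code))
    {
    }

    // The CPU fetches from the ROM exactly as it fetches from RAM,
    // but the contents can never be written to.
    unsigned char Read(std::size_t p_Offset) const
    {
        return p_Offset < m_Code.size() ? m_Code[p_Offset] : 0;
    }

    std::size_t Size() const { return m_Code.size(); }

private:
    const std::vector<unsigned char> m_Code;
};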


Right, we've implemented our ROM class and added the interrupt; now we can move into an interrupt and return from it.  We also have the push and pop instructions added, so op codes 200, 201 and 202 are important to the interrupting process.

When we go into an interrupt the CPU changes state to start reading from the offset into the ROM, and it is up to the ROM code to return from that interrupt code.  The CPU has no idea when the interrupt is finished, so the last instruction in our ROM routine for an interrupt must be to return from it!

What might an interrupt series of byte codes in our ROM look like?  That's up to you!

In the next post in our series I'm going to write some platform-specific code to let us read characters from the keyboard; until I do that, however, we'll just pretend an interrupt has been called...

Wednesday, 21 May 2014

Virtual CPU - Signed Addition & Endianess

From yesterday's post, then, we should have learned something and perhaps even gone looking for a solution; you may have even coded a solution into the CPU code we're working on...

The solution I'm going with, however, is a total cheat: I'm going to add the carry out (Cout) of the last adder back onto the result with a single full-adder...


Essentially we add the carry out into the first bit again.  But we only want to do this when using a signed value... so we'll cheat and use our "Signed" flag to choose between this new function and the original one... Let's get on with creating "AddSignedTwoBytes":

// Requires <iostream> and <bitset> to be included for the debug output
void Electronics::AddSignedTwoBytes (
    byte& p_Register0,
    byte& p_Register1,
    byte& p_Result,
    bool& p_Overflow,
    const bool& p_Debug)
{
    bool l_CarryIn = false;
    bool l_CarryOut = false;
    bool l_Sum = false;

    // For each bit we need to mask the
    // right most bit out of the register,
    // meaning we start at 00000001 and
    // for each loop shift the register
    // so the bit we're interested in sits
    // over the lowest position.

    // Our mask never changes
    byte l_mask = 0x01;

    // For each bit we run the masking,
    // then the adder, and handle switching
    // the result into the register.
    // You can find more efficient ways!
    for (int i = 0; i < 8; ++i) // 8 bits in a byte
    {
        if ( p_Debug )
        {
            std::cout << "Cycle: " << i << std::endl;
            std::bitset<8> msk { l_mask };
            std::cout << "Mask: " << msk << std::endl;
            std::bitset<8> reg0 { p_Register0 };
            std::bitset<8> reg1 { p_Register1 };
            std::cout << "Register 0 [" << reg0 << "]" << std::endl;
            std::cout << "Register 1 [" << reg1 << "]" << std::endl;
        }

        // Get the A & B bits by shifting & masking
        // the registers
        bool A = ( ( ( p_Register0 >> i ) & l_mask) == 1);
        bool B = ( ( ( p_Register1 >> i ) & l_mask) == 1);

        // We have the carry in and the A & B now, so
        // we can call the adder.
        // Because the carry out and the sum are separate
        // in our code here, we don't need to alter "reg0" or
        // "reg1", we can just logically add the bits set
        // into p_Result below!
        Adder(A, B, l_CarryIn, l_CarryOut, l_Sum, p_Debug);

        if ( p_Debug )
        {
            // This should be a value from our Adder trace table!
            std::cout << "Adding: " << A << " " << B << " " << l_CarryIn << " | " << l_CarryOut << " " << l_Sum << std::endl;
        }

        // The carry out simply becomes the carry in.
        // I'm sure you can see one way to optimise this already!
        l_CarryIn = l_CarryOut;

        // Now the register changes based on the sum, but
        // we also output the binary
        if ( p_Debug )
        {
            std::bitset<8> resultBefore { p_Result };
            std::cout << "Result Change: " << resultBefore << " -> ";
        }

        // Now the logic...
        // Instead of pushing the logical sum
        // into the "Register0" parameter,
        // we push it into the p_Result parameter!
        if ( l_Sum )
        {
            // The mask is shifted, and always 1 in the i position,
            // so we always add a 1 back into the target
            // register in the right location
            p_Result = p_Result | ( l_mask << i);
        }
        else
        {
            // We know the mask is ON, so inverting it after shifting it
            // always gives us an OFF bit in the i position...
            p_Result = p_Result & ~(l_mask << i);
        }

        // The register changed, so finish the debug statements
        if ( p_Debug )
        {
            std::bitset<8> resultAfter { p_Result };
            std::cout << resultAfter << std::endl;
        }
    }

    //======================================
    // Add the carry out to the first bit again.
    // Take the first bit of the result...
    bool A = ( ( p_Result & 0x01) == 1);
    // ...and add the final carry out to it, with no carry in
    Adder(A, l_CarryOut, false, l_CarryOut, l_Sum, p_Debug);
    // Now the logic, pushing the sum back into
    // bit 0 of the p_Result parameter
    if ( l_Sum )
    {
        // We're working on bit 0 here, so the
        // unshifted mask sets it directly
        p_Result = p_Result | 0x01;
    }
    else
    {
        // Inverting the unshifted mask clears bit 0
        p_Result = p_Result & ~0x01;
    }
    //======================================

    // The final carry out becomes our
    // overflow flag
    p_Overflow = l_CarryOut;
}

So with this code in place we need to test the function; let's add a new test function:

void Electronics::TestSignedAdd()
{
    byte l_A = -127;
    byte l_B = 7;
    byte l_result = 0;
    bool l_Overflow = false;

    AddSignedTwoBytes(l_A, l_B, l_result, l_Overflow);

    std::cout << "Testing signed add:" << std::endl;

    std::cout << "(" << (int)l_A << " + " << (int)l_B << ") = " << (int)l_result << std::endl;
}

Now, before we run the code, what do we expect to see?.. Well, we expect the value of -120, which has the binary pattern 10001000.

Let's run the code and see...


What the hell just happened?... 129 + 7... That's not what our code says... and the answer is 136... What is going on!??!?!

Calm down, calm down, everything is fine... The binary pattern of the result is correct... see...


So what is going on with the values we see on the screen, if our register holds the binary pattern for -120, our expected result?!

Well, the signed binary for -120 is the same pattern as the unsigned value 136!  It's as simple as that: our CPU is working, it's our C++ which has thrown us a curve ball.

The cout stream took the byte we sent and converted it for display, but the byte itself has no knowledge of signing; it is in fact an unsigned char, as we defined the type.  So the binary might be perfectly fine, but the interpretation of that binary is wrong.
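To see the same effect in isolation, here is a tiny standalone snippet (not part of the CPU code) showing the single bit pattern 10001000 printed three ways:

#include <bitset>
#include <iostream>

int main()
{
    unsigned char l_byte = 0x88;                          // the bit pattern 10001000
    std::cout << (int)l_byte << std::endl;                // 136  - the unsigned reading
    std::cout << (int)(signed char)l_byte << std::endl;   // -120 - the two's complement reading
    std::cout << std::bitset<8>(l_byte) << std::endl;     // 10001000 - the raw binary
    return 0;
}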

This is a case of being careful with how you test your code, and an example of where at least two tests are needed to confirm a result; never take the result of just one test as canonical.  Always try to find some other way to test a value you calculate, or validate your input, or put bounds around your system, because when something goes wrong you always need to double-check yourself before complaining to others.

In the case of this code, cout is showing the unsigned values, and that's fine; we can ignore it, because to get the true binary we can just use bitset...

#include <bitset>
std::bitset<8> l_binary (p_Result);
std::cout << l_binary << std::endl;

This is a lesson to learn in itself: always check and recheck your code.

But now we have to think about the repercussions of this in our CPU: even if we've set the signed flag, the data are being stored unsigned; the memory is storing the values just as patterns of binary... And this is an important thing to keep in mind when you're programming against a CPU: how the value is expressed is more important than how it is stored, because (hopefully) the binary representation is going to be the same all the time...

OR IS IT?

Unfortunately not.  The binary we've dealt with so far is what we could call "Little-Endian": the lowest value assigned to a bit in the byte starts on the right, and we read the byte from right to left.  Essentially the opposite way to how we would read this very English text...


If we read the byte the opposite way around then the values would be reversed:


This is called Big-Endian.
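To illustrate the two readings with a quick standalone snippet (again, not part of our CPU code): the same eight bits, 10001000, give 136 when the right-hand end is the least significant, and 17 when read the other way around.

#include <bitset>
#include <iostream>

int main()
{
    std::bitset<8> bits("10001000");

    // Right-most bit is the least significant (the reading we've used so far)
    unsigned long rightToLeft = bits.to_ulong();            // 136

    // Left-most bit is the least significant (the reversed reading)
    unsigned long leftToRight = 0;
    for (int i = 0; i < 8; ++i)
    {
        if (bits[i])                                         // bits[0] is the right-most character
        {
            leftToRight |= (1ul << (7 - i));
        }
    }
    // leftToRight is now 17

    std::cout << rightToLeft << " " << leftToRight << std::endl;
    return 0;
}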

Intel processors have pretty much always been little-endian, whilst other firms have used big-endian; notable platforms using big-endian processors were those built on the Motorola 680x0 family of CPUs.  Yes, those of the Atari ST, the Amiga, the original Mac... They all had big-endian CPUs.

Some said this set a gulf between the two camps, and emulating across it is very time consuming, because emulating a big-endian processor on a little-endian machine used to mean a lot of overhead in converting between the binary representations.

Our CPU is going to suffer from this problem, because we've built it, and its adder, to use little-endian principles; e.g. we run the adder loop from 0 to n-1, whereas for a big-endian machine we'd want to start the adder loop at n-1 and go down to 0 to complete an addition.
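To make the loop-direction point concrete, here is a tiny standalone snippet showing the order in which the adder loop would visit the bits in each case; the p_BigEndian flag is a hypothetical setting, not something our Electronics code currently has.

#include <iostream>

// Hypothetical helper: print the bit indices an adder loop would visit,
// in the order dictated by the configured bit order.
void PrintAdderOrder(bool p_BigEndian)
{
    for (int step = 0; step < 8; ++step)
    {
        int i = p_BigEndian ? (7 - step) : step;   // n-1 down to 0, or 0 up to n-1
        std::cout << i << " ";
    }
    std::cout << std::endl;
}

int main()
{
    PrintAdderOrder(false);   // 0 1 2 3 4 5 6 7 - the little-endian order we built
    PrintAdderOrder(true);    // 7 6 5 4 3 2 1 0 - the big-endian order
    return 0;
}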

A challenge would be to go back over our whole CPU and convert it for endianness, making it a configurable implementation of a generic 8-bit logical operating unit... I'm not going to do it; I'm just here to guide your experience.