Showing posts with label Assembler. Show all posts

Thursday, 31 January 2019

C++ Coding Standards: Don't Return Early

I've just had one of those mini-conversations which has me scratching my head. I asked a coworker whether "return early" was part of the coding standard or not, as it was being discussed by other parties.

His response was an emphatic, slightly arrogant and touchily obtuse, "I have no time for anyone arguing for not returning early from a function you've discovered you should not be in".

I, however, don't take such a black and white view. I see the pros and cons of both approaches. Importantly, not returning early from a function can be, and is, part of driving process flow conversations, and it aids in picking out structure which can be encapsulated away.  The result being that instead of instantly exiting and potentially leaving spaghetti code you can see a flow, see where branching is occurring and deal with it...

But, that said, it must be understood "returning early" is often seen as "faster", as the chap argued "I see no point being in a function you know you should already have left", so I took to compiler explorer... In a very trivial example:

 

Here we see a decision being made and either a return or the active code, easy to read, simple, trivial... But don't dismiss it too early, is it the neatest code?

Well, we could go for fewer lines of code which generates near identical assembly thus:


This is certainly fewer lines of C++; however, inverting the test is a little messy and could easily be overlooked in our code or in review. Better might therefore be expanding the test to a true opposite:


One could argue the code is a little neater to read without the inverting, critically though it has made no difference to the assembly output, it's identical.

And it is identical in all three cases.
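
The screenshots have not survived on this page, so here is a minimal sketch of the shape of the three variants being compared. The function names and the "store only non-negative values" test are my own stand-ins, not the code from the original images, and whether your compiler really emits identical assembly for all three is something to check for yourself on Compiler Explorer.

```cpp
#include <vector>

// Hypothetical stand-ins for the three screenshots: append p_Value
// to p_Out only when it is non-negative.

// 1. Return early when the test says we should not be here.
void StoreReturningEarly(std::vector<int>& p_Out, int p_Value)
{
    if ( p_Value < 0 )
    {
        return;
    }

    p_Out.push_back(p_Value);
}

// 2. Single exit, written by inverting the original test.
void StoreInverted(std::vector<int>& p_Out, int p_Value)
{
    if ( !(p_Value < 0) )
    {
        p_Out.push_back(p_Value);
    }
}

// 3. Single exit, with the test expanded to its true opposite.
void StoreOpposite(std::vector<int>& p_Out, int p_Value)
{
    if ( p_Value >= 0 )
    {
        p_Out.push_back(p_Value);
    }
}
```

All three do the same job; only the second and third put the action inside the braces of the decision that guards it.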

You could argue therefore that "returning early has no bearing on the functionality of the code", but that's too simplistic, because "not returning early" also has no bearing on the functionality of the code.  The test and action and operation has been worked out by the magic of the compiler for us to be the same.

So with equivalent operation and generation we need to think about what returning from the function early did affect. Well, it made the code on the left longer; yes this is an affectation of our coding with braces, but it was longer.  You could also see that there were two things to read and take in: the test was separate to the code and, importantly for me, the test and the actual functional code were on the same horizontal indentation.  Being on that same alignment makes your eye not think a decision has been made.

Putting the test and the action of that test into the inner bracing communicates clearly that a decision has been made and our code has branched.

And when that structure is apparent you can think about what not returning early has done. Well, it's literally braced around the stanza of code to run after the decision; that packet of code could be spun off into its own function, so you start to think about encapsulation.  Of course you can think about the same thing after the return, but in my opinion having the content of the braces to work within is of benefit, and returning early most certainly does not afford any speed benefits.

Let's look at a more convoluted, but no less trivial example, a function to load a file in its entirety and return it as a known sized buffer... we'll start with a few common types:

#include <memory>
#include <cstdint>
#include <string>
#include <cstdio>
#include <optional>
#include <utility>
#include <sys/stat.h>
#include <sys/types.h>
using byte = unsigned char;
using DataBlock = std::shared_ptr<byte>;
using Buffer = std::pair<uint32_t, DataBlock>;
using LoadResult = std::optional<Buffer>;

And we'll assume there's a function to get the file size, with stat...

const uint32_t GetFileSize(const std::string& p_Path)
{
    uint32_t l_result(0);

    struct stat l_FileStat;
    if ( stat(p_Path.c_str(), &l_FileStat) == 0)
    {
        l_result = l_FileStat.st_size;
    }

    return l_result;
}

Now, this function is return path safe because I define the result as zero and always get to that return. I could have written it thus:

const uint32_t GetFileSizeOther(const std::string& p_Path)
{    
    struct stat l_FileStat;
    if ( stat(p_Path.c_str(), &l_FileStat) != 0)
    {
        return 0;
    }

    return l_FileStat.st_size;
}

But I don't see the benefit. Both versions return the size by value, except in the latter you have two code points which define that value. If anything I would argue you lose the ability to debug "l_result" from the first version, where you can inspect it before, during and after operating upon it... Whereas in the latter, you don't know the return value until the return itself.  And again in both cases the assembly produced is identical.

So, the load function, how could it be written with returning as soon as you see a problem?... Well, how about this?

LoadResult LoadFileReturning(const std::string& p_Path)
{
    LoadResult l_result;

    if ( p_Path.empty() )
    {
        return l_result;
    }
     
    auto l_FileSize(GetFileSize(p_Path));
    if ( l_FileSize == 0 )
    {                     
        return l_result;
    }

    FILE* l_file (fopen(p_Path.c_str(), "rb"));
    if ( !l_file )
    {
        return l_result;
    }

    Buffer l_temp { l_FileSize, DataBlock(new byte[l_FileSize], std::default_delete<byte[]>()) }; // an array of l_FileSize bytes, not one byte
    if ( !l_temp.second )
    {
        fclose(l_file);
        return l_result;
    }

    auto l_ReadBytes(
        fread(
            l_temp.second.get(),
            1,
            l_FileSize,
            l_file));

    if ( l_ReadBytes != l_FileSize )
    {
        fclose(l_file);
        return l_result;
    }

    l_result.emplace(l_temp);

    fclose(l_file); 

    return l_result;
}

We have six, count them (orange highlights), different places where we return the result, as we test the parameters, check the file size and then open and read the file, all before we get to the meat of the function which is to set up the return content after a successful read.  We have three points where we must remember, and keep remembering as the code is maintained, to close the file (red highlights).  This duplication of effort and dispersal of what could be critical operations (like remembering to close a file) throughout your code flow is a problem down the line.

I very much know I've forgotten and missed things like this, reducing the possible points of failure for your code is important and my personal preference to not return from a function early is one such methodology.

Besides the duplicated points of failure I also found the code to not be communicating well, it's 44 lines of code. Better communication comes from code thus:

LoadResult LoadFile(const std::string& p_Path)
{
    LoadResult l_result;

    if ( !p_Path.empty() )
    {
        auto l_FileSize(GetFileSize(p_Path));
        if ( l_FileSize > 0 )
        {                        
            FILE* l_file (fopen(p_Path.c_str(), "rb"));
            if ( l_file )
            {
                Buffer l_temp { l_FileSize, DataBlock(new byte[l_FileSize], std::default_delete<byte[]>()) }; // an array of l_FileSize bytes, not one byte
                if ( l_temp.second )
                {
                    auto l_ReadBytes(
                        fread(
                            l_temp.second.get(),
                            1,
                            l_FileSize,
                            l_file));

                    if ( l_ReadBytes == l_FileSize )
                    {
                        l_result.emplace(l_temp);
                    }
                }                

                fclose(l_file);
            }
        }
    }

    return l_result;
}

This time we have 33 lines of code, we can see the stanzas of code indenting into the functionality, at each point we have the same decisions taking us back out to always return.  When we've had a successful open of the file we have one (for all following decisions) place where it's closed and ultimately we can identify the success conditions easily.

I've heard this described as forward positive code, you look for success and always cope with failure, whilst the former is only coping with failure as it appears.

I prefer the latter. Ultimately it comes down to a personal choice; some folks argue indenting like this is bad, but I've yet to hear why if the compiled object code is the same, you are communicating so much more fluently and have fewer points of possible human error in the code to deal with and maintain.

From the latter we could pick out reused code and we can target logging or performance metrics more directly on specific stanzas within their local scopes.  Instead of everything being against the left hand indent line.
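
As a rough sketch of what "picking out" a stanza might look like, here the innermost allocate-and-read block is lifted into its own single-exit function. The names ReadOpenFile and LoadFileEncapsulated are hypothetical, and I measure the size with fseek/ftell purely to keep the sketch self-contained, where the listing above uses stat().

```cpp
#include <cstdint>
#include <cstdio>
#include <memory>
#include <optional>
#include <string>
#include <utility>

using byte = unsigned char;
using DataBlock = std::shared_ptr<byte>;
using Buffer = std::pair<uint32_t, DataBlock>;
using LoadResult = std::optional<Buffer>;

// Hypothetical helper: the "allocate and read" stanza on its own.
LoadResult ReadOpenFile(FILE* p_File, const uint32_t p_Size)
{
    LoadResult l_result;

    if ( p_File && p_Size > 0 )
    {
        // Array new, paired with the array deleter.
        Buffer l_temp { p_Size, DataBlock(new byte[p_Size], std::default_delete<byte[]>()) };

        if ( fread(l_temp.second.get(), 1, p_Size, p_File) == p_Size )
        {
            l_result.emplace(l_temp);
        }
    }

    return l_result;
}

// Hypothetical caller: still one open, one close and one return.
LoadResult LoadFileEncapsulated(const std::string& p_Path)
{
    LoadResult l_result;

    if ( !p_Path.empty() )
    {
        FILE* l_file (fopen(p_Path.c_str(), "rb"));
        if ( l_file )
        {
            if ( fseek(l_file, 0, SEEK_END) == 0 )
            {
                const long l_size (ftell(l_file));
                if ( l_size > 0 && fseek(l_file, 0, SEEK_SET) == 0 )
                {
                    l_result = ReadOpenFile(l_file, static_cast<uint32_t>(l_size));
                }
            }

            fclose(l_file);
        }
    }

    return l_result;
}
```

Any logging or timing you wanted around the read would now have an obvious home inside ReadOpenFile, and the close still lives in exactly one place.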

Now, I will happily tell you it hasn't been a comfortable ride to my thoughts on this topic. I started programming on a tiny screen (40x20 characters - Atari ST Low Res GEM) and when I moved to DOS, having 80 character widths, I felt spoiled.  Now we have tracts of huge screen space, arguing you need to stay on the left is moot, you don't, use your screen, make your code communicate its meaning, use the indents.

And yes, before you ask, I was doing things this way long before I started to use Python.

Monday, 26 March 2018

C++ : Pass-By-Reference Or Die

Before today's Post, I'm on a mission folks, to get 1000 subs on YouTube.  If only 5% of viewers here subscribed we'd have met this target in one month...



I've just had group code review of one of my personal projects, and been rather surprised by the vitriol levelled at one of my practices.... Pass by Reference.

The reviewer, one of a group of peers, has had major issues with the project's (my personal) insistence on passing by reference wherever possible. In C++ this takes the form of an additional ampersand on parameter definitions; maybe this was the chap's problem, he has to type an ampersand?

So his problem?  Well, without the actual code we'll simplify and use the Compiler Explorer (from Godbolt.org) and we'll take up their basic square function example, it starts up thus:

Giving the assembler:


On the right, and this chap had taken time to prepare a whole slide show of functions, usually simple, and present them at this code review, showing this kind of thing.  His point... Well the very same C++ but with a pass by reference:


Turns up more lines of assembler:


He's got me right, right, I'm taking more time, I'm slowing everything down, by my not taking a copy of everything and using less memory I'm slowing things down....

This is where the sort of power play turned. I allowed him to present everything, I never interjected, never spoke, I allowed him to speak to the whole group.  We've hired a venue for this, we're meeting live for the first time.  This has to be good.... A couple of the chaps who could already see the fault in the complainer's logic were smirking, but we let him finish.

Triumphant, he has won the day, he will now carry the torch of coding standard gods...  WRONG.

I pulled over the presentation laptop, opened godbolt.org myself... Added the ampersand to the "num" and let it produce the above assembler... The chap was smirking completely from ear to ear, he knew he had me...

And then I typed three characters....

-O2

Yes, I told the compiler to optimize, and this happened...


Remarkably small code wouldn't you say?  I still haven't spoken, but I turn the laptop back to the presenter and just sit there.

There's a noticeable snigger from those in the know, older, wiser heads than my own I hasten to add.  But this young chap is now looking from me to the screen to the overhead projection and back with a mix of fury and complete puzzlement; he'd checked everything, he'd dotted every i and crossed every t, he had me down pat, he wanted to usurp me.

Except, he's never, ever, been willing to listen, to learn or to experiment. "Code runs, that'll do" is very much his style (and Kyle if you're reading, yes I'm talking about you) but getting code to run is not enough, understanding the code you've written is often only just enough, but getting it to run everywhere, the same way, that's an art.  Debug, Release, Optimised, Unoptimised, automatically profiled, link database and continue, they're all subtly different.  Just listing one thing out, the only thing you've looked at, because it backs up your point of view, is not enough; you have to look around and see the holistic picture.

And optimised without a pass-by-reference?


Spookily similar code in this case, but oftentimes pass-by-reference is preferred, and using const-correctness is preferred, as it communicates a meaning.

For instance in the "square" function above, how does the caller know that the parameter "num" is not altered in value?  How does the caller know it returns the new value only?  It could be returning an error status code and the parameter altered in value!  You don't know, but making the parameter const and a reference you start to communicate more firmly the intent of your code.
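
For anyone reading without the screenshots, a minimal sketch of the two signatures under discussion, assuming the function was the usual Compiler Explorer starter example. The name squareByRef is my own, chosen only so both versions can live in one file.

```cpp
// The starter function, taking its argument by value.
int square(int num)
{
    return num * num;
}

// The same function taking a const reference (hypothetical name).
// The signature alone tells the caller that num cannot be altered.
int squareByRef(const int& num)
{
    return num * num;
}
```

Paste both into Compiler Explorer with and without -O2 and compare the output for yourself; the exact assembly depends on your compiler and target.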

Wednesday, 27 December 2017

C/C++ Stop Misusing inline.... PLEASE!

This is a plea, from the bottom of my rotten black heart, please... Please... PLEASE stop misusing the inline keyword in your C and C++.

Now, I can't blame you for this, I remember back in the 90's being actually taught (at degree level) "use inline to make a function faster", and this old lie still bites today.

inline does not make your function faster. It asks the compiler to insert "inline" another copy of the same code wherever you call it (a request some compilers honour to the letter and others ignore), so this code:

#include <iostream>

inline void Hello()
{
    std::cout << "Hello";
}

int main ()
{
    Hello();
    Hello();
    Hello();
}

Turns into the effective output code of:

int main ()
{
    std::cout << "Hello";
    std::cout << "Hello";
    std::cout << "Hello";
}

What does this mean in practice?  Well, you save yourself the CALL into the function, the position on the stack holding the return address, and the RET which pops that address off the stack and returns from the function.

This is WHY people were told to use inline to make things faster in the 90's. I was taught this when I had a system with around 254K of working RAM for the programs I was writing; saving that space on an 8K stack was important in complex systems, especially if you were nesting loops of calls.

However, today, on a modern processor, even modern embedded processors, DO NOT DO THIS!

You're no longer saving anything, you're in fact making your code bigger and slower as suddenly your program expands in size and you are having to fetch more and more from the slower RAM layers rather than the program instructions page fitting into the lower CACHE layers.

As you get page misses you fetch more, you literally stop the program and switch context to another item and then switch back, literally halting your program in its tracks as it suddenly had to go load the N'th of possibly thousands of repeated stanzas of code.

Don't do this, don't lumber yourself, let the compiler handle its own optimizations, they're pretty good at it!

Now some of you will be saying "yeah, no shit Xel, what's your point?"... My point is I recently had around 4000 lines of code handed to me, a huge long listing, and around 40% of it was a series of functions.  This whole thing could compile down to around 62K.... But when compiled it was just over 113K... This was too big to fit into the memory of the micro-controller it was for.

The developer had been working merrily over the yule tide, happy and satisfied their code would work, they went to work this morning and instead of running the code on the IDE within an emulator, they actually ran it on the metal.

It crashed, and they couldn't figure out why, the size was why.

And then they couldn't work out why the code was so big... It is tiny code.

They came, cap in hand, to myself - and I took no small satisfaction in rolling my eyes and telling them to remove the "inline" from EVERY function... "But it'll run so slowly" they decried... "REMOVE THE INLINE".

Of course it works, they have the system fitting into the micro-controller RAM, the stack is working a lot harder, their code is a lot smaller, and they are now in possession of a more balanced opinion on "inline".

* EDIT *

One person, yes hello Hank, asked me "why", why was this not a problem on the emulator, but was a problem on the bare metal? Well, the bare metal was using a different compiler than the pseudo compiler for the Windows based IDE. The Windows based IDE was actually running the code through a compiler which ignored "inline", and so produced code a little like this:

(Image Courtesy "CompilerExplorer")

You can see that even though "int square(int)" is marked "inline" it contains the push to the stack and the "pop ret" pairing, and making it a call from main results in two function calls to the same assembler.

The bare metal compiler did not, an undocumented difference I might add.
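
The screenshot itself is missing here, so as a rough stand-in this is the kind of source it describes: a function marked inline and called from two places. SumOfSquares is a hypothetical caller of my own, standing in for the main in the image.

```cpp
// Marked inline, but the compiler is free to ignore the request.
inline int square(int num)
{
    return num * num;
}

// Hypothetical caller with two call sites. A compiler which honours
// "inline" may paste the body in twice; one which ignores it emits
// two calls to a single copy of square.
int SumOfSquares(int a, int b)
{
    return square(a) + square(b);
}
```

Which of the two you get depends on the compiler and its optimisation settings, which is exactly the difference that bit the developer above.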

Friday, 2 December 2016

Code History : Old Computers to Teach New Developers

A fair while ago, I posted about my Virtual CPU code. That code (still not complete) just does addition (and by association subtraction); the point was to lead into code which emulated actual logic gates, and indeed I wrote (somewhere in these pages) half-adder followed by full-adder emulation logic in C++ code.

The purpose of this was actually related to my giving a talk to some school children, it was never intended for these pages; the kids could not wrap their heads around not being able to "just multiply".

They had studied computer memory, and wrote documents in word bigger than the actual memory foot-print of the machines they were talking about.

The teacher, a friend of mine, wanted to demonstrate this to them.... I therefore hoved into view with my 12Kbyte Commodore 16... And challenged the kids to write a program for it... They then had the massive struggle... One bright young chap found a C16 emulator online, and he wrote out a long program in Commodore Basic, which amounted to little more than a text-editor...

It was very impressive work from the young chap, and I've kept his number for when he graduates (in 2021), unfortunately it worked fine on the emulator, as you could assign more memory to the machine... After typing the program into the actual machine... It ran out of memory!  The edit buffer was only 214 bytes...

He had only tested with "10 print 'hello' 20 goto 10", but typing any lengthier program in and it essentially started to overwrite the previous lines of code.

You might call this an oversight, but it was semi-intentional as after all the project was about memory.

So having learned how expensive and precious memory was, and is, in this world of near unlimited storage the kids moved onto assembly, learning about how the lowest level of a machine worked.

This is where my work came in to help, because they could not wrap their heads around "not multiplying".  In some chips a multiplication call might be many thousands of gates ("to multiply was 200,000 gates, over half the chip" - 3Dfx Oral History Panel - See video below).


Hence I wrote the code of a CPU only to do addition; to multiply, one had to write assembly which did a loop of additions!  I left the kids to write the assembly for division, which is why you never see it here in my code.
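
The loop of additions is easier to see in C++ than in assembly. This is a minimal sketch, not the virtual CPU's own code: the function name is hypothetical, and it assumes 8 bit operands with a 16 bit result so the largest product still fits.

```cpp
#include <cstdint>

// Hypothetical helper: multiply two 8 bit values using nothing but
// addition and a loop counter, as an add-only CPU would have to.
uint16_t MultiplyByAddition(uint8_t p_Left, uint8_t p_Right)
{
    uint16_t l_result(0);

    // Add p_Left to the running total p_Right times.
    for (uint8_t l_count(0); l_count < p_Right; ++l_count)
    {
        l_result = static_cast<uint16_t>(l_result + p_Left);
    }

    return l_result;
}
```

Division works the same way in reverse, counting how many times you can subtract before you run out, which is the exercise the kids were left with.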

It worked, but I find so many commentators have missed this piece of computing history, have missed that machines used to NOT come with every function, you had to emulate some functions in software with the bare-set of commands present.

Some have confused this with the idea of RISC. This isn't what RISC is about, but I distinctly remember being taught about the RISC based machines at school (Acorn machines) and that RISC meant there were "less instructions".  Sophie Wilson herself tells us that this isn't the point of RISC...


Having just spoken to my friend about the possibility of presenting these ideas again to a class of kids, I'm wondering whether I need to clean up all my code here, to actually make sense of all these disparate and separate sources and write one paper on the topic; histories of this which are readable by kids seem to be sadly lacking, or they start from the point of view of a child of my time, born in the late 1970's, who remembers a time when you had limits in computing.

Kids today see no such limits, and find it hard to relate to them. My own niece and nephews, who are just turning 15, find it hard to fathom such limits; even when they can sit down with me in front of a 12K machine, or a 512K machine, they can't relate. These pieces of history, these things which previously one had to work around, are alien to them.

They don't need to work around them, and this leaves me meeting modern graduates who lack some of the lowest level debugging and problem solving skills.

Indeed, I see these great efforts to publish frameworks to make modern developers test and think about software, because new developers have never had to get things right the first time...

I did, I'm one of the seeming few, who had to get it right and first time.  This is not bragging, it's actually quite sad, as how do you prove your code is good?  Pointing to a track record counts for very little, especially when the person you are trying to persuade has no interest in you, just your skills.

My most recent anathema, Test Driven Development, seems to be the biggest carbuncle in this form of "modern development"... Write me some code... They might ask, and you can write the code, and show it's correct, or you can write the tests which test the range, the type, the call, the library... Then write the code?... One is quick, efficient, but requires faith in the developer... One is slower, aims to forge faith of the code out of the result... Both end up with the same result, but one requires a leap of faith and trust.

Unfortunately, bugs in code, over the history of development, have destroyed that faith in the developer.  There are a precious few developers who are trusted to act on their own initiative any longer.  I know I work in a part of my employer's company where I am trusted to act on my own initiative, tempered by the fact that I have delivered very many years of good working products.

But I'm seeing, and hearing, of so many other parts of companies around us which do not trust their developers, and I would argue, if these developers had had to struggle with some of the historical problems my own generation of developers had struggled with, then they would be trusted more, and be freer to act, rather than being confined and held-back by needing to check, recheck and double check their code.

Trust in one another, a peer review, and where necessary a sentence of text on the purpose of some function or other, should really define good development. The code itself can tell you its purpose, not the tests, and certainly not just running the code and observing it.

I therefore propose I get off my pontificating bum, clean up all my "virtual CPU" stuff, document some of these issues, and we as part of the development community try to get new developers to challenge themselves against Old Computers... Computerphile already demonstrate this with their Crash Bug examples with the Atari ST...


Thursday, 24 May 2012

Writing an Assembler for our Virtual CPU in C++

In my previous post we discussed creating a Virtual CPU in software, today we'll go a little further and create the assembler application, and expand the CPU a little to execute the binary output of that assembler application.


In my very popular post of yesterday, regarding writing one's own virtual CPU in C++ code, we speculated about a virtual instruction set, and talked about the assembly language such a virtual machine, and real machines, would use to be instructed by us mere mortals, and briefly touched on the topic of assemblers and compilers.

Today, we're going to go a step further with our discussion of the assembler software and the assembly language our virtual CPU uses, so we get a brief insight into how an assembler can be made to work.  This will take the form of a complete code example of an assembler to take assembly instructions and turn them into valid code for our virtual CPU, as well as some additions to the virtual CPU application so we can save our assembler output as binary and then run it through the virtual CPU... Making our little virtual CPU a proper programmable device!

Preprocessing
First things first we need to write our assembler program, and to do this we need to understand how an assembler works, how compilers work in general.  The first step of such software is to take the code given it by the user and to homogenise it for processing.  This means we might strip white space, we might check for the inclusion of other files and generate a large conglomerate of numerous code files all together as a single unified body, either on disk or in RAM.  For our assembler we're going to perform a pre-parsing step just like this to prepare the code. This will load the whole file as a set of lines (remember we can only have 256 instructions because of the 8 bit width of our CPU, so there won't be that much to hold in RAM), then convert each line into uppercase characters and trim any white space off of the front and back of each line.

So, code like this:

load0 3
   load1 17
add

Will end up like this:

LOAD0 3
LOAD1 17
ADD
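
As a minimal sketch of that step on a single line, separate from the class in the full listing further down: PreprocessLine is a hypothetical free function, and the set of characters treated as white space is my own assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: trim both ends of a line, then upper-case it.
// A line holding nothing but white space comes back empty.
std::string PreprocessLine(const std::string& p_Line)
{
    std::string l_result;

    const std::string l_whitespace(" \t\r\n");   // assumed set
    const std::string::size_type l_first = p_Line.find_first_not_of(l_whitespace);
    if ( l_first != std::string::npos )
    {
        const std::string::size_type l_last = p_Line.find_last_not_of(l_whitespace);
        l_result = p_Line.substr(l_first, l_last - l_first + 1);

        // Go through unsigned char so toupper is never handed a negative value.
        std::transform(l_result.begin(), l_result.end(), l_result.begin(),
            [](unsigned char p_Char) { return static_cast<char>(std::toupper(p_Char)); });
    }

    return l_result;
}
```

Fed the second line of the example above, it hands back the tidy "LOAD1 17".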

Lexical Analysis
The next step is to analyse each line for validity. We need to make sure each line has the right number of parameters and is an instruction we understand.  Most importantly we need to tell the user if it's not!

Our lexical analysis is simple, for each line validate what the programmer asks is valid for the processor to carry out.
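
One way to picture that check is as a table of how many tokens each instruction expects. This is only a sketch: Tokenise and ValidateLine are hypothetical names, the token counts are taken from the Process functions in the listing below, and the mnemonic spellings are assumed to match the names of the string constants.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Split a preprocessed line on white space.
std::vector<std::string> Tokenise(const std::string& p_Line)
{
    std::vector<std::string> l_tokens;
    std::istringstream l_stream(p_Line);
    std::string l_token;
    while ( l_stream >> l_token )
    {
        l_tokens.push_back(l_token);
    }
    return l_tokens;
}

// Returns an empty string for a valid line, else a message for the user.
std::string ValidateLine(const std::string& p_Line)
{
    // Mnemonic -> total tokens on the line (mnemonic plus parameters).
    // Spellings are assumed, see the note above.
    static const std::map<std::string, std::size_t> l_expected {
        {"LOAD0", 2}, {"LOAD1", 2}, {"ADD", 1}, {"SUBTRACT", 1},
        {"STORE0", 1}, {"STORE1", 1}, {"STORETO", 3} };

    std::string l_error;

    const std::vector<std::string> l_tokens(Tokenise(p_Line));
    if ( l_tokens.empty() )
    {
        l_error = "Empty line";
    }
    else
    {
        const auto l_found = l_expected.find(l_tokens[0]);
        if ( l_found == l_expected.end() )
        {
            l_error = "Unknown instruction: " + l_tokens[0];
        }
        else if ( l_found->second != l_tokens.size() )
        {
            l_error = "Wrong number of parameters for " + l_tokens[0];
        }
    }

    return l_error;
}
```

The full assembler spreads the same idea across one Process function per instruction, which also range-checks the parameters.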

Assembling
Finally we know the code is valid and formatted right, so we can now go ahead and convert the human readable assembly language into assembled instructions. We translate each instruction into bytes (you see now why we wrote such a simple CPU) and store them in a linear vector, just like the vector we pass through the CPU execute function.
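
A cut down sketch of that translation for just three instructions. The opcode numbers here are invented for illustration, since the real values live in cpu.hpp which is not reproduced on this page, and AssembleLines is a hypothetical name.

```cpp
#include <sstream>
#include <string>
#include <vector>

typedef unsigned char byte;

// Hypothetical opcode values, NOT the ones from cpu.hpp.
enum Opcode : byte { OP_LOAD0 = 1, OP_LOAD1 = 2, OP_ADD = 3 };

// Translate already validated, preprocessed lines into a linear
// vector of bytes: opcode first, then any parameter.
std::vector<byte> AssembleLines(const std::vector<std::string>& p_Lines)
{
    std::vector<byte> l_program;

    for (const std::string& l_line : p_Lines)
    {
        std::istringstream l_stream(l_line);
        std::string l_mnemonic;
        int l_operand(0);
        l_stream >> l_mnemonic;

        if ( l_mnemonic == "LOAD0" || l_mnemonic == "LOAD1" )
        {
            l_stream >> l_operand;
            l_program.push_back(l_mnemonic == "LOAD0" ? OP_LOAD0 : OP_LOAD1);
            l_program.push_back(static_cast<byte>(l_operand));
        }
        else if ( l_mnemonic == "ADD" )
        {
            l_program.push_back(OP_ADD);
        }
    }

    return l_program;
}
```

With these made up opcodes the three line example from the preprocessing section becomes five bytes: two for each load and one for the add.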

Full Source Code of our Assembler

#include <fstream>
#include <string>
#include <iostream>
#include <sstream>
#include <algorithm>
#include <cstdio>
#include <string.h>
#include <vector>
#include "cpu.hpp"

namespace Assembler
{
using namespace std;

using namespace Emulator;

typedef unsigned char byte;

class Assembler
{
private:
static const string LOAD0;
static const string LOAD1;
static const string STORE0;
static const string STORE1;
static const string ADD;
static const string SUBTRACT;
static const string STORETO;

static const string whitespaces;

void trimRight (string& str, const string& trimChars = whitespaces)
{
   string::size_type pos = str.find_last_not_of( trimChars);
   str.erase( pos+1 );
}

void trimLeft (string& str, const string& trimChars = whitespaces)
{
   string::size_type pos = str.find_first_not_of( trimChars );
   str.erase(0, pos);
}

void trim (string& str, const string& trimChars = whitespaces)
{
   // Pass the caller's character set through rather than ignoring it.
   trimRight(str, trimChars);
   trimLeft(str, trimChars);
}

vector<byte> m_Inst;
vector<string> m_Code;

bool m_Error;

string m_LastError;

const vector<string> Split (const string& p_String)
{
    vector<string> result;

    string temp;
    istringstream stm (p_String, istringstream::in);
    // Test the extraction itself: looping on good() pushes a stale
    // token whenever the line ends in white space.
    while ( stm >> temp )
    {
        result.push_back(temp);
    }

    return result;
}

void ToUpper (string& p_String)
{
transform (p_String.begin(), p_String.end(), p_String.begin(), ::toupper);
}

void LoadCode (const string& p_Filename)
{
    m_Code.clear();

    ifstream file(p_Filename.c_str(), ios_base::in);

    // getline into a string sizes the string for us, so there are no
    // padding NUL characters to defeat trimRight and no 256 character
    // limit on the length of a line.
    string line;
    while ( getline(file, line) )
    {
        trim(line);
        if ( !line.empty() )
        {
            m_Code.push_back(line);
        }
    }
    file.close();
}

void PreprocessCode ()
{
vector<string>::iterator itr;
for (itr = m_Code.begin(); itr != m_Code.end(); itr++)
{
ToUpper((*itr));
}
}

void ListCode ()
{
   cout << "----------" << endl;
vector<string>::const_iterator itr;
for (itr = m_Code.begin(); itr != m_Code.end(); itr++)
{
cout << (*itr) << endl;
}
cout << "----------" << endl;
}

void ListInstructions ()
{
   cout << "Instructions Generated:" << endl;
   vector<byte>::const_iterator itr;
   for (itr = m_Inst.begin(); itr != m_Inst.end(); itr++)
   {
       cout << (int)((*itr)) << endl;
   }
}

void Assemble ()
{
   cout << "Assembling " << m_Code.size() << " lines of code" << endl;

int lineNumber = 0;
vector<string>::const_iterator itr;
for (itr = m_Code.begin(); itr != m_Code.end(); itr++)
{
lineNumber++;

                    cout << lineNumber << ". [" << (*itr) << "]" << endl;

vector<string> split = Split((*itr));

Translate (split, lineNumber);

if ( m_Error )
{
   cout << "Error, aborting Assemble" << endl;
   break;
}
}
}

void TranslateError (const vector<string>& p_Line, const int &p_LineNumber)
{
cout << "Assemble [Translate] Error at [" << p_LineNumber << "]" << endl;
cout << "Message: " << m_LastError << endl;
m_Error = true;
}

const bool ProcessLoad0 (const vector<string>& p_Line)
{
bool l_result = true;
if ( p_Line.size() == 2 )
{
try
{
if ( p_Line[0] == LOAD0 )
{
   string tmp = p_Line[1];
int value = stoi(tmp);
if ( value >= 0 && value <= CPU::MAX )
{
m_Inst.push_back(CPU::LOAD0);
m_Inst.push_back((byte)value);
l_result = false;
}
else
{
m_LastError = string ("LOAD0 Parameter out of range, must be (0 - 255)");
}
}
else
{
m_LastError = string("Assembler Error; Trying to translate line as LOAD0, when its not LOAD0?");
}
}
catch (exception& l_e)
{
m_LastError = string("Error Processing Load0 Instruction");
}
}
return l_result;
}

const bool ProcessLoad1 (const vector<string>& p_Line)
{
bool l_result = true;
if ( p_Line.size() == 2 )
{
try
{
if ( p_Line[0] == LOAD1 )
{
   string tmp = p_Line[1];
int value = stoi(tmp);
if ( value >= 0 && value <= CPU::MAX )
{
m_Inst.push_back(CPU::LOAD1);
m_Inst.push_back((byte)value);
l_result = false;
}
else
{
m_LastError = string ("LOAD1 Parameter out of range, must be (0 - 255)");
}
}
else
{
m_LastError = string("Assembler Error; Trying to translate line as LOAD1, when its not LOAD1?");
}
}
catch (exception& l_e)
{
m_LastError = string("Error Processing Load1 Instruction");
}
}
return l_result;
}

const bool ProcessAdd (const vector<string>& p_Line)
{
    bool l_result = true;
    if ( p_Line.size() == 1 )
    {
        if ( p_Line[0] == ADD )
        {
            m_Inst.push_back(CPU::ADD);
            l_result = false;
        }
        else
        {
            m_LastError = string ("Error processing Add instruction");
        }
    }
    else
    {
        m_LastError = string("Process Add had nothing to do");
    }
    return l_result;
}

const bool ProcessSubtract (const vector<string>& p_Line)
{
    bool l_result = true;
    if ( p_Line.size() == 1 )
    {
        if ( p_Line[0] == SUBTRACT )
        {
            m_Inst.push_back(CPU::SUBTRACT);
            l_result = false;
        }
    }
    else
    {
        m_LastError = string("Process Subtract had nothing to do");
    }
    return l_result;
}

const bool ProcessStore0 (const vector<string>& p_Line)
{
    bool l_result = true;

    if ( p_Line.size() == 1 )
    {
        if ( p_Line[0] == STORE0 )
        {
            m_Inst.push_back(CPU::STORE0);
            m_Inst.push_back(CPU::MAX);
            l_result = false;
        }
    }
    else
    {
        m_LastError = string("Process Store0 had nothing to do");
    }
    return l_result;
}

const bool ProcessStore1 (const vector<string>& p_Line)
{
    bool l_result = true;
    if ( p_Line.size() == 1 )
    {
        if ( p_Line[0] == STORE1 )
        {
            m_Inst.push_back(CPU::STORE1);
            m_Inst.push_back(CPU::MAX);
            l_result = false;
        }
    }
    return l_result;
}

const bool ProcessStoreTo (const vector<string>& p_Line)
{
    bool l_result = true;
    if ( p_Line.size() == 3 )
    {
        try
        {
            string tmp1 = p_Line[1];
            string tmp2 = p_Line[2];
            int reg = stoi (tmp1);
            int var = stoi (tmp2);
            if ( reg == 0 || reg == 1 )
            {
                // Both bounds must hold; with || every integer would pass the test
                if ( var >= 0 && var <= 3 )
                {
                    m_Inst.push_back(CPU::STORETO);
                    m_Inst.push_back((byte)reg);
                    m_Inst.push_back((byte)var);
                    l_result = false;
                }
                else
                {
                    m_LastError = string("Error in StoreTo instruction, invalid target Variable.  Only 0, 1, 2 or 3 allowed!");
                }
            }
            else
            {
                m_LastError = string("Error in StoreTo instruction, invalid register.  Only 0 or 1 allowed!");
            }
        }
        catch (exception& l_ex)
        {
            m_LastError = string ("Error converting StoreTo Parameters");
        }
    }
    else
    {
        m_LastError = string("Error in StoreTo Parameters");
    }
    return l_result;
}

void Translate (const vector<string>& p_Line, const int& p_LineNumber)
{
bool error = true;
size_t sze = p_Line.size();
if ( sze > 0 )
{
if ( p_Line[0] == "" )
{
   error = false;
}
else if ( p_Line[0] == LOAD0 )
{
error = ProcessLoad0 (p_Line);
}
else if ( p_Line[0] == LOAD1 )
{
error = ProcessLoad1 (p_Line);
}
else if ( p_Line[0] == ADD )
{
error = ProcessAdd (p_Line);
}
else if ( p_Line[0] == SUBTRACT )
{
error = ProcessSubtract (p_Line);
}
else if ( p_Line[0] == STORE0 )
{
error = ProcessStore0 (p_Line);
}
else if ( p_Line[0] == STORE1 )
{
error = ProcessStore1 (p_Line);
}
else if ( p_Line[0] == STORETO )
{
error = ProcessStoreTo (p_Line);
}
else
{
   cout << "Unknown Instruction [" << p_Line[0] << "]" << endl;
m_LastError = string("Unknown Instruction");
}
}
else
{
m_LastError = string ("Nothing to do");
}

if ( error )
{
TranslateError (p_Line, p_LineNumber);
}
}

void OutputInstructions ()
{
   ofstream file ("asmout.bin", ios_base::out | ios_base::binary);
   vector<byte>::const_iterator i;
   for (i = m_Inst.begin(); i != m_Inst.end(); i++)
   {
       file.write(reinterpret_cast<const char*>(&(*i)), sizeof(byte));
   }
   file.close();
}

public:

Assembler (const string& p_Filename)
:
m_Error (false),
m_LastError (string(""))
{
LoadCode (p_Filename);
PreprocessCode ();
Assemble();
ListInstructions();
OutputInstructions();
}

~Assembler ()
{
}
};

const string Assembler::LOAD0 = string("LOAD0");
const string Assembler::LOAD1 = string("LOAD1");
const string Assembler::STORE0 = string("STORE0");
const string Assembler::STORE1 = string("STORE1");
const string Assembler::ADD = string("ADD");
const string Assembler::SUBTRACT = string("SUBTRACT");
const string Assembler::STORETO = string("STORETO");
// The explicit length keeps the leading '\0'; without it the string would be empty
const string Assembler::whitespaces = string("\0 \f\n\r\t\v", 7);
}


Advanced Assembling
We're not going to go further than this with our example of an assembler, but you need to know that this is a very rudimentary example, assembling a single file into a single stream of instructions... In a future post we're going to look at defining more advanced structures for our virtual CPU, and therefore expand our assembler.

Alterations to our Virtual CPU
Below is a set of changes to the code of the Virtual CPU; go ahead and take a look.  We add a new piece to the main function of "RunCPU" so we can pass in a file to execute; this file will be the binary output of our assembler.  We also invent the idea of the CPU having an amount of memory available.  I'm modelling this memory in the CPU as a set of four bytes, which can be addressed as the variables VAR0, VAR1, VAR2 and VAR3 in our assembler.

In other CPUs these bytes might be allocated in the main system memory, or RAM, and the addresses passed to the CPU so it knows where to store the values!  Other CPUs might do this by loading the target memory address into one register and the value to store into another, and then being told to store the value off to that location.  In our virtual CPU what we have altered is the instruction set.  The old STORE0 and STORE1 instructions remain; we can't change them in case we've got a program already using them.  Instead we created a new instruction, STORETO, which takes two parameters: the first is the register whose value we're going to store (a byte of 0 or 1 for register0 and register1 respectively) and the second is the location we're going to store it to (0, 1, 2 or 3 to represent VAR0, VAR1, VAR2 and VAR3).

Below is the full code for the CPU, but I will highlight the changes in bold.  However, the most important items are within the new function "DoStoreTo".  Because we want to store the value of a register somewhere, the CPU needs a place to remember what we're doing while it works.  This is the burden an assembly language user carries: they must coax very complex results out of quite simple instructions.  In this example we need to get the next byte off the program listing so we know which register is going to be saved, but without destroying the value in either register... I can't just invent a new location for this value to go into, so I need to find somewhere on the CPU it can go.  We can't use the VARs, as there might already be values in them, and I can't use the instruction register or the program counter... I do have the temporary register, however, so I'm going to load the first parameter into the temporary register and step the program counter.  But we also need the next parameter, so we know the target location... Ahaaa, temporary is bigger than a byte... if it can hold two we can manipulate it to do the work for us, storing two bytes at a time!

You'll therefore see in the code that the temporary value is loaded with the next two bytes, and then we mask off the temporary with some binary logic masks (CPUs are very good at using masks) to let the CPU decide what to do for us... (Yes, I know every electrical engineer reading this has just closed their web browser, as this is all rather abstract and all too much cheating, but since we're just demonstrating the principle of assembler, we're doing it this way, despite the fact that this is not actually how a CPU would do things; we'll get to that far more complex stuff much later).


#ifndef CPU_HEADER
#define CPU_HEADER

#include <iostream>
#include <vector>

namespace Emulator
{
using namespace std;

typedef unsigned char byte;

class CPU
{
public:

static const byte MAX;

enum InstructionSet
{
LOAD0 = 0,
LOAD1,
ADD,
SUBTRACT,
STORE0,
STORE1,
STORETO
};

private:

byte m_Register0;
byte m_Register1;

byte m_VAR0, m_VAR1, m_VAR2, m_VAR3;

bool m_Status;
bool m_Overflow;
bool m_Underflow;

byte m_ProgramCounter;
byte m_InstructionRegister;

int m_Temp;

CPU (const CPU&) {}

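void ResetCPU ()
{
m_Temp = 0;
m_Register0 = 0;
m_Register1 = 0;
// Clear the variable bytes too, so no run starts with stale or uninitialised values
m_VAR0 = 0;
m_VAR1 = 0;
m_VAR2 = 0;
m_VAR3 = 0;
m_Status = true;
m_Overflow = false;
m_Underflow = false;
m_ProgramCounter = 0;
m_InstructionRegister = 0;
}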

void Fault ()
{
m_Status = false;
cout << "Instruction Fault at: " << static_cast<int>(m_ProgramCounter) << endl;
DumpRegisters();
}

void DumpRegisters ()
{
cout << "CPU Registers:" << endl
<< "Register0 [" << static_cast<int>(m_Register0) << "]" << endl
<< "Register1 [" << static_cast<int>(m_Register1) << "]" << endl
<< "Status [" << m_Status << "]" << endl
<< "Overflow [" << m_Overflow << "]" << endl
<< "Underflow [" << m_Underflow << "]" << endl
<< "Program Counter [" << static_cast<int>(m_ProgramCounter) << "]" << endl
<< "Instruction Register [" << static_cast<int>(m_InstructionRegister) << "]" << endl
<< "Temp [" << m_Temp << "]" << endl;
}

void DoLoad0 (const vector<byte>& p_Program)
{
m_Register0 = p_Program[m_ProgramCounter];
m_ProgramCounter++;
}

void DoLoad1 (const vector<byte>& p_Program)
{
m_Register1 = p_Program[m_ProgramCounter];
m_ProgramCounter++;
}

void DoAdd ()
{
m_Temp = m_Register0 + m_Register1;
if ( m_Temp > MAX )
{
m_Overflow = true;
m_Temp = MAX;
}
m_Register0 = m_Temp;
}

void DoSub ()
{
m_Temp = m_Register0 - m_Register1;
if ( m_Temp < 0 )
{
m_Underflow = true;
m_Temp = 0;
}
m_Register0 = m_Temp;
}

void DoStore0 (vector<byte>& p_Program)
{
p_Program[m_ProgramCounter] = m_Register0;
m_ProgramCounter++;
}

void DoStore1 (vector<byte>& p_Program)
{
p_Program[m_ProgramCounter] = m_Register1;
m_ProgramCounter++;
}

void DoStoreTo (const vector<byte>& p_Program)
{
// STORETO is followed by two parameter bytes: the register, then the variable
if ( static_cast<size_t>(m_ProgramCounter) + 1 >= p_Program.size() )
{
Fault();
return;
}

// Pack both parameters into the temporary: register in the high byte, variable in the low byte
m_Temp = (p_Program[m_ProgramCounter] << 8) | p_Program[m_ProgramCounter + 1];
m_ProgramCounter += 2;

// Mask the register selector back out of the high byte and fetch the value to store
byte l_value = 0;
switch ( (m_Temp >> 8) & 0xFF )
{
case 0:
l_value = m_Register0;
break;
case 1:
l_value = m_Register1;
break;
default:
Fault();
return;
}

// Mask the target variable out of the low byte and store the value there
switch ( m_Temp & 0xFF )
{
case 0:
m_VAR0 = l_value;
break;
case 1:
m_VAR1 = l_value;
break;
case 2:
m_VAR2 = l_value;
break;
case 3:
m_VAR3 = l_value;
break;
default:
Fault();
break;
}
}

public:

CPU ()
{
ResetCPU();
}

~CPU ()
{
}

void Execute (vector<byte>& p_Program, const bool& p_HaltOnOverflow = true, const bool& p_HaltOnUnderflow = true)
{
ResetCPU();

if ( p_Program.size() == 0 )
{
cout << "No Instructions!" << endl;
return;
}

if ( p_Program.size() > MAX )
{
cout << "Error: Unable to process program, more than 255 instructions" << endl
<< "this is more than the Program Counter can handle" << endl;
}
else
{
while ( m_ProgramCounter < p_Program.size() )
{
m_InstructionRegister = p_Program[m_ProgramCounter];
m_ProgramCounter++;

switch (m_InstructionRegister)
{
case LOAD0:
DoLoad0 (p_Program);
break;
case LOAD1:
DoLoad1 (p_Program);
break;
case ADD:
DoAdd ();
break;
case SUBTRACT:
DoSub();
break;
case STORE0:
DoStore0(p_Program);
break;
case STORE1:
DoStore1(p_Program);
break;
case STORETO:
DoStoreTo (p_Program);
break;
default:
Fault();
return;
}

if ( m_Overflow && p_HaltOnOverflow )
{
cout << "Overflow - Halt" << endl;
return;
}

if ( m_Underflow && p_HaltOnUnderflow )
{
cout << "Underflow - Halt" << endl;
return;
}
}
}
}

};

const byte CPU::MAX = 255;

}

#endif

To compile and use these classes I then have two files, "RunCPU.cpp" and "RunAsm.cpp", the code for which is:

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include "cpu.hpp"

using namespace std;

using namespace Emulator;

int main (const int argc, const char* argv[])
{
CPU* cpu = new CPU ();
cout << "CPU Ready..." << endl;

vector<byte> inst;

if ( argc == 2 )
{
cout << "Loading Instructions...";
ifstream ifs(argv[1], ifstream::in|ifstream::binary);

int count = 0;
while ( !ifs.eof() && ifs.good() )
{
byte temp = (byte)ifs.get();
if ( !ifs.eof() )
{
count++;
inst.push_back(temp);
}
}

ifs.close();

cout << "Complete [" << count << "]" << endl;
}

cout << "Starting Execution..." << endl;
cpu->Execute(inst);

cout << "Execution Complete" << endl;

vector<byte>::const_iterator itr = inst.begin();
for ( ; itr != inst.end(); itr++)
{
cout << "[" << static_cast<int>((*itr)) << "]" << endl;
}
cout << "Complete" << endl;

delete cpu;

return 0;
}

And:

#include <iostream>
#include <string>
#include "Assembler.hpp"

using namespace std;

int main (const int argc, const char* argv[])
{
if ( argc == 2 )
{
Assembler::Assembler* asmb = new Assembler::Assembler (string(argv[1]));
delete asmb;
}
else
{
cout << "Usage: " << endl
<< "\tRunAsm <Filename>\t\tWhere <Filename> is the assembly text file." << endl;
}

return 0;
}

These are "RunCPU.cpp" and "RunAsm.cpp" respectively, and with them in place we can compile the code with:

g++ -std=c++0x -I ~/ ~/RunCPU.cpp -o runcpu.o
g++ -std=c++0x -I ~/ ~/RunAsm.cpp -o runasm.o

I can then assemble and run my code, which can be whatever I want from our lexicon of commands, but for this demo it's:

load0 1
load1 2
add
load1 1
subtract

Which I have saved as "Code.asm" in the home directory, so to now assemble and run this assembly on our virtual CPU I perform:

~/runasm.o ~/Code.asm
~/runcpu.o ~/asmout.bin

Where "asmout.bin" is the binary file output by the assembler we've written...



Your CPU should execute your Assembly code now, and you should be able to re-run the CPU with each output... Try more assembly code sets... and remember to intentionally crash the CPU, make things go negative...