In March I received an email from my old University (The University of York) out of the blue asking if I would be willing to accept a nomination for the Duke of York Young Entrepreneur Award. I was really surprised to be chosen, especially as I was now a graduate!
Receiving my award from the Duke of York
The ceremony itself was in Huddersfield and was really exciting: His Royal Highness took an incredible interest in the businesses of all the winners and it was great to hear what others were up to.
Since I was in the Yorkshire area, I took the opportunity to visit my old University and meet with some of the Entrepreneurs currently working with the Enterprise Department which was great fun.
Following on from my work in Sudan last year, yesterday I was given the opportunity to hold a workshop for the three winners of the Mashrouy programme; Houida, Abrar and Maha.
The three were over as part of the 2-week long trip that winners of the Mashrouy programme receive, which as well as a tour of the UK, includes workshops on business practice.
Mashrouy Winners: Houida, Abrar and Maha, after my workshop on tracking business progress and utilising mentors
Once again I was struck by the innovativeness of the businesses and how different they were to those in England. The businesses are, from left to right: production of an organic tick-repellant to reduce illness in cattle from local natural resources; software for analysing mammograms with lower cost and higher accuracy; and a substance for tanning leather extracted naturally from tree bark, reducing use of imported chrome and the environmental impact of chrome.
Speaking about any aspect of business when the markets are so different is tough. I was asked to speak about tracking business progress using KPIs and OKRs (a topic we went into great detail on during Techstars) and then on how to utilise mentors and QA automation services. All candidates in Mashrouy work with mentors, but getting the most out of a mentor is a process that requires some planning. We discussed how to write short, punctual emails with clear and concise points and ‘asks’, and the best way to ask for an introduction to somebody that your mentor knows (with the help of a post by Russell Buckley).
A couple of weeks back I was really lucky to have the opportunity to speak at UCL on the topic of “How to Start a Startup” to an audience of a few hundred. The night before my lecture, I met up with a close friend who challenges me regularly, who wanted to challenge me on a simple point that we talk about often:
Never be embarrassed
Apparently I was getting too ‘comfortable’. I had a pink bandana (a long and different story) and she challenged me to wear it for my lecture and to incorporate it to make a serious point. I did (see below – the picture was shared quite widely after my talk…).
Me lecturing in a pink bandana
But, besides embarrassing myself, what was the point? In startups, you end up doing a whole bunch of things you didn’t expect to when you started out. Last year whilst acting as CTO of pingWHEN, I was surprised by the number of roles I had to fill. Sure, developing the product (a backend application, database architecture, REST API and various mobile app frontends) was a big part, but it probably wasn’t more than 50% of my working hours. The remainder included conversations with our investors and mentors, customer development and administration stuff. But that wasn’t it: as part of the construction of our company, I had to:
Do two Karate take-downs on video
Make terrible visual effects in a product video
Act in a product video
Take photos out in York at 1am in the freezing cold trying to get rid of light from a street bollard with a coat.
Pitch to senior executives at Disney
Attempt to sing (terribly).
The point is, if you hold yourself back, restricting yourself to things you feel comfortable with, you might not fulfil your potential or achieve what you ned to achieve. If you want your startup to succeed, you’ll need to do a few things that make you uncomfortable.
Our compiler is open-source, you can get the code here on GitHub.
Introduction – project background and groundwork
It’s been a little while since I’ve blogged about the UniCPU team project – which could be perceived as meaning that we’ve dropped the project. Don’t worry, we haven’t!
Since my last post, we’ve (we = Andrei) continued to make progress on the hardware. Most of the recent work is just wiring up the various units to the multiplexer and wiring up the ROM, all of which is very simple and time consuming. Certainly not worthy of a video.
Whilst Andrei was making progress on the hardware, I got started on writing the compiler (don’t worry, if you don’t know what an compiler is, it’s explained below) for the instruction set we’re supporting. This came at a really timely moment, since I’d just finished studying a module on compilers at University (SYAC, at York), so I had the chance to put the theory into practise – but in a simplified form (since, in my view at-least, there’s no point wasting 10’s of hours writing a super flexible compiler when a simple one will do more than we need).
Before we get into a discussion on the actual compiler, there’s an important question: what is a compiler? Put simply, a compiler is a program that transforms code written in one language to code in another language. It’s typically the thing that transforms code in a high-level language (for example, Java or C) to machine code (a language that can be understood by the CPU in your computer).
STAGE 0 – Background: our language
The UniCPU language is one the simplest languages – it’s simply an instruction set for the CPU with only one feature – code labels (for easy jumping between sections). Code in the UniCPU language takes the form of an instruction/opcode, and then if an operand (parameter) is necessary for the instruction, a space followed by an operand:
[INSTRUCTION] [OPERAND (IF NECESSARY)]
For example, if I wanted to write a program that added the number 5 to a value input on the input board (think of it as the keyboard), I’d write:
INPUT // Load value from the input panel into the accumulator
ADD 5 // Add 5 to the value in the accumulator.
( // Represents comments. Anything following “//” is ignored by the compiler until the line end).
Truncated Instruction Set Dictionary
If you remember from our earlier posts, everything within our CPU is represented as single bytes (i.e. an instruction is 8 bits, all operands/values are 8 bits). This means that our instruction memory is always expecting to receive two bytes per instruction – one for opcode and one for the operand – which will be nulled to 00000000 if the opcode doesn’t require a parameter (for example, INPUT). This means we need to have a binary representation for each opcode – which we have. A truncated representation of this is displayed on the right, but you can view the full logic behind this in a spreadsheet located here.
This means that the desired representation of the above program is:
00000000 00000000
00000100 00000101
So, first we put the opcode for INPUT in, then the null operand (since it doesn’t need a parameter). Then we put the opcode in for ADD, followed by the binary representation of the parameter (5 = 0b101). Giving us the above block of binary.
Another feature of most programming language is the ability to jump from one section of code to another without having to use line numbers. This is because using line numbers is incredibly difficult to maintain – say your application jumps back to line 17 regularly, and then you add an extra instruction before line 17, you have to change every reference to line 17 to line 18 – a massive waste of programmer time). In the UniCPU language, this is implemented using labels. You insert a label in your code, and then you can jump to wherever that label is defined. In the below program, the path of execution is lines 1,2,3,4, followed by getting stuck in a loop through lines 2,3,4 for ever, constantly adding 8 to the accumulator.
For example:
load 0
LABEL MYLABEL
add 8
JUMP MYLABEL
add 8
The important question is: how do you write an application capable of doing this – turning one code in one language into code written in another language, and support things like jumps? First you need to select a language to write your compiler language in – independent of the language you are compiling from/to. Because a large part of compilation is string manipulation (i.e. looking at input text, splitting it into tokens, looking strings up in dictionaries etc) we need a language that makes string manipulation simple. Our compiler is quite simple (it’s simple as compilers go anyway) so I selected Python. It’s not the most efficient language, nor is it the language with the most explicit syntax – but it does make splitting Strings easy.
Whilst I’m explaining these steps, it might be helpful to view the code. It’s available here on GitHub. The following description is more an over-arching theory of what happens in the compiler: do bear in mind that I haven’t tried to explain some parts that I’ve deemed ‘out of scope’ (for example converting base-10 to binary, some error handling). See the code for that 🙂
Stage 1 – Preparing and CLEANING input
Of course, the first step is to do your initial setup for your compiler. In the UniCPU compiler, this involves initialising the compilerDictionary with the binary for each instruction, some helper lists for categories of Opcodes and the dictionary which will hold line numbers of code labels. We then call setupAndStart(), where we load in the input code and split it into a list of Strings, where each String is a line of code.
The next preparation step is to remove blank lines, or lines that only contain comments. This is done with a call to removeBlank(lines), as below:
def removeComment(instruction):
''' str -> str '''
return instruction.split("//")[0]
def removeBlank(lines):
numberRemoved = 0
for x in range(0, len(lines)):
if (removeComment(lines[x-numberRemoved]) == ''):
lines.pop(x-numberRemoved)
numberRemoved += 1
Here, we check if a line is empty after removing any comments (which is checked by checking if, after splitting the string by the comment symbol “//”, there is any text in the first block returned from the split call). If there is, there is code on that line and it’s not blank. Otherwise, it is blank or only had comments on it so we delete the line from lines. It’s important to remove blank lines before working on labels/jumping/looping as this is dependent on line numbers within the code, and by removing lines we change the line number of some lines.
STAGE 2 – compiling individual lines
During compilation, we keep a list of 2-item lists – each 2-item list represents a line of code with two operands. This is because, in this compiler, there are two main stages to compilation, both of which need access to individual tokens (a token is just a representation of 1 byte, whether an instruction opcode or an operand/value) – these are compiling (converting the human-friendly opcodes to binary and converting the base-10 numbers to binary) and linking (replacing labels in JUMP commands (used for looping and iterating) with the final line numbers). At the end of the process, that dictionary is all concatenated into one long binary string – the program.
The compilation stage is really quite simple – in Stage 0 above, I talked about setting up a dictionary mapping instructions onto the binary we defined for them in the hardware. In the compileLine(inputLine, lineNumber) function, we first check the line for whether it contains one token or two. Ignoring LABELs for a moment, we then look at each token separately.
The first token is passed to the compileInstruction(instruction, lineNumber) function where we simply look up the token from the compilerDictionary defined at the top of the programme to obtain the binary representation of that opcode.
The second token (if there is one) will be the operand for the opcode that we just compiled. We want to convert the operand (a number) to binary, so we pass it to the function compileValue(value, lineNumber) which returns a String of binary digits representing the base-10 number that was input. In the event that there was no operand (there was only one token, a ‘standalone opcode’, such as ‘INPUT’, ‘SHIFTL’) we set the second token to just be 00000000 (a null byte because the value will not be used).
The compileLine(inputLine, lineNumber) function then returns both the compiled Opcode and Operand in a 2-item list, as described above, to be added to the list of compiledLines:
Compiling JUMPs and LABELs was the most complex part of this compiler. On a high level, it is done in two stages:
Locating and listing all of the LABELs into a dictionary called labels
In a separate pass, finding all of the compiled JUMP instructions and then compiling the operands (the LABELs) using the labels dictionary.
Locating and listing all of the LABELs is done during the initial compileLine step in Stage 2. When a LABEL Opcode is found, a record is added to the ‘labels’ dictionary, using the LABEL name as the key. The value for that key is the line number that will be executed when a jump to that label is called, minus the length of the ‘labels’ dictionary. This subtract is made because lines containing a LABEL will be removed (by not being added to the list of compiled lines ‘compiled’), because they don’t actually cause the CPU to execute anything (they’re just a short-cut for the programmer that the compiler than expands).
After Stage 2 is complete and all lines have been compiled, we then execute point 2 in the above: finding the JUMP instructions and compiling the operands using the labels dictionary we just built up. In the code, this is described as ‘Linking’ (this is in the setupAndStart() function, after the compilation stage):
for lineNumber in range(0, len(compiled)):
if (compiled[lineNumber][0] in jumpOpcodesCompiled):
try:
compiled[lineNumber][1] = compileValue(labels[compiled[lineNumber][1]], None)
except KeyError:
print "ERROR: LABEL: ", compiled[lineNumber][1], " is never defined."
sys.exit(1)
In the above code example, we are checking each line of code in the list of ‘compiled’ lines to see if it contains a JUMP instruction. If it does, we identify the second item in the 2-item list (the operand) and look it up in the ‘labels’ dictionary (the dictionary containing all the LABELs and the corresponding line number they should point to). We then convert convert the base-10 integer to binary and return the compiled line.
Stage 4 – Outputting finished product
Finally – the ‘compiled’ list now contains the whole programme, compiled to binary. We just have to convert each 8-character string of binary characters to a byte, concatenate them together and then write them out as a binary file and it’s ready to go onto the CPU!
The easiest way to do this in Python 2.7 is using the chr(int) function. The chr(int) function takes an integer as a parameter and returns a character according to the ASCII scheme. Since the ASCII system stores characters in 8 bits (the same as our CPU), outputting the characters into a binary ASCII file is all we have to do – that file can then be written to the memory of our CPU, and voila! The CPU will execute our code.
def outputToBinary(compiledCode):
outputData = ''
while (len(compiledCode) > 0):
temp = int(compiledCode[0:8], 2)
outputData += chr(temp)
compiledCode = compiledCode[8:]
f = open(outputFileName, 'wb') #b means binary mode on Windows
f.write(outputData)
return True
In the above code, we first initialise an empty string which will contain our output data (the compiled code). While there are still bytes to be compiled (that the length of compiledCode is still greater than 0), we isolate the next byte, convert it to an Integer, and then convert it to a binary ASCII character using the chr function. We then append that to the outputData, and remove the byte (8-bits) we just processed from compiledCode (so that the next byte can be processed). When all the bytes have been compiled, we open a file in write mode and write the data to that file.
Our compiler also has the ability to output ‘human-readable’ code, i.e. code that is separated by line and includes a space in-between the opcodes and operands. This is controlled by toggling a boolean, and makes debugging much easier.
Example
Let’s take the following sample program:
input // load the accumulator from the input panel
load 0 // overwrite the value in the accumulator to 0
LABEL AddEight // label this piece of code 'AddEight'
add 8
JUMP AddEight
// this block never runs
sub 7
subcy -7
Next we perform Stage 1: after comments, blank lines and labels are removed (with labels, after their line numbers have been recorded), the code looks like:
When compiled to binary ASCII it’s then not human readable or recognisable: you can try opening it in an editor (as I did with the above), which gives:
Compiled Program, opened in Vim
That’s it! A long post, but hopefully it was an interesting read!