Categories
Computer Science CPU Project programming

Writing the Compiler for the UniCPU Project

Our compiler is open-source, you can get the code here on GitHub.

Introduction – project background and groundwork

It’s been a little while since I’ve blogged about the UniCPU team project – which could be perceived as meaning that we’ve dropped the project.  Don’t worry, we haven’t!

Since my last post, we’ve (we = Andrei) continued to make progress on the hardware. Most of the recent work is just wiring up the various units to the multiplexer and wiring up the ROM, all of which is very simple and time consuming. Certainly not worthy of a video.

Whilst Andrei was making progress on the hardware, I got started on writing the compiler (don’t worry, if you don’t know what an compiler is, it’s explained below) for the instruction set we’re supporting. This came at a really timely moment, since I’d just finished studying a module on compilers at University (SYAC, at York), so I had the chance to put the theory into practise – but in a simplified form (since, in my view at-least, there’s no point wasting 10’s of hours writing a super flexible compiler when a simple one will do more than we need).

Before we get into a discussion on the actual compiler, there’s an important question: what is a compiler? Put simply, a compiler is a program that transforms code written in one language to code in another language. It’s typically the thing that transforms code in a high-level language (for example, Java or C) to machine code (a language that can be understood by the CPU in your computer).

STAGE 0 – Background: our language

The UniCPU language is one the simplest languages – it’s simply an instruction set for the CPU with only one feature – code labels (for easy jumping between sections). Code in the UniCPU language takes the form of an instruction/opcode, and then if an operand (parameter) is necessary for the instruction, a space followed by an operand:

[INSTRUCTION] [OPERAND (IF NECESSARY)]

For example, if I wanted to write a program that added the number 5 to a value input on the input board (think of it as the keyboard), I’d write:

INPUT // Load value from the input panel into the accumulator
ADD 5 // Add 5 to the value in the accumulator.

( // Represents comments.  Anything following “//” is ignored by the compiler until the line end).

Truncated Instruction Set Dictionary
Truncated Instruction Set Dictionary

If you remember from our earlier posts, everything within our CPU is represented as single bytes (i.e. an instruction is 8 bits, all operands/values are 8 bits). This means that our instruction memory is always expecting to receive two bytes per instruction – one for opcode and one for the operand – which will be nulled to 00000000 if the opcode doesn’t require a parameter (for example, INPUT). This means we need to have a binary representation for each opcode – which we have. A truncated representation of this is displayed on the right, but you can view the full logic behind this in a spreadsheet located here.

This means that the desired representation of the above program is:

00000000 00000000
00000100 00000101

So, first we put the opcode for INPUT in, then the null operand (since it doesn’t need a parameter). Then we put the opcode in for ADD, followed by the binary representation of the parameter (5 = 0b101).  Giving us the above block of binary.

Another feature of most programming language is the ability to jump from one section of code to another without having to use line numbers. This is because using line numbers is incredibly difficult to maintain – say your application jumps back to line 17 regularly, and then you add an extra instruction before line 17, you have to change every reference to line 17 to line 18 – a massive waste of programmer time). In the UniCPU language, this is implemented using labels. You insert a label in your code, and then you can jump to wherever that label is defined. In the below program, the path of execution is lines 1,2,3,4, followed by getting stuck in a loop through lines 2,3,4 for ever, constantly adding 8 to the accumulator.

For example:

load 0
LABEL MYLABEL
add 8
JUMP MYLABEL
add 8

The important question is: how do you write an application capable of doing this – turning one code in one language into code written in another language, and support things like jumps? First you need to select a language to write your compiler language in – independent of the language you are compiling from/to. Because a large part of compilation is string manipulation (i.e. looking at input text, splitting it into tokens, looking strings up in dictionaries etc) we need a language that makes string manipulation simple. Our compiler is quite simple (it’s simple as compilers go anyway) so I selected Python. It’s not the most efficient language, nor is it the language with the most explicit syntax – but it does make splitting Strings easy.

Whilst I’m explaining these steps, it might be helpful to view the code.  It’s available here on GitHub. The following description is more an over-arching theory of what happens in the compiler: do bear in mind that I haven’t tried to explain some parts that I’ve deemed ‘out of scope’ (for example converting base-10 to binary, some error handling). See the code for that 🙂

Stage 1 – Preparing and CLEANING input

Of course, the first step is to do your initial setup for your compiler.  In the UniCPU compiler, this involves initialising the compilerDictionary with the binary for each instruction, some helper lists for categories of Opcodes and the dictionary which will hold line numbers of code labels. We then call setupAndStart(), where we load in the input code and split it into a list of Strings, where each String is a line of code.

The next preparation step is to remove blank lines, or lines that only contain comments. This is done with a call to removeBlank(lines), as below:

def removeComment(instruction):
	''' str -> str '''
	return instruction.split("//")[0]

def removeBlank(lines):
	numberRemoved = 0
	for x in range(0, len(lines)):
		if (removeComment(lines[x-numberRemoved]) == ''):
			lines.pop(x-numberRemoved)
			numberRemoved += 1

Here, we check if a line is empty after removing any comments (which is checked by checking if, after splitting the string by the comment symbol “//”, there is any text in the first block returned from the split call). If there is, there is code on that line and it’s not blank. Otherwise, it is blank or only had comments on it so we delete the line from lines. It’s important to remove blank lines before working on labels/jumping/looping as this is dependent on line numbers within the code, and by removing lines we change the line number of some lines.

STAGE 2 – compiling individual lines

During compilation, we keep a list of 2-item lists – each 2-item list represents a line of code with two operands. This is because, in this compiler, there are two main stages to compilation, both of which need access to individual tokens (a token is just a representation of 1 byte, whether an instruction opcode or an operand/value) – these are compiling (converting the human-friendly opcodes to binary and converting the base-10 numbers to binary) and linking (replacing labels in JUMP commands (used for looping and iterating) with the final line numbers). At the end of the process, that dictionary is all concatenated into one long binary string – the program.

The compilation stage is really quite simple – in Stage 0 above, I talked about setting up a dictionary mapping instructions onto the binary we defined for them in the hardware. In the compileLine(inputLine, lineNumber) function, we first check the line for whether it contains one token or two. Ignoring LABELs for a moment, we then look at each token separately.

The first token is passed to the compileInstruction(instruction, lineNumber) function where we simply look up the token from the compilerDictionary defined at the top of the programme to obtain the binary representation of that opcode.

def compileInstruction(instruction, lineNumber):
	return compilerDictionary[instruction]

The second token (if there is one) will be the operand for the opcode that we just compiled. We want to convert the operand (a number) to binary, so we pass it to the function compileValue(value, lineNumber) which returns a String of binary digits representing the base-10 number that was input. In the event that there was no operand (there was only one token, a ‘standalone opcode’, such as ‘INPUT’, ‘SHIFTL’) we set the second token to just be 00000000 (a null byte because the value will not be used).

The compileLine(inputLine, lineNumber) function then returns both the compiled Opcode and Operand in a 2-item list, as described above, to be added to the list of compiledLines:

return [compileInstruction(tokens[0], lineNumber), compileValue(int(tokens[1]), lineNumber)]

Stage 3 – compiling jumps and labels

Compiling JUMPs and LABELs was the most complex part of this compiler. On a high level, it is done in two stages:

  1. Locating and listing all of the LABELs into a dictionary called labels
  2. In a separate pass, finding all of the compiled JUMP instructions and then compiling the operands (the LABELs) using the labels dictionary.

Locating and listing all of the LABELs is done during the initial compileLine step in Stage 2. When a LABEL Opcode is found, a record is added to the ‘labels’ dictionary, using the LABEL name as the key. The value for that key is the line number that will be executed when a jump to that label is called, minus the length of the ‘labels’ dictionary. This subtract is made because lines containing a LABEL will be removed (by not being added to the list of compiled lines ‘compiled’), because they don’t actually cause the CPU to execute anything (they’re just a short-cut for the programmer that the compiler than expands).

After Stage 2 is complete and all lines have been compiled, we then execute point 2 in the above: finding the JUMP instructions and compiling the operands using the labels dictionary we just built up. In the code, this is described as ‘Linking’ (this is in the setupAndStart() function, after the compilation stage):

for lineNumber in range(0, len(compiled)):
	if (compiled[lineNumber][0] in jumpOpcodesCompiled):
		try:
			compiled[lineNumber][1] = compileValue(labels[compiled[lineNumber][1]], None)
		except KeyError:
			print "ERROR: LABEL: ", compiled[lineNumber][1], " is never defined."
			sys.exit(1)

In the above code example, we are checking each line of code in the list of ‘compiled’ lines to see if it contains a JUMP instruction. If it does, we identify the second item in the 2-item list (the operand) and look it up in the ‘labels’ dictionary (the dictionary containing all the LABELs and the corresponding line number they should point to). We then convert convert the base-10 integer to binary and return the compiled line.

Stage 4 – Outputting finished product

Finally – the ‘compiled’ list now contains the whole programme, compiled to binary. We just have to convert each 8-character string of binary characters to a byte, concatenate them together and then write them out as a binary file and it’s ready to go onto the CPU!

The easiest way to do this in Python 2.7 is using the chr(int) function. The chr(int) function takes an integer as a parameter and returns a character according to the ASCII scheme. Since the ASCII system stores characters in 8 bits (the same as our CPU), outputting the characters into a binary ASCII file is all we have to do – that file can then be written to the memory of our CPU, and voila! The CPU will execute our code.

def outputToBinary(compiledCode):
	outputData = ''
	while (len(compiledCode) > 0):
		temp = int(compiledCode[0:8], 2)
		outputData += chr(temp)
		compiledCode = compiledCode[8:]
	f = open(outputFileName, 'wb') #b means binary mode on Windows
	f.write(outputData)
	return True

In the above code, we first initialise an empty string which will contain our output data (the compiled code). While there are still bytes to be compiled (that the length of compiledCode is still greater than 0), we isolate the next byte, convert it to an Integer, and then convert it to a binary ASCII character using the chr function. We then append that to the outputData, and remove the byte (8-bits) we just processed from compiledCode (so that the next byte can be processed). When all the bytes have been compiled, we open a file in write mode and write the data to that file.

Our compiler also has the ability to output ‘human-readable’ code, i.e. code that is separated by line and includes a space in-between the opcodes and operands. This is controlled by toggling a boolean, and makes debugging much easier.

Example

Let’s take the following sample program:

input // load the accumulator from the input panel

load 0 // overwrite the value in the accumulator to 0
LABEL AddEight // label this piece of code 'AddEight'
add 8

JUMP AddEight

// this block never runs
sub 7
subcy -7

Next we perform Stage 1: after comments, blank lines and labels are removed (with labels, after their line numbers have been recorded), the code looks like:

input
load 0
LABEL AddEight
add 8
JUMP AddEight
sub 7
subcy -7

After completing Stage 2: (compiling all operands and opcodes except for JUMP LABELs), the code looks like:

00000001 00000000
00000000 00000000
00100100 00001000
00010000 AddEight
00100101 00000111
00100111 11111001

Now we just fill in the LABELs in the compiled code (Stage 3)

Finally we just have to output the finished Binary. In debug mode, it’s easy to see which byte corresponds to which token in the original code:

00000001 00000000
00000000 00000000
00100100 00001000
00010000 00000010
00100101 00000111
00100111 11111001

less so when it’s all concatenated together:

000000010000000000000000000000000010010000001000000100000000001000100101000001110010011111111001

When compiled to binary ASCII it’s then not human readable or recognisable: you can try opening it in an editor (as I did with the above), which gives:

Compiled Program, opened in Vim
Compiled Program, opened in Vim

That’s it! A long post, but hopefully it was an interesting read!

Categories
Computer Science Google Glass programming York University CS Student Blogs

The UK’s First Official Google Glass Hackathon – Ok Glass, Where is the Space Station?

Last weekend (19th-20th July) was the UK’s first official Google Glass Hackathon, and it was a fantastic event – just a few short weeks after Glass became officially available in the UK.  Places were very limited for those without Glass (to ensure there were enough spare pairs floating around), but since I have Glass I was able to go without going through a very competitive lottery.

So, what was talked about, what happened, and what did we build?  Let’s get started.

The Hackathon was organised by Hoi Lam, the Wearables Developer Advocate for Google UK, and a few Googlers came across from the US, including Timothy Jordan, who is a Developer Advocate for Google, focussing on Glass.

Timothy Jordan introduces everyone to developing for Glass #throughglass
Timothy Jordan introduces everyone to developing for Glass #throughglass

After breakfast, the first event was an introduction to the event by Hoi, with some overview of what Glass was (at a non-trivial level) for the Android developers in attendance who hadn’t experienced it before.

Then we moved onto a Design Sprint – just classic stuff really, here’s a use case, design an app that would help this user, that’s Glass specific – not a cell phone app.  Although it was simple, it was a useful exercise in moving some developers away from this idea of a smart phone app with lots of buttons and tons of information.

Finally, it was the main part of the hackathon – the hacking!  People split into groups of between 2 and 8 to produce whatever app they wanted to, with the occasional guidance of a Glass engineer.  I teamed up with two Android developers who were also working in the city, Alessio Fabiani and Antonio Matarrese, and we started brainstorming.

Alessio, Antonio and I - the three creators of 'Ok Glass, Where is the Space Station'
Alessio, Antonio and I – the three creators of ‘Ok Glass, Where is the Space Station’

One idea that came out was a Google Authenticator app for Glass, so that, for users with two-step authentication enabled for their Google Accounts, instead of having to get their phone out when they needed a code, they could just say “Ok Glass, Give me a 2-step code” and Glass would show/read to you the 6-digit code necessary to login to your account.  However, we’ll either leave this until next time or for someone else to make, as we decided to make an application that is especially well suited to the Glass platform – an app to help you find the International Space Station.

There were lots of opportunities for some fun as well, like with this unfortunately labelled bin.
There were lots of opportunities for some fun as well, like with this unfortunately labelled bin.

All a user would have to do is wake up Glass, and say “Ok Glass, Where is the Space Station” – Glass then tells you how long it will be until the Space Station is next visible to you (i.e. it will be visible in the horizon of your current position), using location data from both NASA and the gps trackers in your phone, tethered to Glass.  We then showed four arrows, at the top, left, bottom and right sides of the screen, pointing in the direction of the Space Station, with annotations indicating at what angle in each positive direction you would have to move your head to be looking straight at the Space Station (calculated using the positions of both you and the ISS, and the accelerometer/tilt sensor and compass built into Glass).

This is an example of a really great app for the Glass platform, since because the device is worn on your head it can measure the angle you are currently looking at and direct you as to exactly how to adjust your head position, as opposed to a phone app which can tell you the positions relative to magnetic north and ‘flat’, and then you have to align your head using some other method (e.g. a compass and knowledge of what ‘flat’ is).  Check out the video below of the app working.

Although the code is hackathon level (poor, badly documented, little-if-any code style), and we got a bit confused with the maths for calculating elevation angles, it all seems to work well.  Take a look at the code on GitHub here, try putting it on your Glass and let me know your feedback.  We’re hoping to get it on the MyGlass app store eventually.

Although we didn’t win, we did very well – we had about 18 ‘votes’ (stickers) on our card, compared to the winners who had twenty-something.  We were commended on creating a piece of undeniably useful piece of Glassware, which was a great use-case of the Glass platform.

All in all, a great event – a big Thank You to the Glass Team who organised this – bring on the next Glass Hackathon!

The Glass Hackathon Organising Team
The Glass Hackathon Organising Team

Categories
Computer Science programming University

Building and using OpenCV with Eclipse on Mac OS X

This took me a little while to figure out due to the sometimes subtle differences in Mac OS X and Linux, and the imperfect documentation.  Eventually I got it working though. The goal was to build OpenCV for Mac OS X 10.9 and then get it working in Eclipse. So, here goes:

  1. Ensure you’re using a JDK with version number 1.7 or higher.  If not, see here for instructions.
  2. Download the latest version of OpenCV (I used 2.4.8, so if you have problems, try going back this) from here, for Mac.
  3. Extract it.
  4. cd into the extracted directory of OpenCV in your terminal and execute the following:
mkdir build
cd build
cmake -G "Unix Makefiles" -D CMAKE_CXX_COMPILER=/usr/bin/g++ -D CMAKE_C_COMPILER=/usr/bin/gcc -D WITH_CUDA=ON ..
make -j2

(Note in the above, -j2 refers to the number of CPU cores I had – e.g. set to -j4 if you have 4 logical cores).

Then just look in build/bin, and you’ll see a .jar at the bottom – congratulations, you just built OpenCV! Import this as a library into Eclipse, and set the native path also to build/bin, and you’re done.

If you get stuck, let me know in the comments and I’ll do my best to help.

Categories
Computer Science programming

Installing JRE {7,8} / JDK {1.7,1.8} On Mac for Eclipse

As part of my Software Engineering Project (SEPR) for my course, I’m working on a game with my teammates in Java.  Unfortunately, even with the latest release of OS X (10.9.2), Apple still only provide you with the JDK V1.6, which is quite old and lacks some commonly used features from 1.7 and 1.8.  Although installing it isn’t exactly complicated, it’s not obvious how to link it into your system, and I couldn’t find a manual for doing this with Eclipse.

Now that I’ve found out how to do it, it makes sense to share it to potentially save you some time:

  • Download the latest version (JRE 8, JDK 1.8 at the time of writing) from here.
  • Install it – it should install automatically to a path similar to:
/Library/Java/JavaVirtualMachines/jdk1.8.0_<more numbers>.jdk/Contents/Home
Note, the above scrolls, it ends with '/Home'
  • Open Eclipse, and in the top menu go to Preferences->Java->Installed JRE’s-> and click the add button.
  • Then select “Standard VM” and click next.  Give it the install path from above for the “JRE Home” parameter and click Finish.  Now you can select the new upgraded JRE/JDK for your projects in Eclipse.

I hope that helps.