Emulator, Assembler, and Compressor plant

This was my C group project for year 1 of Computing at Imperial College London. We wrote an assembler and emulator from scratch in C for a subset of the armv8 instruction set so that we could execute assembly on a raspberry pi. Each group then chooses any extension to this project, and we chose to build a compressor/decompressor for these assembly files using a custom templating algorithm.

Out of 58 groups, we were a top selected group and won a prize for best presentation.

I'm going to start by going through our extension, followed by a little bit about our emulator and assembler.

Compressing Armv8 Assembly Files

For our extension, we wrote a compressor/decompressor between armv8 assembly files and our own .acs (arm compressed source) files. After trying mutliple compression methods, we landed on a templating approach, creating reusable templates in place of instructions. We did the initial programming as well as all the planning and algorithm design together as a group.

Specifically, I was responsible for:

To determine the templates for a file, we populate an opcode group table hashmap that maps to arrays of instructions that will all belong to a single template category. Categories are not simply the opcode but rather instructions we saw common operands between when doing experiments. For instance, load unconditional (ldrun) and movz instruction with a shift are grouped together. We found being more specific than just opcode led to better results.

testing compressor Image 2

Once the categories have been determined, we perform a frequency analysis on each parameter in the instruction category, determining the most common value and "baking" this into our template. We determined a threshold for the number of times a potential template will be used in a file to determine whether we create it or not in order to balance the added file size from template metadata.

assembler Image 1
assembler Image 2

Templates are then used to replace lines in the main file, with opcode being replaced with a smaller template id and operand values taking an asterisk to indicate the template's default value or other values to override the template operand. We also compress labels to a smaller string while still being lossless. In the acs file below, T7 indicates seven templatesa nd L5 indicates five label name replacements.

assembler Image 1
assembler Image 2

Results and Improvements

We saw some great compression on many files in our test suite, with a peak compression of 48%. We found that even files with fewer repeating operands had great compression from the label and template group id substitutions alone. However to improve compression further, we would look at templating whole blocks of repeated structures. We would also compress whole assembly projects by determining templates on a subset of project files and using these for all files (like "training data").

assembler Image 1

Testing and Coding Practice

A key strength of our extension was a rigorous test suite I wrote which helped catch niche bugs and greatly sped up development time. The test suite featured a wide variety of assembly files (39 in total) which would be compressed and decompressed with their outputs checked for equality. We implemented multiple different error types for failure and on success, output the compression percentage. Throughout our project, we also had structured debug prints which could all be turned on/off from a single #define which we found useful when debugging a single test case.

testing compressor Image 2

testing compressor Image 2

We maintained great coding practice throughout the project, with readable and commented code, as well as clear interfaces between tasks to facilitate parallel development. Each function in our codebase is doxygen commented too, which gave us great documentation to share with our supervisor at the end of the project. We also had group code review sessions for merging and checkins.

testing compressor Image 2

The Emulator

The first project part was building an emulator. Our overall interrface pipeline is shown below along with the part of our internal representation we translated binary instruction into. Specifically, I was responsible for:

emulator Image 1
emulator Image 2

The Assembler

We then made our assembler, which translated armv8 assembly files into binary files. Specifically, I was responsible for:

assembler Image 1
assembler Image 2