“ARM Assembly Adventures” by Dennis Chen and Michael Bocamazo
What did you do?
We investigated optimizing the accuracy of fixed-point math functions on the ARM CPU family.
Computer representations of numbers are never entirely accurate; they just get “close enough,” since we only have a finite amount of space, or bits, to represent numbers in. Similarly, mathematical functions that we often call in programs without questioning, like cos(x) or x/y, really use algorithms to return an answer that is close enough to the value we're looking for. In our project, we implemented some of these mathematical functions ourselves to understand them better. Ideally we would then have optimized the accuracy of these functions by tweaking the constants involved in the algorithms, but for this project, we reached for the proverbial moon, found our arms weren’t long enough, shot ourselves in the face, and fell down a very tall staircase headfirst. One could say we overscoped.
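To make the “close enough” idea concrete, here is a minimal sketch of the IQ30 fixed-point idea in Python (the helper names to_iq30/from_iq30 are ours for illustration, not from the project code): a real number is stored as a 32-bit integer with 30 fractional bits, so anything finer than 2^-30 is lost.

```python
Q = 30  # number of fractional bits in the IQ30 format

def to_iq30(x):
    """Convert a float to its IQ30 integer representation."""
    return int(round(x * (1 << Q)))

def from_iq30(n):
    """Convert an IQ30 integer back to a float."""
    return n / (1 << Q)

half = to_iq30(0.5)
print(half)             # 536870912, i.e. 2**29
print(from_iq30(half))  # 0.5
```

Since only 30 fractional bits exist, values that round-trip exactly (like 0.5) are the exception; most values come back merely close enough.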
More specifically, for this project we chose a specific numerical representation (fixed-point math) and a specific architecture (ARM) to work with. This limited us to certain algorithms, and to writing our code using an instruction set that is specific to ARM CPUs. In the absence of tools to run and evaluate example ARM assembly programs, we implemented several basic ARM instructions in Python so we could accurately translate between ARM and Python and run Python code that emulated an ARM CPU. ARM assembly instructions like add, subtract, multiply, udiv, lshift, etc. were implemented in Python. Afterwards, we used these instructions as building blocks to write an IQ30 fixed-point division function and evaluated its accuracy against Python's own native floating-point division.
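A hedged sketch of what mirroring ARM instructions in Python can look like (hypothetical helper functions, not the repo's exact ones): each operation masks its result to 32 bits so values wrap around exactly as they would in a CPU register.

```python
MASK = 0xFFFFFFFF  # 32-bit register width

def add(a, b):
    """ADD: 32-bit addition with wraparound."""
    return (a + b) & MASK

def sub(a, b):
    """SUB: 32-bit subtraction with wraparound."""
    return (a - b) & MASK

def lsl(a, n):
    """LSL: logical shift left, discarding bits shifted past bit 31."""
    return (a << n) & MASK

def udiv(a, b):
    """UDIV: unsigned integer division, truncating toward zero."""
    return (a & MASK) // (b & MASK)

print(add(0xFFFFFFFF, 1))  # 0 -- wraps around like a 32-bit register
```

The masking is the important part: Python integers are arbitrary-precision, so without it the emulated arithmetic would silently diverge from what the real CPU computes.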
After successfully doing the above, we attempted to implement an IQ29 arctan function in our Frankenstein Python/ARM hybrid, but everything came to a screeching halt when we realized that the actual ARM assembly code we were using as a reference relied on lookup tables of stored values that we could not find. Without these values we couldn't know whether our arctan function was even implemented correctly, so the original ambitious goals of machine learning were scrapped, probably much to Eric's amusement.
Alternatively, we could have bootstrapped the optimization from randomized initial guesses in the lookup table and used some of MATLAB’s non-gradient-based optimizers. However, we were unsure of the correctness of our implementation (given that it was impossible to run without the table), and in the final hours we focused on making the work we had done as meaningful as possible as an introduction to ARM.
What we hoped to replicate: http://upload.wikimedia.org/wikipedia/commons/f/f7/Atan2.png
Why did you do it?
The impetus for doing numerical optimization and function approximation was Eric’s statement that genetic algorithms work well enough as non-gradient-based optimizers for many function approximations that rely on lookup table values or parameterizations of truncated Taylor series. We wanted to see if we could demonstrate the feasibility of other optimization approaches in the context of function approximation. For this reason, we wanted to recreate Eric’s code so we could reproduce the execution and comparison faithfully. Because the original source was lost to the sands of time, we used a disassembler to recreate the source from the compiled version (which meant we lost any comments). The source was in the unfamiliar ARM (little-endian, v7) architecture, which meant we had to familiarize ourselves with an instruction set with additional features (persistent condition flags! conditional execution! variable operand length!). This was an educational task in itself, and in doing so, we produced something of (questionable) educational value for students looking to understand more about different assembly systems, or hacky ways to do integer math with fidelity in Python.
How did you do it?
Our GitHub repo: https://github.com/dennis-chen/arm-assembly-simulator
First, we obtained Eric’s original ARM Stellaris IQ Math Standalone package from Texas Instruments: http://www.ti.com/tool/sw-iqmath&DCMP=STELLARIS&
Then, we installed a disassembler to recreate the source from this version: the demo version of the Hex-Rays IDA disassembler. https://www.hex-rays.com/products/ida/support/download_demo.shtml (To enable auto comments, go to Options → General → Auto comments. Disregard the strange icon.) However, this does not seem to include the table of constants used for function approximation, or in our foolishness we did not see it, so this approach is currently incomplete.
The first function that we wanted to implement as an introduction was the signed fractional division function. Integer division (UDIV) is already implemented as an instruction in ARM, but it cannot be used for fractional division. Instead, an algorithm that shifts the dividend to significance and iteratively evaluates the significance of the remainder is used to achieve a “screaming nasty fast” implementation of fractional division. We translated the ARM into pseudocode, using the ARM documentation (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0041c/Babbfdih.html#), the ARM wiki (http://www.heyrick.co.uk/armwiki/Main_Page), and some notes pages that were marginally helpful for the main ideas of division, but not identical to this implementation (http://opencourseware.kfupm.edu.sa/colleges/ccse/coe/coe308/files%5C2-Lecture_Notes_05-MultiplyDivide.pdf, http://www.ann.ece.ufl.edu/courses/eel4713_12spr/slides/Lec8-division.pdf). From this pseudocode, we wrote a Python implementation using fixed-point arithmetic. We then tested its accuracy against Python’s floating-point division; the relative error abs((actual-expected)/expected) stayed below 1e-8, confirming the correctness of our implementation.
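The shift-and-subtract idea behind fractional division can be sketched in Python as classic restoring division (a hedged illustration, not the TI ARM source or our exact translation; iq30_div is a name we made up). One quotient bit is produced per iteration by testing whether the divisor still “fits” at the current significance.

```python
Q = 30  # fractional bits in the IQ30 format

def iq30_div(num, den):
    """Restoring division: IQ30 quotient of two unsigned IQ30 values.
    Assumes 0 <= num/den < 2 so the quotient fits in the IQ30 range."""
    rem, quot = num, 0
    for _ in range(Q + 1):       # one iteration per quotient bit
        quot <<= 1
        if rem >= den:           # does the divisor fit at this significance?
            rem -= den
            quot |= 1
        rem <<= 1                # move the remainder up to the next bit
    return quot

one_third = iq30_div(1 << Q, 3 << Q)  # 1.0 / 3.0 in IQ30
print(one_third / (1 << Q))           # close to 0.3333...
```

Note the precondition: because IQ30 can only hold values in roughly [-2, 2), the quotient must already fit that range, which is why the real implementation first shifts the dividend to significance.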
Then, we followed the same procedure for the atan2 function. atan2 differs from one-argument atan in that it can return a value between -pi and pi instead of just -pi/2 and pi/2, because it evaluates both arguments rather than just their ratio. atan2 would be a suitable function for a first attempt at optimization. We translated the ARM code into our set of Python functions as before, but we soon realized that we were missing a piece: the lookup tables originally used. We then decided to transition to making our code do something like emulation of a program, i.e., translation of ARM assembly into Python-executable functions that can form a program. This would become essentially a very rough ARM analogue of what MARS is for MIPS. Such a program probably already exists; however, it was useful to create our own.
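The difference between atan and atan2 is easy to see with Python's own math module: dividing the arguments first throws away the quadrant information that atan2 keeps.

```python
import math

# (-1, -1) lies in the third quadrant, but the ratio -1/-1 == 1
# is indistinguishable from the first-quadrant point (1, 1).
print(math.atan(-1 / -1))    # pi/4, about 0.785: quadrant lost
print(math.atan2(-1, -1))    # -3*pi/4, about -2.356: quadrant kept
```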
Because the theme of this project is meant to be communication, and an introduction to ARM has some educational value, we chose to write a short introduction to the ARM assembly instruction set for those familiar with MIPS, which could become a useful reference for the course, or a time saver for similar projects in the future.
Our program translator is contained in ‘simulator.py’, and you can simply run python simulator.py my_arm_file to see the outputs of the program. We had to stop working on it given the time crunch, so all it currently does is parse most of the assembly into a series of Python function calls; not all of those functions are implemented yet. Jumps are additionally not yet working, but we hope to extend this into a program simulator (with a basic and very limited instruction set) that actually works in time for expo, since that seems feasible.
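The parsing step can be illustrated with a short sketch (ours for illustration; simulator.py's actual parser differs): each line of assembly is split into an opcode name and a list of operand strings, which can then be dispatched to the corresponding Python function.

```python
def parse_line(line):
    """Split 'ADD r0, r1, r2' into ('add', ['r0', 'r1', 'r2'])."""
    op, _, rest = line.strip().partition(' ')
    args = [a.strip() for a in rest.split(',')] if rest else []
    return op.lower(), args

print(parse_line("ADD r0, r1, r2"))  # ('add', ['r0', 'r1', 'r2'])
print(parse_line("NOP"))             # ('nop', [])
```

With a dispatch table mapping opcode names to Python functions, a parsed program becomes a sequence of callable instructions, which is the rough shape our translator takes.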
How can someone else build on it?
To build on this code, someone could extend the functionality of our translator/emulator/simulator, or make contributions to the introduction to ARM. Because our work became the foundation we needed for what we wanted to do with optimization, tackling the original optimization problem would be a good way to build on what we have done so far.