User Tools

Site Tools


2014:arm_assembly_adventures

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Last revision Both sides next revision
2014:arm_assembly_adventures [2014/12/17 12:09]
dcelik
2014:arm_assembly_adventures [2014/12/17 14:32]
dcelik
Line 2: Line 2:
 Dennis Chen and Michael Bocamazo Dennis Chen and Michael Bocamazo
  
-  +**What did you do? 
-**What did you do?** +** 
- +We investigated optimization of the accuracy of fixed point math functions on the ARM CPU family.
-We "investigated" ​optimization of the accuracy of fixed point math functions on the ARM CPU family.+
  
 Computer representations of numbers are never entirely accurate, they just get “close enough”, since we only have a finite amount of space, or bits, to represent numbers in. Similarly, mathematical functions that we often call in programs without questioning,​ like cos(x) or x/y, really use algorithms to return an answer that is close enough to the value we're looking for. In our project, we implemented some of these mathematical functions ourselves to understand them better. Ideally we would have then optimized the accuracy of these mathematical functions by tweaking constants that are involved in the algorithms, but for this project, we reached for the proverbial moon, our arms weren’t long enough, shot ourselves in the face, and fell down a very tall staircase headfirst. One could say we overscoped. Computer representations of numbers are never entirely accurate, they just get “close enough”, since we only have a finite amount of space, or bits, to represent numbers in. Similarly, mathematical functions that we often call in programs without questioning,​ like cos(x) or x/y, really use algorithms to return an answer that is close enough to the value we're looking for. In our project, we implemented some of these mathematical functions ourselves to understand them better. Ideally we would have then optimized the accuracy of these mathematical functions by tweaking constants that are involved in the algorithms, but for this project, we reached for the proverbial moon, our arms weren’t long enough, shot ourselves in the face, and fell down a very tall staircase headfirst. One could say we overscoped.
Line 23: Line 22:
 The impetus for doing numerical optimization and function approximation was Eric’s statement that genetic algorithms worked well enough as a non-gradient based optimizer for many function approximations that relied on look up table values, or parameterizations of truncated Taylor series. ​ We wanted to see if we could demonstrate the feasibility of using other optimization approaches in the context of function approximation. ​ For this reason, we wanted to recreate Eric’s code to get fidelity on the execution and comparison. ​ Because the original source was lost to the sands of time, we used a disassembler to recreate the source from the compiled version (this meant we lost any comments available). ​ The source was in the unfamiliar ARM (Little Endian, v7) architecture,​ which meant we had to familiarize ourselves with an instruction set with additional features (Persistent condition flags! Conditional Execution! Variable operand length!). ​ This was an educational task in itself, and in doing so, we produced something of (questionable) educational value for students looking to understand more about different assembly systems, or hacky ways to do integer math with fidelity in python.  ​ The impetus for doing numerical optimization and function approximation was Eric’s statement that genetic algorithms worked well enough as a non-gradient based optimizer for many function approximations that relied on look up table values, or parameterizations of truncated Taylor series. ​ We wanted to see if we could demonstrate the feasibility of using other optimization approaches in the context of function approximation. ​ For this reason, we wanted to recreate Eric’s code to get fidelity on the execution and comparison. ​ Because the original source was lost to the sands of time, we used a disassembler to recreate the source from the compiled version (this meant we lost any comments available). ​ The source was in the unfamiliar ARM (Little Endian, v7) architecture,​ which meant we had to familiarize ourselves with an instruction set with additional features (Persistent condition flags! Conditional Execution! Variable operand length!). ​ This was an educational task in itself, and in doing so, we produced something of (questionable) educational value for students looking to understand more about different assembly systems, or hacky ways to do integer math with fidelity in python.  ​
  
-**How did you do it?**+How did you do it?
 Our github repo: https://​github.com/​dennis-chen/​arm-assembly-simulator Our github repo: https://​github.com/​dennis-chen/​arm-assembly-simulator
  
Line 41: Line 40:
  
 To build on this code, someone could extend the functionality of our translator/​emulator/​simulator,​ or make contributions to the introduction to ARM.  Because our work became the foundation we needed for what we wanted to do with optimization,​ tackling the original optimization problem would be a good way to build on what we have done so far. To build on this code, someone could extend the functionality of our translator/​emulator/​simulator,​ or make contributions to the introduction to ARM.  Because our work became the foundation we needed for what we wanted to do with optimization,​ tackling the original optimization problem would be a good way to build on what we have done so far.
 +
 +**A Short Introduction to ARM **
 +From the perspective of those familiar with MIPS
 +
 +Conditional Execution
 +The biggest difference that is immediately apparent is the conditional execution flags and blocks. ​ The ARM architecture retains the values of four flags that allow for a variety of comparisons to be made, and have much more compression than other assembly languages that require many conditional branching instructions. ​ The “IT” or If-Then blocks are much more intuitive to programmers of high level languages, because not every condition requires a branch, which seems like it would accumulate to progressively messier systems of GOTO statements. ​ Instead, an the conditional execution instruction starts the block as “IT”, “ITT” or “ITTT”, depending on the number of following instructions in the block. ​
 +
 +There are four base conditional flags: Z = Zero, C = Carryout, V = Overflow, and N = Negative. ​ Many instructions are able to set these base flags directly, such as “TEQ” (test if equal) or “CMP” (compare), but in many instructions,​ an “S” flag may be appended to set the flags if necessary. ​ From the 4 base flags, there are 14 flags that are constructed that the IT instruction can call upon.  The include all comparisons:​ >, <, >=, <=, with and without signed representation. ​ Some of them, such as CC (Carry Clear: C == 0) and MI (Minus, negative, N == 1) encode the same arithmetic meaning but are used in different situations based on the instruction that set them.  HI, unsigned higher, means greater than or unordered, C == 1 and Z == 1.  LT: Signed Less than (Less than, or unordered) checks if N != V.  So the flags are somewhat hidden from the programmer, but the conditional branching is just as flexible as the many possible branching conditions seen in MIPS.  The setting instruction could be “CMP R0, R1” then the leading instruction of the block “ITTT HI”, with “MOVHI R2, R0”, “MOVHI R0, R1”, “MOVHI R1, R2” following. ​ This translates to: If R0 > R1, swap the values of R0 and R1, which is done through the intermediate of R2.  It is possible that the ITTT instruction increments the program counter by the appropriate number of instructions past the conditional block if the condition is not met, in which case it would save a few instructions on each miss.  Conditional branches, such as BEQ, do not require a preceding IT instruction,​ and simply examine the condition flag.  Often, the IT blocks and conditional flags can be nicely simplified away in a Python translation by creating an If block. ​ Sometimes, however, the flag to be evaluated is dependent on a value that changes before the conditional execution, in which case it is necessary to reorder the instructions,​ or store the value temporarily as a placeholder. ​
 +
 +Word Length
 +You may notice funny .W or .N after each instruction. ​ W means “Wide” and N means “Narrow”. ​ These hard code the word lengths at 32 and 16 bits respectively.
 +
 +Exchange of Instruction Set
 +“An ARM for a THUMB makes the whole world’s computers execute faster(?​)”. ​ You may notice BX, BXEQ, or similar. ​ Branch-Exchange means the the instruction sets used switch on this instruction between ARM and THUMB. ​ From cursory research, THUMB is a more compressed instruction set that can execute faster if done correctly (probably involving a lot of hand-optimization of algorithms) but if done incorrectly can be slower because sometimes multiple instructions are required to replicate what can be done in ARM.  We don’t have enough knowledge to comment on the differences beyond that.
 +
 +Shifting and Rotation
 +Instructions to shift (logical or arithmetic) can be placed in the arguments of a instruction as in “ADD.W R3, R3, R2, LSL#2” which means R3 = R3 + R2<<​2. ​ In addition logical or arithmetic shifting left and right, there is also Rotation, which allows variable bits to be shifted in.  RRX - Rotate Right with Extend - shifts in the value of the carry flag to the MSB and shifts right.
 +
 +Interesting Arithmetic
 +There are a few interesting arithmetic operators not in MIPS.  Some of these are RSB: Reverse subtract, which amounts to negating a variable and adding it to an immediate. “RSB.W R0, R0, #​0x20000000” is R0 = #​0x20000000-R0. ​ There is a very useful operator for division, MLS, the multiply-subtract. ​ Of the form: MLS Rd, Rn, Rm, Ra; operates: ​ Rd = Ra - Rn*Rm. ​ When doing fractional division, this allows you to find the expected remainder given the current estimate of the quotient, as Remainder = Dividend - Quotient*Divisor. ​ This is the key instruction for understanding the division algorithm (that we’ve translated, at least. ​ There are probably others).
 +
 +Conclusion:
 +Using the structure of conditional execution, one can readily convert pseudocode into a structure that is suitable for coding into ARM.  The If-Then blocks translate directly, and if condition execution requires more than three instructions,​ or has multiple levels of flow control, a branch is probably appropriate. ​ However, this allows most simple programs to reserve branching for actual function calls rather than flow-control blocks. ​ There are many instructions not included in MIPS, such as shifting and arithmetic functions, that allow for better hand-optimization of programs, and can even make code more readable. ​ (It looks like) ARM is actually used in industry, while MIPS is mostly educational,​ so familiarity with ARM could be valuable and enriching.
  
2014/arm_assembly_adventures.txt · Last modified: 2014/12/17 15:01 by dcelik