X86 Instructions and ARM Architecture

Read this article, which gives two examples of instruction set architectures (ISAs). Look at how the different microprocessors address memory. Take note of the similarities and differences in instruction format, instruction types, and addressing modes between these two ISAs, as well as between them and the MIPS instructions of the previous sections.

Other Instructions

Stack Instructions

push arg

This instruction decrements the stack pointer and stores the data specified as the argument into the location pointed to by the stack pointer.


pop arg

This instruction loads the data stored in the location pointed to by the stack pointer into the argument specified and then increments the stack pointer. For example:

mov eax, 5
mov ebx, 6
push eax
The stack is now: [5]
push ebx
The stack is now: [6] [5]
pop eax
The topmost item (which is 6) is now stored in eax. The stack is now: [5]
pop ebx
ebx is now equal to 5. The stack is now empty.
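
The push/pop sequence above can be mimicked with a small Python model. This is only a sketch: the class name `Stack32`, the starting address `0x1000`, and the dictionary used as memory are illustrative assumptions, not part of the x86 architecture. It shows that push decrements the stack pointer before storing, and that pop loads before incrementing.

```python
# Toy model of a 32-bit x86 stack. The names and the starting address
# are illustrative assumptions, not anything defined by the processor.
class Stack32:
    def __init__(self, top=0x1000):
        self.esp = top      # stack pointer (arbitrary starting value)
        self.mem = {}       # address -> 32-bit value

    def push(self, value):
        self.esp -= 4                             # decrement the stack pointer first
        self.mem[self.esp] = value & 0xFFFFFFFF   # then store at [esp]

    def pop(self):
        value = self.mem[self.esp]                # load from [esp] first
        self.esp += 4                             # then increment the stack pointer
        return value


s = Stack32()
eax, ebx = 5, 6
s.push(eax)
s.push(ebx)
eax = s.pop()   # receives the most recently pushed value
ebx = s.pop()
print(eax, ebx, hex(s.esp))
```

After the two pops the registers have swapped values and the stack pointer is back where it started.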

pushf

This instruction decrements the stack pointer and then loads the location pointed to by the stack pointer with the contents of the flag register.


popf

This instruction loads the flag register with the contents of the memory location pointed to by the stack pointer and then increments the contents of the stack pointer.


pusha

This instruction pushes all the general purpose registers onto the stack in the following order: AX, CX, DX, BX, SP, BP, SI, DI. The value of SP pushed is the value before the instruction is executed. It is useful for saving state before an operation that could potentially change these registers.


popa

This instruction pops all the general purpose registers off the stack in the reverse order of PUSHA, that is, DI, SI, BP, SP, BX, DX, CX, AX. The saved SP value is removed from the stack but discarded rather than loaded into SP. It is used to restore state after a PUSHA.


pushad

This instruction works similarly to pusha, but pushes the 32-bit general purpose registers onto the stack instead of their 16-bit counterparts.


popad

This instruction works similarly to popa, but pops the 32-bit general purpose registers off of the stack instead of their 16-bit counterparts.
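
As a rough illustration of the register order, the following Python sketch models the 16-bit pusha and popa on a dictionary of registers. The function names, the dictionary layout, and the sample register values are assumptions made for the example. The model pushes the value SP had before the instruction started, and on the way back it skips the saved SP slot instead of loading it, which is how the real popa treats that slot.

```python
# Order in which pusha stores the 16-bit registers.
PUSHA_ORDER = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def pusha(regs, mem):
    """Push all general purpose registers; SP is saved as it was on entry."""
    original_sp = regs["sp"]
    for name in PUSHA_ORDER:
        value = original_sp if name == "sp" else regs[name]
        regs["sp"] = (regs["sp"] - 2) & 0xFFFF   # 16-bit push: decrement by 2
        mem[regs["sp"]] = value

def popa(regs, mem):
    """Pop in reverse order; the saved SP slot is read past but not loaded."""
    for name in reversed(PUSHA_ORDER):
        value = mem[regs["sp"]]
        regs["sp"] = (regs["sp"] + 2) & 0xFFFF
        if name != "sp":
            regs[name] = value

regs = {"ax": 1, "cx": 2, "dx": 3, "bx": 4, "sp": 0x100, "bp": 5, "si": 6, "di": 7}
saved = dict(regs)
mem = {}

pusha(regs, mem)
sp_after_push = regs["sp"]
for name in regs:                # clobber everything except SP
    if name != "sp":
        regs[name] = 0
popa(regs, mem)
print(regs == saved)
```

The same idea applies to pushad and popad with 4-byte slots and the 32-bit register names.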


Flags instructions

While the flags register is used to report on results of executed instructions (overflow, carry, etc.), it also contains flags that affect the operation of the processor. These flags are set and cleared with special instructions.


Interrupt Flag

The IF flag tells the processor whether it should accept hardware interrupts. It should be kept set under normal execution. In protected mode, the instructions that change it (sti and cli) generally cannot be executed by user-level programs.


sti

Sets the interrupt flag. If set, the processor can accept interrupts from peripheral hardware.


cli

Clears the interrupt flag. Hardware interrupts cannot interrupt execution. Programs can still generate interrupts, called software interrupts, and change the flow of execution. Non-maskable interrupts (NMI) cannot be blocked using this instruction.
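
The effect of sti and cli on the different kinds of interrupt can be sketched with a toy model in Python. The class `ToyCPU`, its method names, and the interrupt names are invented for this illustration. A real processor keeps a masked hardware interrupt pending rather than dropping it, which this sketch does not model.

```python
class ToyCPU:
    """Minimal model of the interrupt flag; not a real CPU interface."""

    def __init__(self):
        self.interrupt_flag = True   # IF is normally kept set
        self.handled = []            # interrupts that reached a handler

    def sti(self):
        self.interrupt_flag = True

    def cli(self):
        self.interrupt_flag = False

    def raise_interrupt(self, name, kind):
        # kind is "hardware" (maskable), "nmi", or "software"
        if kind == "hardware" and not self.interrupt_flag:
            return False             # masked while IF is clear
        self.handled.append(name)
        return True


cpu = ToyCPU()
cpu.cli()
cpu.raise_interrupt("timer", "hardware")     # blocked by cli
cpu.raise_interrupt("int 0x80", "software")  # software interrupts still run
cpu.raise_interrupt("nmi", "nmi")            # NMI cannot be masked
cpu.sti()
cpu.raise_interrupt("keyboard", "hardware")  # accepted again after sti
print(cpu.handled)
```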


Direction Flag

The DF flag tells the processor in which direction to step through memory when executing string instructions. That is, it determines whether the esi and edi registers are decremented or incremented after each string operation such as movs.


std

Sets the direction flag. Registers will decrement, reading backwards.


cld

Clears the direction flag. Registers will increment, reading forwards.
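
To make the effect of the direction flag concrete, here is a small Python sketch of a byte-wise movs. The function name `movsb` mirrors the instruction, but the buffer contents, offsets, and the use of a `bytearray` as memory are assumptions for the example. With DF clear the copy starts at the first byte and walks forwards; with DF set it starts at the last byte and walks backwards.

```python
def movsb(mem, regs, df):
    """Copy one byte from [esi] to [edi], then step both registers."""
    mem[regs["edi"]] = mem[regs["esi"]]
    step = -1 if df else 1       # DF set (std): decrement; DF clear (cld): increment
    regs["esi"] += step
    regs["edi"] += step

# Forward copy (cld): start at the first byte of the source.
fwd = bytearray(16)
fwd[4:8] = b"abcd"
fwd_regs = {"esi": 4, "edi": 8}
for _ in range(4):               # plays the role of "rep movsb" with ecx = 4
    movsb(fwd, fwd_regs, df=0)

# Backward copy (std): start at the last byte of the source.
bwd = bytearray(16)
bwd[4:8] = b"abcd"
bwd_regs = {"esi": 7, "edi": 11}
for _ in range(4):
    movsb(bwd, bwd_regs, df=1)

print(bytes(fwd[8:12]), fwd_regs)
print(bytes(bwd[8:12]), bwd_regs)
```

Both runs copy the same four bytes; only the starting offsets and the final register values differ.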


Carry Flag

The CF flag is often modified after arithmetic instructions, but it can be set or cleared manually as well.


stc

Sets the carry flag.


clc

Clears the carry flag.


cmc

Complements (inverts) the carry flag.
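
The three carry flag instructions amount to simple bit operations on the flags register, where CF is bit 0. The Python sketch below shows this, together with a 32-bit addition that sets or clears CF the way an arithmetic instruction would. The helper names and the starting flags value `0x2` (bit 1 is reserved and always reads as 1) are choices made for the illustration.

```python
CF = 1 << 0   # the carry flag is bit 0 of the flags register

def stc(flags):
    return flags | CF      # set CF

def clc(flags):
    return flags & ~CF     # clear CF

def cmc(flags):
    return flags ^ CF      # invert CF

def add32(a, b, flags):
    """32-bit add that updates CF like an arithmetic instruction would."""
    total = a + b
    flags = stc(flags) if total > 0xFFFFFFFF else clc(flags)
    return total & 0xFFFFFFFF, flags

flags = 0x2
result, flags_after_add = add32(0xFFFFFFFF, 1, flags)   # wraps around to 0
print(result, hex(flags_after_add), hex(cmc(flags_after_add)))
```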


Other

sahf

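Stores the contents of the AH register into the low byte of the flags register. Only the SF, ZF, AF, PF, and CF bits are updated; the reserved bits in that byte are not affected.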


lahf

Loads the AH register with the contents of the lower byte of the flag register.
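
The low byte of the flags register holds SF, ZF, AF, PF, and CF in bits 7, 6, 4, 2, and 0. Bit 1 is reserved and reads as 1, and bits 3 and 5 are reserved and read as 0. The Python sketch below models lahf and sahf under that layout. The function names mirror the instructions, and the sample flags value is an assumption for the example.

```python
# Bits of the low flags byte that sahf writes: SF, ZF, AF, PF, CF.
SAHF_MASK = 0b1101_0101

def lahf(flags):
    """Return the value AH would receive: status flags plus reserved bit 1."""
    return (flags & SAHF_MASK) | 0b10

def sahf(flags, ah):
    """Return flags with SF, ZF, AF, PF, CF replaced by the bits in AH."""
    return (flags & ~SAHF_MASK) | (ah & SAHF_MASK)

flags = 0x0202                  # IF set, no status flags set
ah = lahf(flags)                # only the reserved bit 1 is set
new_flags = sahf(flags, 0xFF)   # sets all five status flags, leaves IF alone
print(hex(ah), hex(new_flags), hex(lahf(new_flags)))
```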


I/O Instructions

in src, dest GAS Syntax
in dest, src Intel Syntax


The IN instruction almost always has the operands AX and DX (or EAX and EDX) associated with it. DX (src) usually holds the address of the port to read, and AX (dest) receives the data from the port. In protected-mode operating systems, IN is usually a privileged instruction, so ordinary user programs cannot execute it.

out src, dest GAS Syntax
out dest, src Intel Syntax


The OUT instruction is very similar to the IN instruction. OUT writes data from a given register (src) to a given output port (dest). In protected-mode operating systems, OUT is likewise usually privileged, so ordinary user programs cannot execute it.


System Instructions

These instructions were added with the Pentium II.


sysenter

This instruction causes the processor to enter protected system mode (supervisor mode or "kernel mode").


sysexit

This instruction causes the processor to leave protected system mode, and enter user mode.


Misc Instructions

Read time stamp counter

RDTSC

RDTSC was introduced with the Pentium processor. The instruction reads the time stamp counter, which counts clock cycles since reset, and returns the 64-bit value in EDX:EAX. This provides low-overhead, high-resolution CPU timing. However, on modern systems (multi-core, hyper-threaded, or multi-CPU machines) the cycle counters are not guaranteed to be synchronized between cores and CPUs, and the CPU frequency may vary because of power saving or dynamic overclocking. The instruction may therefore be less reliable than when it was first introduced and should be used with care for performance measurements.

It is possible to use just the lower 32 bits of the result, but note that on a 600 MHz processor this value would overflow every 7.16 seconds:

\[ 2^{32}\ \text{cycles} \times \frac{1\ \text{second}}{600{,}000{,}000\ \text{cycles}} \approx 7.16\ \text{seconds} \]

Using the full 64 bits allows about 974.9 years between overflows:

\[ 2^{64}\ \text{cycles} \times \frac{1\ \text{second}}{600{,}000{,}000\ \text{cycles}} \times \frac{1\ \text{day}}{86{,}400\ \text{seconds}} \times \frac{1\ \text{year}}{365\ \text{days}} \approx 974.9\ \text{years} \]
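
The two overflow figures can be checked with a few lines of Python. The constant and function names are made up for the example, and the 600 MHz clock and 365-day year are the same assumptions used in the text.

```python
FREQ_HZ = 600_000_000            # 600 MHz clock, as assumed in the text
SECONDS_PER_YEAR = 86_400 * 365  # 365-day year

def wrap_seconds(bits, freq_hz=FREQ_HZ):
    """Seconds until a counter of the given width wraps around."""
    return 2**bits / freq_hz

seconds_32 = wrap_seconds(32)
years_64 = wrap_seconds(64) / SECONDS_PER_YEAR
print(round(seconds_32, 2), "seconds")
print(round(years_64, 1), "years")
```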

The following program (using NASM syntax) is an example of using RDTSC to measure the number of cycles a small block takes to execute:

global main
extern printf

section .data
  align 4
  a:      dd 10.0
  b:      dd 5.0
  c:      dd 2.0
  fmtStr: db "edx:eax=%llu edx=%d eax=%d", 0x0A, 0

section .bss
  align 4
  cycleLow:  resd 1
  cycleHigh: resd 1
  result:    resd 1

section .text
  main:                   ; Using main since we are using gcc to link
;
;  op    dst, src
;
  xor   eax, eax
  cpuid                   ; serialize so earlier instructions finish first
  rdtsc                   ; start count in edx:eax
  mov   [cycleLow], eax
  mov   [cycleHigh], edx
;
; Do some work to measure: result = sqrt(a*c + b*b)
;
  fld   dword [a]
  fld   dword [c]
  fmulp st1               ; st0 = a * c
  fld   dword [b]
  fld   dword [b]
  fmulp st1               ; st0 = b * b
  faddp st1               ; st0 = a*c + b*b
  fsqrt
  fstp  dword [result]
;
; Done work
;
  cpuid
  rdtsc                   ; end count in edx:eax
;
; break points so we can examine the values
; before we alter the data in edx:eax and
; before we print out the results.
;
break1:
  sub   eax, [cycleLow]   ; low halves; sets CF on borrow
  sbb   edx, [cycleHigh]  ; high halves, minus the borrow
break2:
  push  eax               ; argument for the last %d (eax)
  push  edx               ; argument for the first %d (edx)
  push  edx               ; high half of the %llu argument
  push  eax               ; low half of the %llu argument
  push  dword fmtStr
  call  printf
  add   esp, 20           ; Pop stack 5 times 4 bytes
;
; Call exit(3) syscall
;   void exit(int status)
;
  mov   ebx, 0            ; Arg one: the status
  mov   eax, 1            ; Syscall number: 1 = exit
  int   0x80

In order to assemble, link and run the program we need to do the following:

$ nasm -felf -g rdtsc.asm -l rdtsc.lst
$ gcc -m32 -o rdtsc rdtsc.o
$ ./rdtsc
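
The sub/sbb pair in the program subtracts two 64-bit counts held as 32-bit halves: sub handles the low halves and records any borrow in the carry flag, and sbb subtracts the high halves along with that borrow. The Python sketch below models this. The function names and the sample counter values are invented for the illustration.

```python
MASK32 = 0xFFFFFFFF

def sub_sbb(end_lo, end_hi, start_lo, start_hi):
    """Model of 'sub eax, [lo]' followed by 'sbb edx, [hi]'."""
    lo = end_lo - start_lo
    borrow = 1 if lo < 0 else 0        # carry flag after sub
    hi = end_hi - start_hi - borrow    # sbb also subtracts the borrow
    return lo & MASK32, hi & MASK32

def elapsed_cycles(start, end):
    """Split two 64-bit counts into halves and subtract them as the CPU would."""
    lo, hi = sub_sbb(end & MASK32, end >> 32, start & MASK32, start >> 32)
    return (hi << 32) | lo

start = 0x00000001_FFFFFF00   # sample counts; the low half wraps between them
end = 0x00000002_00000010
cycles = elapsed_cycles(start, end)
print(cycles)
```

Without the borrow, the high half of the difference would come out one too large whenever the low half wraps between the two readings.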