[System Security 7] x86 Disassembly Crash Course

x86 disassembly quick start

x86 architecture

The x86 architecture, like most modern computer architectures, has three types of hardware components:

  • Central Processing Unit: Responsible for executing code
  • Memory (RAM): Responsible for storing all data and code
  • Input/output system (I/O): Provides interfaces for hard drives, keyboards, monitors and other devices

Memory

A program’s memory can be divided into four main sections:

  • Stack: The stack is used for local variables and parameters of functions, and for controlling program execution flow.
  • Heap: The heap is used for dynamic memory during program execution. New values are created (allocated) there, and values that are no longer needed are eliminated (freed).
  • Code: The code section contains the instructions that the CPU fetches and executes to carry out the program's tasks.
  • Data: This section holds values that are put in place when the program is initially loaded. They are sometimes called static values because they typically do not change while the program is running.

Command format

An instruction consists of a mnemonic and zero or more operands. For example:

Mnemonic    Destination operand    Source operand
mov         ecx                    0x42

Opcodes and byte order

Each instruction corresponds to an opcode (operation code) that tells the CPU which operation to perform. Disassembly translates opcodes back into human-readable instructions.

Instruction    mov ecx, 0x42
Opcodes        B9 42 00 00 00

The value 0x42 is encoded as the bytes 42 00 00 00 because the x86 architecture uses little-endian byte order. Byte order (endianness) describes whether the most significant byte (big-endian) or the least significant byte (little-endian) of a multi-byte value is stored first, at the lowest address.

For example, the IP address 127.0.0.1 is represented as 0x7F000001 in big-endian format (over the network, since network byte order is big-endian), and as 0x0100007F in little-endian format (locally in memory).
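To make the byte-order examples above concrete, here is a small Python sketch using the standard `struct` module and `int.from_bytes`. The helper names `encode_mov_ecx` and `ip_as_int` are hypothetical, chosen only for this illustration. `encode_mov_ecx` hard-codes the B9 opcode and does not handle any other instruction.

```python
import struct


def encode_mov_ecx(imm: int) -> bytes:
    """Encode `mov ecx, imm32`: opcode 0xB9 followed by the immediate in little-endian order."""
    return bytes([0xB9]) + struct.pack("<I", imm)


def ip_as_int(ip: str, byte_order: str) -> int:
    """Read the four octets of a dotted IPv4 address (in network order) as one 32-bit integer.

    byte_order="big" gives the network (big-endian) interpretation;
    byte_order="little" gives what an x86 CPU sees when it loads those same bytes.
    """
    octets = bytes(int(part) for part in ip.split("."))
    return int.from_bytes(octets, byte_order)


if __name__ == "__main__":
    print(encode_mov_ecx(0x42).hex(" ").upper())  # B9 42 00 00 00
    print(hex(ip_as_int("127.0.0.1", "big")))     # 0x7f000001
    print(hex(ip_as_int("127.0.0.1", "little")))  # 0x100007f
```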

Operands

The operands describe the data to be used by the instruction. There are three types:

  • Immediate value: a fixed value, such as 0x42
  • Register: refers to a register, such as ecx
  • Memory address: refers to the memory address where the data of interest is located, usually written as a value, register, or expression inside square brackets, such as [eax]

Register

A register is a small amount of data storage available to the CPU, and its contents can be accessed more quickly than storage anywhere else. x86 processors have a set of registers that can be used as temporary storage or as a workspace. They fall into four categories:

  • General-purpose registers
  • Segment registers
  • Status flags
  • Instruction pointer
General registers    Segment registers    Status register    Instruction pointer
EAX (AX, AH, AL)     CS                   EFLAGS             EIP
EBX (BX, BH, BL)     SS
ECX (CX, CH, CL)     DS
EDX (DX, DH, DL)     ES
EBP (BP)             FS
ESP (SP)             GS
ESI (SI)
EDI (DI)

All general-purpose registers are 32 bits in size and can be referenced in assembly code as either 32 bits or 16 bits (for example, EAX or AX).

Four of them (EAX, EBX, ECX, EDX) can also be referenced as 8-bit values, using either their lowest 8 bits or the next 8 bits (bits 8-15). For example, AL refers to the lowest 8 bits of EAX, and AH refers to the 8 bits just above AL.

For example, with EAX = 0xA9DC81F5:

EAX (32 bits)   1010 1001 1101 1100 1000 0001 1111 0101   = 0xA9DC81F5
AX  (16 bits)                       1000 0001 1111 0101   = 0x81F5
AH  (8 bits)                        1000 0001             = 0x81
AL  (8 bits)                                  1111 0101   = 0xF5
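The register views in the diagram can be modeled with plain bit masks. This is a minimal Python sketch that assumes a 32-bit EAX held in an ordinary integer. `sub_registers` and `write_al` are hypothetical helpers. `write_al` shows that writing AL changes only the low byte of EAX, because the smaller names alias parts of the same register.

```python
MASK32 = 0xFFFFFFFF


def sub_registers(eax: int) -> dict:
    """Return the AX, AH and AL views of a 32-bit EAX value."""
    eax &= MASK32
    return {
        "AX": eax & 0xFFFF,       # low 16 bits
        "AH": (eax >> 8) & 0xFF,  # bits 8-15
        "AL": eax & 0xFF,         # bits 0-7
    }


def write_al(eax: int, value: int) -> int:
    """Model `mov al, value`: only the low 8 bits of EAX change."""
    return (eax & MASK32 & ~0xFF) | (value & 0xFF)


if __name__ == "__main__":
    for name, val in sub_registers(0xA9DC81F5).items():
        print(name, hex(val))               # AX 0x81f5, then AH 0x81, then AL 0xf5
    print(hex(write_al(0xA9DC81F5, 0x00)))  # 0xa9dc8100
```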

General-purpose registers

General-purpose registers typically store data or memory addresses. By convention, EAX usually holds the return value of a function call.

Flags register

The EFLAGS register is a status register. Each of its bits is a flag that is either set (1) or cleared (0) during execution. These flags either control CPU operations or indicate the results of CPU operations. The most important flags are:

  • ZF: The zero flag is set when the result of an operation is equal to 0; otherwise it is cleared.

  • CF: The carry flag is set when the result of an operation is too large or too small for the destination operand; otherwise it is cleared.

  • SF: The sign flag is set when the result of an operation is negative and cleared when it is positive. It is also set when the most significant bit of the result is 1 after an arithmetic operation, which for signed values is the same condition.

  • TF: The trap flag is used for debugging. When it is set, the x86 processor executes only one instruction at a time (single-stepping).
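Here is a minimal Python sketch of how ZF, CF and SF follow from a 32-bit addition. It assumes unsigned 32-bit operands held in Python integers. `add32` is a hypothetical helper that models `add` only (for subtraction, CF signals a borrow instead), and it ignores the other EFLAGS bits.

```python
MASK32 = 0xFFFFFFFF


def add32(a: int, b: int):
    """Model a 32-bit `add a, b`; return (result, ZF, CF, SF)."""
    full = (a & MASK32) + (b & MASK32)
    result = full & MASK32       # the destination keeps only 32 bits
    zf = int(result == 0)        # ZF: result is zero
    cf = int(full > MASK32)      # CF: unsigned result was too large for 32 bits
    sf = result >> 31            # SF: copy of the most significant bit
    return result, zf, cf, sf


if __name__ == "__main__":
    print(add32(2, 3))            # (5, 0, 0, 0)
    print(add32(0xFFFFFFFF, 1))   # (0, 1, 1, 0): wraps to 0, so ZF and CF are set
    print(add32(0x7FFFFFFF, 1))   # (2147483648, 0, 0, 1): top bit set, so SF is set
```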

EIP, instruction pointer

The EIP register, also known as the instruction pointer or program counter, stores the address in memory of the next instruction to be executed by the program. The only role of EIP is to tell the processor what to do next.

Simple instructions

mov instruction

Instruction             Description
mov eax, ebx            Copy the contents of EBX into the EAX register
mov eax, 0x42           Copy the immediate value 0x42 into the EAX register
mov eax, [0x4037C4]     Copy the 4 bytes at memory address 0x4037C4 into the EAX register
mov eax, [ebx]          Copy the 4 bytes at the memory address held in EBX into the EAX register
mov eax, [ebx+esi*4]    Copy the 4 bytes at the memory address given by the expression ebx+esi*4 into the EAX register
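The addressing forms in the table can be modeled with a small sparse memory. This Python sketch is illustrative only: `Memory`, `effective_address` and the array address 0x00403000 are hypothetical. It shows why `[ebx+esi*4]` is the natural way to read element ESI of an array of 4-byte integers that starts at EBX.

```python
import struct


class Memory:
    """Hypothetical sparse, byte-addressable memory for illustration."""

    def __init__(self):
        self.data = {}  # address -> byte value

    def write_dword(self, addr: int, value: int) -> None:
        # x86 stores the 4 bytes of a dword in little-endian order.
        for i, byte in enumerate(struct.pack("<I", value & 0xFFFFFFFF)):
            self.data[addr + i] = byte

    def read_dword(self, addr: int) -> int:
        # A 4-byte load such as `mov eax, [addr]`; unwritten bytes read as 0 here.
        raw = bytes(self.data.get(addr + i, 0) for i in range(4))
        return struct.unpack("<I", raw)[0]


def effective_address(base: int, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Compute base + index*scale + disp, truncated to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only allows scale factors 1, 2, 4 or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF


if __name__ == "__main__":
    mem = Memory()
    array_base = 0x00403000  # hypothetical address of an array of 4-byte ints
    for i, value in enumerate([10, 20, 30, 40]):
        mem.write_dword(array_base + 4 * i, value)

    ebx, esi = array_base, 2
    print(mem.read_dword(effective_address(ebx)))          # mov eax, [ebx]       -> 10
    print(mem.read_dword(effective_address(ebx, esi, 4)))  # mov eax, [ebx+esi*4] -> 30
```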

lea instruction

The lea (load effective address) instruction puts a memory address into the destination operand without reading memory at that address.

lea eax, [ebx+8]   // EAX = EBX + 8, the address itself; memory is not read

mov eax, [ebx+8]   // EAX = the 4 bytes stored in memory at address EBX + 8

The two instructions look alike, but they are not equivalent: mov loads the value stored in memory into EAX, whereas lea loads the memory address itself into EAX.
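The difference between lea and mov can be sketched in Python as follows. `lea` and `mov_load` are hypothetical helpers: `lea` only does the address arithmetic, while `mov_load` also reads 4 bytes from a toy memory dictionary. Because lea never touches memory, compilers also use it for plain arithmetic, such as `lea eax, [ebx+ebx*2]` to compute EBX*3.

```python
MASK32 = 0xFFFFFFFF


def lea(base: int, disp: int = 0, index: int = 0, scale: int = 1) -> int:
    """Model `lea reg, [base + index*scale + disp]`: compute the address only."""
    return (base + index * scale + disp) & MASK32


def mov_load(memory: dict, base: int, disp: int = 0) -> int:
    """Model `mov reg, [base + disp]`: compute the address, then read 4 bytes there."""
    addr = (base + disp) & MASK32
    # A missing key raises KeyError, loosely mirroring an access violation.
    raw = bytes(memory[addr + i] for i in range(4))
    return int.from_bytes(raw, "little")


if __name__ == "__main__":
    memory = {0x2008: 0x78, 0x2009: 0x56, 0x200A: 0x34, 0x200B: 0x12}
    ebx = 0x2000
    print(hex(lea(ebx, 8)))               # lea eax, [ebx+8] -> 0x2008
    print(hex(mov_load(memory, ebx, 8)))  # mov eax, [ebx+8] -> 0x12345678
    print(lea(5, index=5, scale=2))       # lea eax, [ebx+ebx*2] with EBX = 5 -> 15
```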

Arithmetic operations

sub eax, 0x10   // Subtract 0x10 from EAX and store the result in EAX
add eax, ebx    // Add EBX to EAX and store the result in EAX
inc edx         // Increment EDX by 1
dec ecx         // Decrement ECX by 1
mul value       // Multiply EAX by value; the 64-bit product is stored in EDX:EAX,
                // with EDX holding the high 32 bits and EAX the low 32 bits

div value       // The reverse of mul: divide the 64-bit value EDX:EAX by value,
                // storing the quotient in EAX and the remainder in EDX
shr             // Shift right (vacated high bits are filled with 0)
shl             // Shift left (vacated low bits are filled with 0)
ror and rol are similar to the shift instructions, but the bits shifted out are fed back in at the other end: rotate right (ror) moves the lowest bit into the highest bit, and rotate left (rol) moves the highest bit into the lowest bit.
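The EDX:EAX register pair used by mul and div is easy to misread in disassembly. Here is a minimal Python sketch of their unsigned 32-bit forms. `mul32` and `div32` are hypothetical helpers. The error cases mirror the fact that x86 raises a divide error when the divisor is 0 or when the quotient does not fit in EAX.

```python
MASK32 = 0xFFFFFFFF


def mul32(eax: int, value: int):
    """Model unsigned `mul value`: EDX:EAX = EAX * value. Returns (edx, eax)."""
    product = (eax & MASK32) * (value & MASK32)  # always fits in 64 bits
    return product >> 32, product & MASK32


def div32(edx: int, eax: int, value: int):
    """Model unsigned `div value`: divide EDX:EAX by value. Returns (edx, eax) = (remainder, quotient)."""
    divisor = value & MASK32
    if divisor == 0:
        raise ZeroDivisionError("x86 raises a divide error when the divisor is 0")
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, divisor)
    if quotient > MASK32:
        raise OverflowError("x86 raises a divide error when the quotient does not fit in EAX")
    return remainder, quotient


if __name__ == "__main__":
    edx, eax = mul32(0x80000000, 4)
    print(hex(edx), hex(eax))  # 0x2 0x0  (0x80000000 * 4 = 0x200000000)
    print(div32(edx, eax, 4))  # (0, 2147483648): back to 0x80000000
    print(div32(0, 17, 5))     # (2, 3): remainder 2 in EDX, quotient 3 in EAX
```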

Commonly used logical and shift instructions

Instruction      Description
xor eax, eax     Clear the EAX register (any value XORed with itself is 0)
or eax, 0x7575   Perform a bitwise OR of EAX with 0x7575 and store the result in EAX
mov eax, 0xA     Shift EAX left by two bits. These two instructions result in EAX = 0x28,
shl eax, 2       because 1010 (0xA in binary) shifted left by two places is 101000 (0x28)
mov bl, 0xA      Rotate BL right by two bits. These two instructions result in BL = 10000010,
ror bl, 2        because 00001010 rotated right by two bits is 10000010 (0x82)
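The two worked examples in the table can be checked with a short Python sketch of fixed-width shifts and rotations. `shl`, `ror` and `rol` here are hypothetical helpers that take the register width as a parameter (8 bits for BL, 32 for EAX). They do not model the flags these instructions also update.

```python
def shl(value: int, count: int, bits: int = 32) -> int:
    """Shift left within a `bits`-wide register; vacated low bits become 0."""
    mask = (1 << bits) - 1
    return (value << count) & mask


def ror(value: int, count: int, bits: int = 8) -> int:
    """Rotate right: bits leaving the low end re-enter at the high end."""
    mask = (1 << bits) - 1
    value &= mask
    count %= bits
    return ((value >> count) | (value << (bits - count))) & mask


def rol(value: int, count: int, bits: int = 8) -> int:
    """Rotate left: bits leaving the high end re-enter at the low end."""
    mask = (1 << bits) - 1
    value &= mask
    count %= bits
    return ((value << count) | (value >> (bits - count))) & mask


if __name__ == "__main__":
    print(hex(shl(0xA, 2)))               # mov eax, 0xA / shl eax, 2 -> 0x28
    print(format(ror(0xA, 2, 8), "08b"))  # mov bl, 0xA / ror bl, 2   -> 10000010
```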

NOP instruction

The nop instruction does nothing: when it is executed, the CPU simply moves on to the next instruction. nop is actually a pseudonym for xchg eax, eax, and since exchanging EAX with itself has no effect, the instruction does nothing. Its opcode is 0x90.

Stack

The stack is a last-in, first-out (LIFO) structure. The registers used to support it are ESP and EBP.

ESP is the stack pointer and holds the memory address of the top of the stack. Whenever something is pushed onto or popped off the stack, the value of this register changes accordingly.

EBP is the base pointer. It stays constant within a given function, so the program uses it as a reference point for locating local variables and parameters.

The stack is used for short-term storage, often for local variables, parameters, and return addresses. The main purpose is to manage the exchange of data between function calls.

Function calls

1. The arguments are pushed onto the stack, and the function is invoked with call. This pushes the return address onto the stack and then jumps to the function. The return address is the contents of EIP, that is, the address of the instruction after call.

2. The function prologue pushes EBP onto the stack and allocates space on the stack for local variables.

3. The function performs its work.

4. The function epilogue adjusts ESP to free the local variables and restores EBP.

5. The function returns with the ret instruction, which pops the return address off the stack into EIP, so execution continues right after the original call. Finally, the arguments are removed from the stack, either by the caller (for example, add esp, 8) or, in some calling conventions, by ret itself (ret N).
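The five steps above can be traced with a toy stack model. This Python sketch assumes a cdecl-style convention (arguments pushed right to left, caller cleans them up) and a standard EBP-based frame. `Stack`, `simulate_call` and the addresses used are hypothetical. After the prologue, the saved EBP sits at [ebp], the return address at [ebp+4], the first argument at [ebp+8], and locals below EBP (for example [ebp-4]).

```python
class Stack:
    """Toy 32-bit stack that grows toward lower addresses (hypothetical helper)."""

    def __init__(self, esp: int, ebp: int = 0):
        self.mem = {}  # address -> 32-bit value
        self.esp = esp
        self.ebp = ebp

    def push(self, value: int) -> None:
        self.esp -= 4               # decrement ESP, then store at the new top
        self.mem[self.esp] = value

    def pop(self) -> int:
        value = self.mem[self.esp]  # load from the top, then increment ESP
        self.esp += 4
        return value


def simulate_call(stack: Stack, args, return_address: int, local_bytes: int):
    """Walk through a cdecl-style call; return (frame snapshot, EIP after ret)."""
    # 1. Push arguments right to left, then `call` pushes the return address.
    for arg in reversed(args):
        stack.push(arg)
    stack.push(return_address)
    # 2. Prologue: push ebp / mov ebp, esp / sub esp, local_bytes
    stack.push(stack.ebp)
    stack.ebp = stack.esp
    stack.esp -= local_bytes
    # 3. The function body would run here; record what the EBP-relative slots hold.
    frame = {
        "saved_ebp": stack.mem[stack.ebp],           # [ebp]
        "return_address": stack.mem[stack.ebp + 4],  # [ebp+4]
        "first_arg": stack.mem[stack.ebp + 8],       # [ebp+8]
    }
    # 4. Epilogue: mov esp, ebp / pop ebp
    stack.esp = stack.ebp
    stack.ebp = stack.pop()
    # 5. ret pops the return address into EIP; the caller then removes the arguments.
    eip = stack.pop()
    stack.esp += 4 * len(args)  # add esp, 4*len(args)
    return frame, eip


if __name__ == "__main__":
    s = Stack(esp=0x0012FF80, ebp=0x0012FFC0)  # hypothetical caller state
    frame, eip = simulate_call(s, [1, 2], return_address=0x00401050, local_bytes=8)
    print({k: hex(v) for k, v in frame.items()})
    print(hex(eip), hex(s.esp), hex(s.ebp))    # 0x401050 0x12ff80 0x12ffc0
```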

Stack layout

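When a 32-bit value is pushed onto the stack (for example, push eax), ESP decreases by 4.

When a value is popped off the stack (for example, pop ebx), ESP increases by 4.

pusha and pushad push all the general-purpose registers onto the stack.

popa and popad pop them all back off the stack (the saved stack pointer value is discarded rather than loaded).

  • pusha pushes the 16-bit registers in the order AX, CX, DX, BX, SP, BP, SI, DI
  • pushad pushes the 32-bit registers in the order EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI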
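Here is a minimal Python sketch of push, pop, pushad and popad over a dictionary-based memory. The helper names are hypothetical. It follows the documented pushad order and two details that often surprise readers: the ESP value that pushad saves is the one from before pushad started, and popad discards the saved ESP instead of loading it.

```python
MASK32 = 0xFFFFFFFF
PUSHAD_ORDER = ("EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI")


def push(regs: dict, mem: dict, value: int) -> None:
    """push: ESP -= 4, then store the dword at [ESP]."""
    regs["ESP"] = (regs["ESP"] - 4) & MASK32
    mem[regs["ESP"]] = value & MASK32


def pop(regs: dict, mem: dict) -> int:
    """pop: load the dword at [ESP], then ESP += 4."""
    value = mem[regs["ESP"]]
    regs["ESP"] = (regs["ESP"] + 4) & MASK32
    return value


def pushad(regs: dict, mem: dict) -> None:
    original_esp = regs["ESP"]  # the ESP value pushed is the one before pushad began
    for name in PUSHAD_ORDER:
        push(regs, mem, original_esp if name == "ESP" else regs[name])


def popad(regs: dict, mem: dict) -> None:
    for name in reversed(PUSHAD_ORDER):
        value = pop(regs, mem)
        if name != "ESP":       # the saved ESP is popped but discarded
            regs[name] = value


if __name__ == "__main__":
    regs = {"EAX": 1, "ECX": 2, "EDX": 3, "EBX": 4, "ESP": 0x1000, "EBP": 6, "ESI": 7, "EDI": 8}
    mem = {}
    push(regs, mem, 0x42)
    print(hex(regs["ESP"]))                       # 0xffc
    print(hex(pop(regs, mem)), hex(regs["ESP"]))  # 0x42 0x1000
    pushad(regs, mem)
    print(hex(regs["ESP"]))                       # 0xfe0: 8 registers * 4 bytes = 32 bytes
    regs["EAX"] = 99                              # clobber a register...
    popad(regs, mem)
    print(regs["EAX"], hex(regs["ESP"]))          # ...popad restores it: 1 0x1000
```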

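Conditional instructions

  • test performs a bitwise AND of its two operands but only sets the flags; it does not modify the operands. For example, test eax, eax sets ZF when EAX is 0.
  • cmp subtracts the source operand from the destination operand and sets the flags without storing the result. For an unsigned comparison, ZF and CF are set as follows:
cmp dst, src    ZF    CF
dst == src      1     0
dst < src       0     1
dst > src       0     0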
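The cmp table can be reproduced with a short Python sketch that performs the subtraction and keeps only the flags. `cmp_flags` is a hypothetical helper that models only ZF and CF. The last example shows that CF reflects an unsigned comparison, so 0xFFFFFFFF counts as larger than 1 even though it is -1 as a signed value.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(dst: int, src: int) -> dict:
    """Model `cmp dst, src`: compute dst - src, keep ZF and CF, discard the result."""
    dst &= MASK32
    src &= MASK32
    result = (dst - src) & MASK32
    return {
        "ZF": int(result == 0),  # equal operands give a zero difference
        "CF": int(dst < src),    # unsigned borrow: dst was smaller than src
    }


if __name__ == "__main__":
    print(cmp_flags(5, 5))           # {'ZF': 1, 'CF': 0}  dst == src
    print(cmp_flags(3, 5))           # {'ZF': 0, 'CF': 1}  dst < src
    print(cmp_flags(7, 5))           # {'ZF': 0, 'CF': 0}  dst > src
    print(cmp_flags(0xFFFFFFFF, 1))  # {'ZF': 0, 'CF': 0}  unsigned 0xFFFFFFFF > 1
```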

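Branch instructions

  • jmp: unconditional jump

  • Some common conditional jump instructions:

jz/jnz/je/jg/jge/ja/jae/jl/jle/jb/jbe/jo/js/jecxz

ja/jae/jb/jbe test CF and ZF and are used after unsigned comparisons, while jg/jge/jl/jle test ZF, SF and OF and are used after signed comparisons. jecxz jumps if ECX is 0.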
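How the conditional jumps read the flags can be sketched in Python as follows. `sub_flags` and `taken` are hypothetical helpers. The conditions in the table follow the standard Jcc definitions for the subset of mnemonics listed. The example `cmp 0xFFFFFFFF, 1` shows why ja and jl can both be taken after the same comparison: ja treats the operands as unsigned, and jl treats them as signed.

```python
MASK32 = 0xFFFFFFFF


def sub_flags(dst: int, src: int) -> dict:
    """Flags produced by `cmp dst, src` (that is, by dst - src) on 32-bit operands."""
    dst &= MASK32
    src &= MASK32
    result = (dst - src) & MASK32
    return {
        "ZF": int(result == 0),
        "CF": int(dst < src),                        # unsigned borrow
        "SF": result >> 31,                          # sign bit of the result
        "OF": ((dst ^ src) & (dst ^ result)) >> 31,  # signed overflow
    }


# Conditions for a subset of the Jcc instructions.
CONDITIONS = {
    "je":  lambda f: f["ZF"] == 1,                         # also jz
    "jne": lambda f: f["ZF"] == 0,                         # also jnz
    "ja":  lambda f: f["CF"] == 0 and f["ZF"] == 0,        # unsigned >
    "jae": lambda f: f["CF"] == 0,                         # unsigned >=
    "jb":  lambda f: f["CF"] == 1,                         # unsigned <
    "jbe": lambda f: f["CF"] == 1 or f["ZF"] == 1,         # unsigned <=
    "jg":  lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],  # signed >
    "jge": lambda f: f["SF"] == f["OF"],                   # signed >=
    "jl":  lambda f: f["SF"] != f["OF"],                   # signed <
    "jle": lambda f: f["ZF"] == 1 or f["SF"] != f["OF"],   # signed <=
}


def taken(mnemonic: str, dst: int, src: int) -> bool:
    """Would `cmp dst, src` followed by `mnemonic target` jump?"""
    return CONDITIONS[mnemonic](sub_flags(dst, src))


if __name__ == "__main__":
    # 0xFFFFFFFF is 4294967295 as an unsigned value but -1 as a signed value.
    for jcc in ("ja", "jb", "jg", "jl"):
        print(jcc, taken(jcc, 0xFFFFFFFF, 1))  # ja True, jb False, jg False, jl True
```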

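Repeat instructions

Repeat instructions are a set of instructions for manipulating data buffers. The common buffer instructions are:

movsX, cmpsX, stosX and scasX, where X is b, w or d, meaning byte, word or double word respectively (for example, movsb, movsw, movsd). Here movsX is shorthand for this family and should not be confused with the unrelated movsx (move with sign extension) instruction.

These instructions use ESI and EDI as the source and destination buffer addresses. To operate on more than one element, they need a repeat prefix. On its own, movsb moves only a single byte and does not use the ECX register.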

rep           // Repeat until ECX = 0
repe, repz    // Repeat until ECX = 0 or ZF = 0
repne, repnz  // Repeat until ECX = 0 or ZF = 1

Examples:

repe cmpsb    // Compares two buffers. ESI and EDI must point to the two buffers, and ECX must hold the buffer length. Comparison stops when ECX = 0 or the buffers differ.
rep stosb     // Initializes every byte of a buffer to a given value. EDI holds the buffer address and AL holds the value. Often preceded by xor eax, eax to fill the buffer with zeros.
rep movsb     // Copies a buffer byte by byte. ESI must hold the source buffer address, EDI the destination buffer address, and ECX the number of bytes to copy. Copying continues until ECX = 0.
repne scasb   // Searches a buffer for a single byte. EDI must point to the buffer, AL holds the byte to look for, and ECX holds the buffer length. The search stops when ECX = 0 or the byte is found.
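The buffer examples above can be modeled in Python with a bytearray as memory. These are hypothetical helpers that assume the direction flag is clear (so ESI and EDI increase), and they return the updated register values. The search loop is modeled as repne scasb, the form that stops when the byte is found. Like the real instructions, the compare and scan loops update ESI, EDI and ECX before checking ZF. As a result, after a successful repne scasb, EDI points one byte past the match.

```python
def rep_movsb(mem: bytearray, esi: int, edi: int, ecx: int):
    """rep movsb: copy ECX bytes from [ESI] to [EDI]. Returns (esi, edi, ecx)."""
    while ecx != 0:
        mem[edi] = mem[esi]
        esi, edi, ecx = esi + 1, edi + 1, ecx - 1
    return esi, edi, ecx


def rep_stosb(mem: bytearray, edi: int, ecx: int, al: int):
    """rep stosb: fill ECX bytes at [EDI] with AL. Returns (edi, ecx)."""
    while ecx != 0:
        mem[edi] = al & 0xFF
        edi, ecx = edi + 1, ecx - 1
    return edi, ecx


def repe_cmpsb(mem: bytearray, esi: int, edi: int, ecx: int):
    """repe cmpsb: compare while bytes are equal. Returns (esi, edi, ecx, zf).

    zf is None if ECX started at 0 (the real instruction then leaves the flags unchanged).
    """
    zf = None
    while ecx != 0:
        zf = int(mem[esi] == mem[edi])
        esi, edi, ecx = esi + 1, edi + 1, ecx - 1
        if zf == 0:  # stop at the first mismatch
            break
    return esi, edi, ecx, zf


def repne_scasb(mem: bytearray, edi: int, ecx: int, al: int):
    """repne scasb: scan [EDI] for the byte in AL. Returns (edi, ecx, zf)."""
    zf = None
    while ecx != 0:
        zf = int(mem[edi] == (al & 0xFF))
        edi, ecx = edi + 1, ecx - 1
        if zf == 1:  # stop once the byte is found
            break
    return edi, ecx, zf


if __name__ == "__main__":
    mem = bytearray(16)
    mem[0:5] = b"hello"
    print(rep_movsb(mem, 0, 8, 5), bytes(mem[8:13]))  # (5, 13, 0) b'hello'
    print(repe_cmpsb(mem, 0, 8, 5))                   # (5, 13, 0, 1): buffers equal
    edi, ecx, zf = repne_scasb(mem, 0, 5, ord("l"))
    print(edi - 1, zf)                                # 2 1: first 'l' is at offset 2
    print(rep_stosb(mem, 8, 5, 0))                    # (13, 0)
```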