The Foundations of Malware Analysis & x86 Theory
Builds on Episode 1 with rigorous coverage of x86 instruction syntax, register subdivision, EFLAGS, operand types, common instructions, endianness, and the function prologue/epilogue pattern.
Why this lesson exists
Episode 1 introduced the working vocabulary. This episode is the academic deep dive — the version a student preparing for a graduate-level reverse engineering exam should have. We slow down on every concept, prove it with examples, and address the misconceptions that creep into self-taught analysts who skip foundations.
Assembly syntax and the destination-source convention
Assembly language is a human-readable representation of machine code. Each assembly instruction encodes to a short sequence of machine-code bytes (between 1 and 15 on x86) that the CPU executes directly. A disassembler converts raw bytes back into assembly; the analyst reads the assembly to reconstruct the program's logic.
The x86 Intel syntax (used by IDA Pro and x64dbg) places the destination first:
MNEMONIC DESTINATION, SOURCE
For example, MOV EAX, 5 copies the value 5 into EAX. The destination always receives; the source always provides. AT&T syntax, the default in the GNU toolchain (GCC, GDB, and objdump), reverses this order, which is a common source of confusion when switching tools.
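The operand-order reversal can be made concrete with a small Python sketch. It is not part of the lesson's tooling: the function name `intel_to_att` is hypothetical, and the converter only handles two-operand instructions whose operands are 32-bit registers or immediates, which is enough to show the swap.

```python
# Minimal sketch: rewrite a simple Intel-syntax instruction in AT&T syntax.
# Assumption: exactly two operands, each a 32-bit register or an immediate.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def intel_to_att(instruction: str) -> str:
    mnemonic, operands = instruction.strip().split(None, 1)
    dst, src = [op.strip().lower() for op in operands.split(",")]

    def fmt(op: str) -> str:
        # AT&T prefixes registers with % and immediates with $.
        return "%" + op if op in REGISTERS else "$" + op

    # AT&T puts the source first and adds an operand-size suffix (l = 32-bit).
    return f"{mnemonic.lower()}l {fmt(src)}, {fmt(dst)}"


print(intel_to_att("MOV EAX, 5"))    # movl $5, %eax
print(intel_to_att("ADD EAX, EBX"))  # addl %ebx, %eax
```

Reading both forms side by side makes it clear that the instruction is the same; only the notation differs.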
Register subdivision
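Four of the 32-bit general-purpose registers (EAX, EBX, ECX, EDX) can be addressed at 16-bit and 8-bit granularity. ESI, EDI, EBP, and ESP expose only a 16-bit lower half (SI, DI, BP, SP) in 32-bit mode. Taking EAX as the example: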
- EAX is the full 32 bits.
- AX is the lower 16 bits of EAX.
- AH is the high byte of AX (bits 8–15 of EAX).
- AL is the low byte of AX (bits 0–7 of EAX).
The same pattern applies to EBX/BX/BH/BL, ECX/CX/CH/CL, EDX/DX/DH/DL. Why this matters: malware sometimes operates at 8-bit granularity — XORing only AL, for instance — to complicate pattern matching by signature engines that look for full-register operations.
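The subdivision is just masking and shifting, which a short Python sketch can model. The helper names `split_eax` and `xor_al` are illustrative, not from any tool; the second function models an 8-bit XOR on AL, which leaves the upper 24 bits of EAX untouched.

```python
# Minimal sketch: model EAX/AX/AH/AL as views onto one 32-bit value.
def split_eax(eax: int) -> dict:
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF            # lower 16 bits
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,  # bits 8-15
        "AL": ax & 0xFF,         # bits 0-7
    }


def xor_al(eax: int, key: int) -> int:
    # Models XOR AL, key: only the low byte changes.
    al = (eax ^ key) & 0xFF
    return (eax & 0xFFFFFF00) | al


parts = split_eax(0x12345678)
print({name: hex(value) for name, value in parts.items()})
# {'EAX': '0x12345678', 'AX': '0x5678', 'AH': '0x56', 'AL': '0x78'}
print(hex(xor_al(0x12345678, 0x5A)))  # 0x12345622
```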
The EFLAGS register in practice
EFLAGS is 32 bits wide. Most bits are reserved or rarely useful. Six are essential:
| Flag | Bit | Set when |
|---|---|---|
| CF (Carry) | 0 | Unsigned arithmetic carries out of (or borrows into) the most-significant bit |
| PF (Parity) | 2 | Low byte of the result has an even number of 1 bits |
| AF (Aux) | 4 | Carry from bit 3 to bit 4 (BCD arithmetic) |
| ZF (Zero) | 6 | Result is zero |
| SF (Sign) | 7 | Result is negative (most-significant bit set; bit 31 for a 32-bit operand) |
| OF (Overflow) | 11 | Signed overflow occurred |
Conditional jumps read these flags and decide whether to branch. JZ jumps when ZF is 1. JNZ jumps when ZF is 0. JG jumps when ZF=0 and SF=OF (signed greater than). When you see CMP EAX, EBX ; JG label, you are reading: if EAX is greater than EBX as signed integers, jump to label.
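To check flag predictions by hand, it can help to model the arithmetic in a few lines of Python. This is a sketch covering only 32-bit ADD and CMP and only CF, ZF, SF, and OF; the function names are hypothetical.

```python
# Minimal sketch: compute CF, ZF, SF, OF for 32-bit ADD and CMP.
MASK32 = 0xFFFFFFFF


def add32(a: int, b: int):
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    flags = {
        "CF": int(full > MASK32),                      # unsigned carry out
        "ZF": int(result == 0),
        "SF": result >> 31,                            # copy of bit 31
        "OF": ((a ^ result) & (b ^ result)) >> 31 & 1, # same-sign inputs, different-sign result
    }
    return result, flags


def cmp32(a: int, b: int) -> dict:
    # CMP computes a - b, discards the result, and keeps the flags.
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "CF": int(a < b),                              # unsigned borrow
        "ZF": int(result == 0),
        "SF": result >> 31,
        "OF": ((a ^ b) & (a ^ result)) >> 31 & 1,
    }


def jg_taken(flags: dict) -> bool:
    # JG: ZF = 0 and SF = OF (signed greater than).
    return flags["ZF"] == 0 and flags["SF"] == flags["OF"]


print(add32(0xFFFFFFFF, 1))              # (0, {'CF': 1, 'ZF': 1, 'SF': 0, 'OF': 0})
print(jg_taken(cmp32(5, 3)))             # True
print(jg_taken(cmp32(0xFFFFFFFF, 1)))    # False: 0xFFFFFFFF is -1 when signed
```

The last line shows why the signed/unsigned distinction matters: as an unsigned value 0xFFFFFFFF is larger than 1, but JG treats it as -1 and does not branch.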
Operand types
x86 instructions accept three kinds of operands:
- Immediate: a constant baked into the instruction. In MOV EAX, 5, the 5 is immediate.
- Register: a register name, as in MOV EAX, EBX.
- Memory: an address, often built from a base register and a displacement. MOV EAX, [EBX+8] reads four bytes starting at the address EBX+8.
Memory operands have a richer syntax: [base + index*scale + displacement]. This is the form a compiler emits for array access: array[i] with a 4-byte element type compiles to [EBX + ECX*4] if EBX holds the array base and ECX holds i.
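Computing a memory operand's address by hand follows the formula directly. The Python sketch below assumes 32-bit addressing, and the register values in the example are made up for illustration.

```python
# Minimal sketch: evaluate [base + index*scale + displacement] in 32-bit mode.
def effective_address(base: int = 0, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only encodes scale factors 1, 2, 4, and 8")
    return (base + index * scale + disp) & 0xFFFFFFFF  # addresses wrap at 32 bits


ebx = 0x00403000  # hypothetical array base
ecx = 3           # hypothetical index i

print(hex(effective_address(base=ebx, index=ecx, scale=4)))  # 0x40300c, i.e. &array[3]
print(hex(effective_address(base=ebx, disp=8)))              # 0x403008, i.e. [EBX+8]
```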
Common instructions you must read fluently
| Mnemonic | Effect |
|---|---|
| MOV dst, src | Copy src into dst. |
| LEA dst, [addr] | Load Effective Address. Computes the address but does not dereference it. |
| ADD / SUB / IMUL / IDIV | Arithmetic. IMUL and IDIV are the signed multiply and divide. |
| AND / OR / XOR / NOT | Bitwise. XOR EAX, EAX is the canonical zero idiom. |
| CMP a, b | Computes a - b, discards the result, and sets the flags as SUB would. |
| TEST a, b | Computes the bitwise AND of a and b, discards the result, sets ZF, SF, and PF, and clears CF and OF. |
| JZ / JNZ / JE / JNE | Conditional jumps on ZF. JE is another name for JZ, and JNE for JNZ. |
| CALL addr | Push return address onto stack, then jump. |
| RET | Pop return address from stack, jump to it. |
LEA deserves special mention — it is often used for arithmetic, not just address loading: LEA EAX, [EBX + EBX*2] computes EBX * 3 in one instruction.
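The difference between LEA and a MOV with the same memory operand can be sketched in Python. Memory is modelled here as a dictionary from address to 32-bit value, which is a simplification, and the addresses and contents are invented for the example.

```python
# Minimal sketch: LEA yields the computed address; MOV reads what is stored there.
def address(base: int, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    return (base + index * scale + disp) & 0xFFFFFFFF


def lea(base: int, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    # No memory access at all: the result is pure arithmetic.
    return address(base, index, scale, disp)


def mov_load(memory: dict, base: int, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    # Same address computation, followed by a dereference.
    return memory[address(base, index, scale, disp)]


memory = {12: 0xDEADBEEF}  # hypothetical dword stored at address 12
ebx = 4

print(lea(ebx, ebx, 2))                    # 12, i.e. EBX * 3
print(hex(mov_load(memory, ebx, ebx, 2)))  # 0xdeadbeef, the dword at address 12
```

When LEA's "address" is never used as a pointer afterwards, the compiler is almost certainly using it as a cheap multiply-and-add.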
Endianness
x86 is little-endian. The least-significant byte is stored at the lowest address. The 32-bit value 0x12345678 in memory at address 1000 looks like:
| Address | Byte |
|---|---|
| 1000 | 78 |
| 1001 | 56 |
| 1002 | 34 |
| 1003 | 12 |
When you read a hex dump, you must mentally reverse multi-byte values to see what the CPU sees. ASCII strings are not reversed, because each character is a single byte and byte order only applies within a multi-byte value. Pointers, integers, and multi-byte structure fields are.
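Python's built-in integer and bytes conversions are a quick way to check a hex-dump reading. The sketch below reuses the 0x12345678 value from the table; the helper name `read_u32_le` and the sample dump are illustrative.

```python
# Minimal sketch: little-endian storage of a 32-bit value next to an ASCII string.
def read_u32_le(dump: bytes, offset: int) -> int:
    # Interpret four consecutive bytes as a little-endian unsigned integer.
    return int.from_bytes(dump[offset:offset + 4], byteorder="little")


value = 0x12345678
stored = value.to_bytes(4, byteorder="little")
print(stored.hex())  # 78563412: least-significant byte first

# A hypothetical dump: the integer above followed by the string "ABCD".
dump = bytes.fromhex("78 56 34 12 41 42 43 44")
print(hex(read_u32_le(dump, 0)))   # 0x12345678
print(dump[4:8].decode("ascii"))   # ABCD: the string reads in dump order
```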
The function prologue and epilogue
This is the pattern you will see hundreds of times per analysis session. Memorise it.
Prologue:
push ebp ; save caller's frame pointer
mov ebp, esp ; establish our own frame
sub esp, N ; reserve N bytes for local variables
Epilogue:
mov esp, ebp ; collapse local variable space
pop ebp ; restore caller's frame pointer
ret ; return
Or the equivalent shorter form using LEAVE:
leave ; equivalent to `mov esp, ebp ; pop ebp`
ret
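After the prologue, local variables live at negative offsets from EBP and arguments at positive offsets. In the common 32-bit stack-based conventions, [EBP] holds the saved caller EBP, [EBP+4] the return address, and [EBP+8] the first argument. With this pattern internalised, you can read disassembled C code almost as fluently as the source.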
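The frame layout can be traced step by step with a small Python model of the stack. This is a sketch under stated assumptions: a 32-bit cdecl-style call with arguments pushed right to left and at least one argument, a dictionary standing in for memory, and made-up values for ESP, EBP, and the return address.

```python
# Minimal sketch: trace CALL, prologue, and epilogue on a modelled 32-bit stack.
MASK32 = 0xFFFFFFFF


def simulate_call(args, locals_size, return_address=0x00401005,
                  caller_ebp=0x0012FF40, esp=0x0012FF00):
    memory = {}

    def push(value):
        nonlocal esp
        esp -= 4                      # the stack grows toward lower addresses
        memory[esp] = value & MASK32

    def pop():
        nonlocal esp
        value = memory[esp]
        esp += 4
        return value

    # Caller side: arguments right to left, then CALL pushes the return address.
    for arg in reversed(args):
        push(arg)
    push(return_address)

    # Prologue.
    push(caller_ebp)                  # push ebp
    ebp = esp                         # mov ebp, esp
    esp -= locals_size                # sub esp, N

    frame = {
        "saved_ebp": memory[ebp],             # [EBP]
        "return_address": memory[ebp + 4],    # [EBP+4]
        "first_arg": memory[ebp + 8],         # [EBP+8]
        "first_local_slot": ebp - 4,          # address of [EBP-4]
        "esp_in_body": esp,
    }

    # Epilogue.
    esp = ebp                         # mov esp, ebp
    frame["restored_ebp"] = pop()     # pop ebp
    frame["returned_to"] = pop()      # ret
    frame["esp_after_ret"] = esp      # arguments remain for the caller to clean up
    return frame


frame = simulate_call([0x11, 0x22], locals_size=8)
for name, value in frame.items():
    print(f"{name:18} {value:#010x}")
```

After RET, ESP sits just above the return address slot, so the two arguments are still on the stack. Under cdecl the caller removes them; other conventions differ on that point.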
What you should be comfortable with after this lesson
- Reading any x86 instruction with destination-first syntax
- Predicting which EFLAGS bits a given instruction sets
- Recognising the three operand types and computing memory addresses by hand
- Reading multi-byte values from a hex dump correctly, accounting for endianness
- Identifying function boundaries via prologue and epilogue patterns
References
- Searchable, hyperlinked x86 instruction set reference. Faster than the Intel PDF for daily use.
- Free university-level course on x86 assembly with slides and exercises.
- Paste C code, see the assembly. Excellent for connecting source to disassembly.
Exercises
Predict EFLAGS for arithmetic
For the sequence MOV EAX, 0x7FFFFFFF ; ADD EAX, 1 — predict the values of CF, ZF, SF, and OF. Verify in x64dbg by stepping through.
Read a hex dump
Given the bytes 67 45 23 01 at address 0x1000, what 32-bit integer is stored there? Verify your answer by writing a tiny C program and dumping it.
Identify all functions in a small binary
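Take any small Windows EXE under 50 KB. Locate every function prologue. Compare your count with Ghidra's function listing and explain any difference: functions compiled without a frame pointer have no push ebp / mov ebp, esp prologue, so Ghidra may list functions that a prologue search misses.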
