EpochZero Learn
Multi-Domain Tech Learning Hub
Ep. 1.5 · foundations

The Foundations of Malware Analysis & x86 Theory

8 May 2026 · 4 views

Builds on Episode 1 with rigorous coverage of x86 instruction syntax, register subdivision, EFLAGS, operand types, common instructions, endianness, and the function prologue/epilogue pattern.

Why this lesson exists

Episode 1 introduced the working vocabulary. This episode is the academic deep dive — the version a student preparing for a graduate-level reverse engineering exam should have. We slow down on every concept, prove it with examples, and address the misconceptions that creep into self-taught analysts who skip foundations.

Assembly syntax and the destination-source convention

Assembly language is a human-readable representation of machine code. Each assembly instruction corresponds to a short sequence of machine-code bytes (between 1 and 15 on x86) that the CPU executes directly. A disassembler converts raw bytes back into assembly; the analyst reads the assembly to reconstruct the program's logic.

The x86 Intel syntax (used by IDA Pro and x64dbg) places the destination first:

MNEMONIC    DESTINATION,    SOURCE

For example, MOV EAX, 5 copies the value 5 into EAX. The destination always receives; the source always provides. AT&T syntax (the default in GCC and GDB) reverses this order and also prefixes registers with % and immediates with $, which is a common source of confusion when switching tools.
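To make the operand-order difference concrete, here is a minimal sketch that rewrites a two-operand Intel-syntax instruction in AT&T form. The function name intel_to_att is hypothetical, and the sketch assumes 32-bit register or numeric immediate operands only (no memory operands), so it always appends the AT&T `l` size suffix.

```python
# Hypothetical helper for illustration; handles only 32-bit registers
# and numeric immediates, not memory operands.
REGS32 = {"EAX", "EBX", "ECX", "EDX", "ESI", "EDI", "EBP", "ESP"}


def intel_to_att(line: str) -> str:
    """Rewrite 'MNEMONIC DST, SRC' (Intel) as 'mnemonicl SRC, DST' (AT&T)."""
    mnemonic, operands = line.split(None, 1)
    dst, src = [op.strip() for op in operands.split(",")]

    def convert(op: str) -> str:
        if op.upper() in REGS32:
            return "%" + op.lower()      # registers get a % prefix
        int(op, 0)                       # raises ValueError unless numeric
        return "$" + op                  # immediates get a $ prefix

    # AT&T puts the source first and the destination last.
    return f"{mnemonic.lower()}l {convert(src)}, {convert(dst)}"


print(intel_to_att("MOV EAX, 5"))    # movl $5, %eax
print(intel_to_att("ADD EBX, ECX"))  # addl %ecx, %ebx
```

Reading both forms side by side a few times is usually enough to stop misreading GDB output after a session in IDA Pro or x64dbg.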

Register subdivision

The four classic 32-bit general-purpose registers (EAX, EBX, ECX, EDX) can each be addressed at 16-bit and 8-bit granularity. Using EAX as the example:

  • EAX is the full 32 bits.
  • AX is the lower 16 bits of EAX.
  • AH is the high byte of AX (bits 8–15 of EAX).
  • AL is the low byte of AX (bits 0–7 of EAX).

The same pattern applies to EBX/BX/BH/BL, ECX/CX/CH/CL, and EDX/DX/DH/DL. ESI, EDI, EBP, and ESP have 16-bit forms (SI, DI, BP, SP) but no byte-sized forms in 32-bit mode. Why this matters: malware sometimes operates at 8-bit granularity (XORing only AL, for instance) to complicate pattern matching by signature engines that look for full-register operations.
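The subdivision can be checked with plain bit masks. The sketch below is illustrative only: subregisters and xor_al are hypothetical helper names, and the register is modelled as an ordinary Python integer.

```python
def subregisters(eax: int) -> dict:
    """Split a 32-bit EAX value into its AX, AH and AL views."""
    eax &= 0xFFFFFFFF                # keep only 32 bits
    ax = eax & 0xFFFF                # lower 16 bits of EAX
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,      # bits 8-15 of EAX
        "AL": ax & 0xFF,             # bits 0-7 of EAX
    }


def xor_al(eax: int, key: int) -> int:
    """Model XOR AL, key: only bits 0-7 of EAX change."""
    return (eax & 0xFFFFFF00) | ((eax ^ key) & 0xFF)


views = subregisters(0x12345678)
print({name: hex(value) for name, value in views.items()})
# {'EAX': '0x12345678', 'AX': '0x5678', 'AH': '0x56', 'AL': '0x78'}
print(hex(xor_al(0x12345678, 0xFF)))   # 0x12345687
```

Note that the upper 24 bits survive the 8-bit XOR untouched, which is exactly why a signature written against the full-register form would miss it.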

The EFLAGS register in practice

EFLAGS is 32 bits wide. Most bits are reserved or rarely useful. Six are essential:

Flag            Bit   Set when
CF (Carry)      0     Unsigned arithmetic carries out of (or borrows into) the most-significant bit
PF (Parity)     2     Low byte of the result has an even number of 1 bits
AF (Aux)        4     Carry from bit 3 to bit 4 (BCD arithmetic)
ZF (Zero)       6     Result is zero
SF (Sign)       7     Result is negative as a signed value (most-significant bit set; bit 31 for 32-bit operands)
OF (Overflow)   11    Signed overflow occurred

Conditional jumps read these flags and decide whether to branch. JZ jumps when ZF is 1. JNZ jumps when ZF is 0. JG jumps when ZF=0 and SF=OF (signed greater than). When you see CMP EAX, EBX ; JG label, you are reading: if EAX is greater than EBX as signed integers, jump to label.
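The flag logic behind CMP followed by a conditional jump can be modelled in a few lines. This is a sketch under stated assumptions: 32-bit operands only, PF and AF omitted, and cmp_flags, jz_taken and jg_taken are hypothetical names rather than any tool's API.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def cmp_flags(a: int, b: int) -> dict:
    """Model CMP a, b on 32-bit operands: compute a - b, keep only the flags."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "CF": int(a < b),                    # unsigned borrow
        "ZF": int(result == 0),
        "SF": int(bool(result & SIGN32)),    # copy of bit 31 of the result
        # Signed overflow on subtraction: the operands differ in sign and
        # the result's sign differs from the first operand's sign.
        "OF": int(bool((a ^ b) & (a ^ result) & SIGN32)),
    }


def jz_taken(flags: dict) -> bool:
    return flags["ZF"] == 1


def jg_taken(flags: dict) -> bool:
    return flags["ZF"] == 0 and flags["SF"] == flags["OF"]


print(jg_taken(cmp_flags(5, 3)))             # True:  5 > 3
print(jg_taken(cmp_flags(0xFFFFFFFF, 1)))    # False: -1 > 1 is false when signed
print(jz_taken(cmp_flags(7, 7)))             # True:  operands are equal
```

The second call is the instructive one: 0xFFFFFFFF is larger than 1 as an unsigned value, but JG interprets the operands as signed, so the jump is not taken.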

Operand types

x86 instructions accept three kinds of operands:

  • Immediate — a constant baked into the instruction. MOV EAX, 5 — the 5 is immediate.
  • Register — a register name. MOV EAX, EBX.
  • Memory — an address, often built from a base register and a displacement. MOV EAX, [EBX+8] reads four bytes starting at the address EBX+8.

Memory operands have a richer syntax: [base + index*scale + displacement], where scale is 1, 2, 4, or 8. This is the form a compiler emits for array access: array[i] with a 4-byte element type compiles to [EBX + ECX*4] if EBX holds the array base and ECX holds i.
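Computing a memory operand's address by hand is just the formula above with 32-bit wraparound. A minimal sketch follows; effective_address is a hypothetical helper, and the array base 0x00403000 is an assumed value chosen for illustration.

```python
def effective_address(base: int = 0, index: int = 0,
                      scale: int = 1, disp: int = 0) -> int:
    """Evaluate [base + index*scale + disp] as a 32-bit address."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only encodes scale factors 1, 2, 4 and 8")
    return (base + index * scale + disp) & 0xFFFFFFFF


ebx = 0x00403000                      # assumed array base held in EBX
for ecx in range(3):                  # ECX plays the role of i
    print(ecx, hex(effective_address(base=ebx, index=ecx, scale=4)))
# 0 0x403000
# 1 0x403004
# 2 0x403008

print(hex(effective_address(base=ebx, disp=8)))   # [EBX+8] -> 0x403008
```

Each step of i advances the address by the element size, which is how a scale of 4 in a disassembly hints at an array of 4-byte elements.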

Common instructions you must read fluently

Mnemonic                  Effect
MOV dst, src              Copy src into dst.
LEA dst, [addr]           Load Effective Address. Computes the address but does not dereference it.
ADD / SUB / IMUL / IDIV   Arithmetic.
AND / OR / XOR / NOT      Bitwise. XOR EAX, EAX is the canonical zero idiom.
CMP a, b                  Computes a-b, discards the result, and sets the flags.
TEST a, b                 Computes a AND b, discards the result, sets ZF, SF, and PF, and clears CF and OF.
JZ / JNZ / JE / JNE       Conditional jumps based on EFLAGS (JE and JNE are aliases for JZ and JNZ).
CALL addr                 Push return address onto stack, then jump.
RET                       Pop return address from stack, jump to it.

LEA deserves special mention — it is often used for arithmetic, not just address loading: LEA EAX, [EBX + EBX*2] computes EBX * 3 in one instruction.
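The difference between LEA and a MOV that loads from memory can be shown with a toy model. Everything here is an illustrative assumption: memory is a Python dict, the address 0x2008 and its contents are made up, and lea and mov_load are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def address_of(base=0, index=0, scale=1, disp=0):
    """Evaluate base + index*scale + disp with 32-bit wraparound."""
    return (base + index * scale + disp) & MASK32


def lea(base=0, index=0, scale=1, disp=0):
    """LEA: return the computed address itself; memory is never read."""
    return address_of(base, index, scale, disp)


def mov_load(memory, base=0, index=0, scale=1, disp=0):
    """MOV reg, [mem]: return the dword stored at the computed address."""
    return memory[address_of(base, index, scale, disp)]


memory = {0x2008: 0xCAFEBABE}         # toy memory: one dword at 0x2008
ebx = 0x2000

print(hex(lea(base=ebx, disp=8)))                 # 0x2008 (the address)
print(hex(mov_load(memory, base=ebx, disp=8)))    # 0xcafebabe (the contents)

ebx = 7
print(lea(base=ebx, index=ebx, scale=2))          # 21, i.e. EBX * 3
```

Because LEA never touches memory, the last call works even though 21 is not a valid key in the toy memory, which mirrors why compilers can use it as a general-purpose arithmetic instruction.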

Endianness

x86 is little-endian. The least-significant byte is stored at the lowest address. The 32-bit value 0x12345678 in memory at address 1000 looks like:

Address   Byte
1000      78
1001      56
1002      34
1003      12

When you read a hex dump, you must mentally reverse multi-byte values to see what the CPU sees. Strings in memory are not reversed (each character is one byte) but pointers, integers, and structure fields are.
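The byte layout can be reproduced with Python's standard struct module. This sketch reuses the value 0x12345678 and address 1000 from the table; dump_le32 is a hypothetical helper name.

```python
import struct


def dump_le32(value: int, address: int) -> list:
    """Return (address, byte) pairs for a 32-bit value stored little-endian."""
    raw = struct.pack("<I", value)    # "<I" = little-endian unsigned 32-bit
    return [(address + offset, byte) for offset, byte in enumerate(raw)]


for addr, byte in dump_le32(0x12345678, 1000):
    print(addr, f"{byte:02X}")
# 1000 78
# 1001 56
# 1002 34
# 1003 12

raw = bytes([0x78, 0x56, 0x34, 0x12])          # bytes as they appear in a dump
print(hex(int.from_bytes(raw, "little")))      # 0x12345678 (what the CPU sees)
print(hex(int.from_bytes(raw, "big")))         # 0x78563412 (the naive misreading)

print("ABCD".encode("ascii").hex())            # 41424344 (strings keep their order)
```

The two int.from_bytes calls show the practical risk: reading the dump left to right as one number gives a plausible but wrong value.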

The function prologue and epilogue

This is the pattern you will see hundreds of times per analysis session. Memorise it.

Prologue:

push  ebp           ; save caller's frame pointer
mov   ebp, esp      ; establish our own frame
sub   esp, N        ; reserve N bytes for local variables

Epilogue:

mov   esp, ebp      ; collapse local variable space
pop   ebp           ; restore caller's frame pointer
ret                 ; return

Or the equivalent shorter form using LEAVE:

leave               ; equivalent to `mov esp, ebp ; pop ebp`
ret

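After the prologue, local variables live at negative offsets from EBP (for example [EBP-4]) and stack-passed arguments at positive offsets starting at [EBP+8], because [EBP] holds the caller's saved EBP and [EBP+4] holds the return address. With this pattern internalised, you can read disassembled C code almost as fluently as the source.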
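The frame layout can be traced step by step with a toy stack. This is a sketch under stated assumptions: a 32-bit cdecl-style call in which the caller pushes arguments right to left, made-up starting values for ESP and EBP, a made-up return address, and a hypothetical Stack32 class that stores one dword per address.

```python
class Stack32:
    """Toy model of a 32-bit stack: memory is a dict of address -> dword."""

    def __init__(self, esp=0x0012FF00, ebp=0x0012FF40):
        self.mem = {}
        self.esp = esp
        self.ebp = ebp

    def push(self, value):
        self.esp -= 4                 # the stack grows toward lower addresses
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value


s = Stack32()
caller_esp, caller_ebp = s.esp, s.ebp

# Caller side: push arguments right to left; CALL pushes the return address.
s.push(22)                # second argument
s.push(11)                # first argument
s.push(0x00401005)        # return address (assumed value)

# Prologue
s.push(s.ebp)             # push ebp
s.ebp = s.esp             # mov  ebp, esp
s.esp -= 8                # sub  esp, 8   (room for two 4-byte locals)
s.mem[s.ebp - 4] = 0xAAAA # write the first local at [EBP-4]

first_arg = s.mem[s.ebp + 8]       # 11
second_arg = s.mem[s.ebp + 12]     # 22
saved_ret = s.mem[s.ebp + 4]       # 0x00401005
first_local = s.mem[s.ebp - 4]     # 0xAAAA
print(first_arg, second_arg, hex(saved_ret), hex(first_local))

# Epilogue
s.esp = s.ebp             # mov esp, ebp
s.ebp = s.pop()           # pop ebp
return_address = s.pop()  # ret

print(s.ebp == caller_ebp, hex(return_address))   # True 0x401005
```

After RET, ESP still sits 8 bytes below its starting value because the two arguments remain on the stack; in the cdecl-style convention assumed here, the caller removes them.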

What you should be comfortable with after this lesson

  • Reading any x86 instruction with destination-first syntax
  • Predicting which EFLAGS bits a given instruction sets
  • Recognising the three operand types and computing memory addresses by hand
  • Reading multi-byte values from a hex dump correctly accounting for endianness
  • Identifying function boundaries via prologue and epilogue patterns
Section 03

References

Section 04

Exercises

EX.01 · medium

Predict EFLAGS for arithmetic

For the sequence MOV EAX, 0x7FFFFFFF ; ADD EAX, 1 — predict the values of CF, ZF, SF, and OF. Verify in x64dbg by stepping through.

EX.02 · easy

Read a hex dump

Given the bytes 67 45 23 01 at address 0x1000, what 32-bit integer is stored there? Verify your answer by writing a tiny C program and dumping it.

EX.03 · medium

Identify all functions in a small binary

Take any small Windows EXE under 50 KB. Locate every function prologue. Verify your count matches what Ghidra finds in its function listing.