Introduction to Malware Analysis: Triage & x86 Architecture
Why malware analysis is a discipline, the three-question triage every analyst runs, and a clean introduction to x86 registers, the stack, and the difference between static and dynamic analysis.
What this lesson covers
Malware analysis exists because every day, hundreds of thousands of new samples reach corporate networks, government agencies, and personal machines. Anti-virus signatures buy time, but they do not answer the question that matters most when an incident occurs: what is this file doing on my system, and what has it already done? Answering that requires reading the binary itself.
This lesson lays the groundwork for every video that follows. We start with what malware is and how analysts categorise it without getting lost in taxonomy. We then introduce the three-question triage that separates urgent threats from routine samples. The second half is purely technical: x86 architecture, the registers an analyst must know, the stack, and why static and dynamic analysis are complementary rather than alternative.
Defining malware honestly
The word malware is short for malicious software. The definition is deliberately broad: a twenty-line batch script that wipes a directory qualifies, and so does a state-sponsored firmware implant that survives operating-system reinstalls and hard disk replacements. What makes malware analysis worth studying is not the definition itself but the variety of what an analyst encounters in practice.
Every sample tells a story: who built it, what they wanted, how they tried to conceal it, and where they made mistakes. The analyst's job is to read that story from the binary.
A practical taxonomy, not a textbook one
Malware is classified by what it does and how it spreads. In practice, these categories overlap constantly.
- WannaCry (2017) was simultaneously ransomware and a worm.
- Emotet started life as a banking trojan, evolved into a spam botnet, and eventually operated as a dropper-for-hire platform.
- TrickBot functioned as a downloader, a credential stealer, and a lateral movement framework depending on which modules the operator loaded at any given time.
The value of taxonomy is triage speed. When an analyst sees vssadmin delete shadows /all in a Procmon log, the word ransomware triggers an immediate containment playbook. When the same analyst sees outbound SMTP carrying Base64-encoded attachments, the word keylogger narrows the investigation to credential theft. Classification is not about putting samples into neat boxes; it is about knowing which box tells you what to do next.
The three-question triage
Every sample, before any deep analysis begins, gets these three questions answered:
- What is this file? File type, hash, code-signing status, compile timestamp, suspected family.
- Is it suspicious? Public threat intel hits, structural anomalies (high entropy, truncated imports, non-standard sections), behavioural red flags during a brief sandbox detonation.
- What can we learn before running it? Strings, embedded resources, imported APIs, persistence indicators visible in the binary.
A good analyst can complete this triage in under thirty minutes for a typical sample. The goal is not to solve the case — it is to decide whether the sample warrants the next four hours of deep work.
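The first two questions can be partly automated. The following is a minimal Python sketch, not a complete triage tool: it computes the usual hashes, checks for the `MZ` magic that begins a PE file, and measures Shannon entropy in bits per byte. The function names are illustrative, and the 7.0 entropy threshold is an assumed rule of thumb for "possibly packed or encrypted", not a standard.

```python
import hashlib
import math
from collections import Counter

# Assumed heuristic: values close to 8.0 bits per byte suggest compressed
# or encrypted content. Tune this for your own sample set.
HIGH_ENTROPY_THRESHOLD = 7.0


def file_hashes(data: bytes) -> dict:
    """Return the MD5, SHA-1, and SHA-256 hex digests of a byte string."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, at most 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_pe(data: bytes) -> bool:
    """Cheap file-type check: PE files begin with the DOS 'MZ' magic."""
    return data[:2] == b"MZ"


def quick_triage(data: bytes) -> dict:
    """Bundle the checks above into one summary dictionary."""
    entropy = shannon_entropy(data)
    return {
        "hashes": file_hashes(data),
        "looks_like_pe": looks_like_pe(data),
        "entropy": entropy,
        "high_entropy": entropy >= HIGH_ENTROPY_THRESHOLD,
    }
```

In use, read the sample as bytes inside the analysis VM (for example `quick_triage(open(path, "rb").read())`, where `path` is whatever file you are examining) and look the SHA-256 up in public threat intel before going further.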
Static vs dynamic analysis
These are the two foundational approaches, and they are not alternatives. They are complementary.
Static analysis examines the binary without running it. We compute its hash, parse the PE structure, list its imports, extract its strings, and disassemble its code. The advantage is safety: nothing executes. The limitation is that packed and obfuscated samples reveal little beyond the unpacking stub, because the real payload only appears once the sample unpacks itself at runtime.
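String extraction is one of the cheapest static steps. Here is a minimal sketch of it in Python, similar in spirit to the `strings` utility: it pulls out runs of printable ASCII and of UTF-16LE text, which Windows binaries commonly use. The function name and the default minimum length of 4 are illustrative choices.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """Return printable ASCII runs and UTF-16LE runs of at least min_len characters."""
    # Printable ASCII is the range 0x20 (space) through 0x7e (~).
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # UTF-16LE encodes each ASCII character as that byte followed by 0x00.
    wide_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)

    found = [m.group().decode("ascii") for m in ascii_pat.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_pat.finditer(data)]
    return found
```

On a packed sample the output is usually short and dominated by noise, which is itself a useful triage signal.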
Dynamic analysis runs the sample inside an isolated VM and observes what happens. We monitor file-system changes with Procmon, capture network traffic with Wireshark or FakeNet, watch process activity with Process Hacker, and dump memory at strategic points. The advantage is direct observation of behaviour. The limitation is that defended malware can detect the lab environment and refuse to execute.
A complete analysis uses both. Static analysis tells us where to look during dynamic analysis. Dynamic analysis tells us what the static code actually does at runtime when packers and obfuscation peel away.
x86 — only what an analyst needs
A complete x86 reference would be a textbook of its own. For malware analysis, we need a working knowledge of registers, the stack, and a handful of common instructions. Everything else can be learned in context.
General-purpose registers
Registers are storage locations inside the CPU. Accessing a register typically takes about one clock cycle; a read that goes all the way out to RAM can take hundreds. The register state at any point tells the analyst what the program is doing right now.
| Register | Convention | Why it matters in malware |
|---|---|---|
| EAX | Return values | After CALL, the return value sits here. IsDebuggerPresent returns 1 or 0 in EAX. |
| EBX | General storage | Callee-saved. Holds values that persist across function calls. |
| ECX | Loop counter | LOOP and REP auto-decrement ECX. XOR decryption loops store buffer length here. |
| EDX | I/O, overflow | Combined with EAX for 64-bit MUL/DIV results. |
| ESP | Stack top pointer | Push decrements, pop increments. Always points to the last item pushed. |
| EBP | Frame anchor | Local variables at [EBP-N], arguments at [EBP+N]. |
| ESI | Source index | Source for REP MOVSB memory copies. |
| EDI | Destination index | Destination for memory copies. |
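To make the ECX row concrete, here is a sketch in Python of a single-byte XOR decoding loop, written to mirror how such a loop looks at register level. The variable names `ecx` and `esi` only imitate the registers; this is an illustration of the pattern, not code taken from any particular sample.

```python
def xor_decode(buf: bytes, key: int) -> bytes:
    """Single-byte XOR decode, structured like a register-level loop."""
    out = bytearray(buf)
    ecx = len(out)  # loop counter: number of bytes still to process
    esi = 0         # index of the current byte in the buffer
    while ecx != 0:
        out[esi] ^= key & 0xFF  # xor byte ptr [buffer + esi], key
        esi += 1                # advance to the next byte
        ecx -= 1                # the decrement that LOOP performs implicitly
    return bytes(out)
```

Because XOR is its own inverse, calling the function twice with the same key returns the original bytes. In disassembly, a buffer length loaded into ECX just before a tight loop containing an XOR is a common sign of this pattern.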
Two special registers deserve their own line:
- EIP (Instruction Pointer): the address of the next instruction. It cannot be written directly with MOV. Apart from advancing to the next instruction, it changes only through control-transfer instructions such as JMP, CALL, RET, and conditional jumps. Controlling EIP is the goal of most exploitation techniques.
- EFLAGS: a 32-bit register where individual bits are flags set by arithmetic and comparison operations. Conditional jumps read these flags. Malware uses them for runtime decisions, including anti-debugging checks.
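The relationship between a comparison and the jump that follows it can be modelled in a few lines. The sketch below, in Python, computes four of the EFLAGS bits that `cmp a, b` would produce for 32-bit operands and shows which flags three common conditional jumps test. The function names are illustrative, and only ZF, CF, SF, and OF are modelled.

```python
MASK32 = 0xFFFFFFFF


def _sign(value: int) -> int:
    """Return bit 31, the sign bit of a 32-bit value."""
    return (value >> 31) & 1


def cmp_flags(a: int, b: int) -> dict:
    """Model the ZF, CF, SF, and OF results of `cmp a, b` (computes a - b)."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "ZF": int(result == 0),  # zero flag: operands were equal
        "CF": int(a < b),        # carry flag: unsigned borrow occurred
        "SF": _sign(result),     # sign flag: top bit of the result
        # Overflow: operand signs differ and the result sign differs from a's.
        "OF": int(_sign(a) != _sign(b) and _sign(result) != _sign(a)),
    }


def jz_taken(flags: dict) -> bool:
    """JZ / JE: jump if equal."""
    return flags["ZF"] == 1


def jb_taken(flags: dict) -> bool:
    """JB: jump if below (unsigned less-than)."""
    return flags["CF"] == 1


def jl_taken(flags: dict) -> bool:
    """JL: jump if less (signed less-than)."""
    return flags["SF"] != flags["OF"]
```

For example, `jz_taken(cmp_flags(5, 5))` is true, while `cmp_flags(0x80000000, 1)` takes JL but not JB, because the first operand is negative when read as signed and large when read as unsigned.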
The stack and the function frame
The stack grows downward in memory. A typical function does this on entry:
push ebp ; save caller's frame pointer
mov ebp, esp ; establish new frame
sub esp, 0x20 ; reserve 32 bytes of local space
After this prologue, local variables are accessed at negative offsets, [EBP-N], and arguments at positive offsets, [EBP+N]. In 32-bit cdecl, [EBP] holds the saved frame pointer, [EBP+4] the return address, [EBP+8] the first argument, [EBP+12] the second, and so on. Recognising prologues and epilogues is a fundamental skill, because they delineate function boundaries in stripped binaries.
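Because the prologue compiles to a fixed byte sequence, candidate function starts can be located with a plain byte search. The following Python sketch looks for the two usual encodings of `push ebp ; mov ebp, esp`. It is a rough heuristic under stated assumptions: raw pattern matching will also hit data bytes that happen to match, and it misses functions compiled without a frame pointer. A disassembler does this far more reliably.

```python
# Both sequences decode to: push ebp ; mov ebp, esp
PROLOGUES = (
    bytes.fromhex("558bec"),  # encoding commonly emitted by MSVC
    bytes.fromhex("5589e5"),  # encoding commonly emitted by GCC
)


def find_prologues(code: bytes) -> list:
    """Return sorted offsets where a standard frame-setup prologue begins."""
    hits = []
    for pattern in PROLOGUES:
        start = code.find(pattern)
        while start != -1:
            hits.append(start)
            start = code.find(pattern, start + 1)
    return sorted(hits)
```

Run against the raw bytes of a code section, the returned offsets are candidates to cross-check in Ghidra, not confirmed function boundaries.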
What you should be comfortable with after this lesson
- Defining malware, and recognising why categories blur in real samples
- Running the three-question triage on a sample you have never seen before
- Naming each general-purpose x86 register and stating one common use
- Identifying a function prologue in disassembled code
The next lesson takes the same foundations deeper, with worked examples from real samples.
References
- [tool] Submit a hash or file to see results from 70+ AV engines, plus community comments and behavioural reports.
- [tool] Free, hash-indexed sample repository for triage. Excellent for getting context on a known sample.
- [reference] Authoritative x86/x64 reference. Volumes 2A-2D are the instruction reference.
- [wiki] Approachable supplement to the Intel manuals.
Exercises
Hash a known-good binary
Compute the MD5, SHA-1, and SHA-256 hashes of notepad.exe from a clean Windows VM. Submit the SHA-256 to VirusTotal. Note how many engines flag it (zero, in normal circumstances).
Run the three-question triage
Pick any sample from MalwareBazaar (filter by tag — try njRAT). Without running it, answer: what is this file, is it suspicious, and what can you learn before running it?
Identify a function prologue
Open any 32-bit Windows executable in Ghidra. Locate the entry point and identify the function prologue (push ebp ; mov ebp, esp). Then find any function it calls and identify its prologue.
