Roadmap

Malware Analyst / Reverse Engineer

The specialist who dissects malicious software to understand how it works, what it does, and how to detect and defend against it. Uses static analysis (reading disassembled code), dynamic analysis (running malware in controlled environments), and reverse engineering (reconstructing intent from binary instructions).

OPTIMISTIC 3–4 years / REALISTIC 4–5 years

Stage 00

Computer Architecture & Assembly Language

Malware analysis requires reading compiled machine code as if it were source code. This is impossible without understanding CPU architecture, registers, memory, and assembly instructions.

CPU Architecture Fundamentals

Von Neumann architecture — fetch → decode → execute cycle
CPU components — ALU (Arithmetic Logic Unit), registers, control unit, cache hierarchy (L1/L2/L3)
Memory hierarchy — registers (fastest) → cache → RAM → storage (slowest)
Instruction pipelining — overlapping the fetch, decode, and execute stages of consecutive instructions; hazards (data, control, structural) stall the pipeline
Endianness — little-endian (x86/x64, ARM mostly) vs big-endian (network byte order, some MIPS) - Little-endian: least significant byte at lowest address - Example: 0x12345678 stored as 78 56 34 12 in memory - Network traffic uses big-endian (network byte order)
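
The byte-order example above can be checked with Python's standard `struct` module. This is a minimal sketch; the variable names are illustrative only.

```python
import struct

value = 0x12345678

# "<I" packs an unsigned 32-bit integer little-endian (the x86/x64 layout).
little = struct.pack("<I", value)
# ">I" packs it big-endian, which is network byte order.
big = struct.pack(">I", value)

print(little.hex(" "))  # 78 56 34 12
print(big.hex(" "))     # 12 34 56 78

# The same four bytes decode to different integers depending on the assumed order.
as_little = int.from_bytes(little, "little")
as_big = int.from_bytes(little, "big")
print(hex(as_little), hex(as_big))  # 0x12345678 0x78563412
```

When a constant in a hex dump looks reversed compared with the disassembly, byte order is usually the reason.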

Number Systems for Assembly

Binary — base-2; bit operations: AND, OR, XOR, NOT, SHL, SHR - XOR a, a = 0 (zeroing a register without encoding a null byte) - AND for masking bits; OR for setting bits; NOT for bit inversion
Hexadecimal — base-16; two hex digits per byte; memory addresses in hex - 0x41 = 'A' in ASCII; 0x90 = NOP (no-operation) instruction
Two's complement — signed integer representation - Negative numbers: invert all bits, add 1 - -1 in 32-bit = 0xFFFFFFFF; -128 in 8-bit = 0x80
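
The two's complement values quoted above can be reproduced with a short Python sketch. The helper names are hypothetical, chosen only for this illustration.

```python
def to_twos_complement(value: int, bits: int) -> int:
    """Return the unsigned bit pattern that encodes a signed value."""
    return value & ((1 << bits) - 1)


def from_twos_complement(pattern: int, bits: int) -> int:
    """Interpret an unsigned bit pattern as a signed two's complement value."""
    sign_bit = 1 << (bits - 1)
    return (pattern ^ sign_bit) - sign_bit


print(hex(to_twos_complement(-1, 32)))   # 0xffffffff
print(hex(to_twos_complement(-128, 8)))  # 0x80
print(from_twos_complement(0xFF, 8))     # -1

# "Invert all bits, add 1" yields the same pattern as encoding the negative value.
x = 5
inverted_plus_one = ((~x) + 1) & 0xFF
print(hex(inverted_plus_one))            # 0xfb
```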

x86 and x64 Registers — Memorize These

General purpose (x86 32-bit): - EAX — Accumulator; return values from functions - EBX — Base register; often used for addressing - ECX — Counter; loop counter, function argument on some calling conventions - EDX — Data; I/O port addressing, high-order 32 bits of multiplication results - ESI — Source Index; source pointer in string operations - EDI — Destination Index; destination pointer in string operations - ESP — Stack Pointer; points to current top of stack - EBP — Base Pointer; points to current stack frame base
x64 extensions (64-bit): - RAX, RBX, RCX, RDX, RSI, RDI, RSP, RBP — 64-bit versions - R8–R15 — additional 64-bit registers - 32-bit sub-registers: EAX, EBX, ECX, EDX, etc. (zero-extend upper 32 bits when written) - 16-bit sub-registers: AX, BX, CX, DX, SI, DI, SP, BP - 8-bit sub-registers: AL/AH (low/high byte of AX), BL/BH, CL/CH, DL/DH
Instruction Pointer: - EIP (x86) / RIP (x64) — points to next instruction to execute; key exploit target
Flags register (EFLAGS/RFLAGS): - CF — Carry Flag; arithmetic carry/borrow - ZF — Zero Flag; result was zero; used by conditional jumps - SF — Sign Flag; result was negative - OF — Overflow Flag; signed arithmetic overflow - PF — Parity Flag; even number of 1 bits in result
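Segment registers — CS (code), DS (data), SS (stack), ES/FS/GS (extra segments) - FS segment — 32-bit Windows: points to the Thread Environment Block (TEB), whose first member is the Thread Information Block (TIB); critical for exploitation and anti-debugging - GS segment — 64-bit Windows: points to the TEB; 64-bit Linux addresses thread-local storage through FS, while 32-bit Linux uses GS for that purpose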
Control registers — CR0 (protected mode enable), CR3 (page directory base), CR4
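
The sub-register behavior listed above (a 32-bit write zero-extends, while 16-bit and 8-bit writes leave the other bits alone) is a frequent source of misread disassembly. The sketch below models it on a plain Python integer standing in for RAX; the function names are made up for this example.

```python
MASK64 = (1 << 64) - 1


def write_eax(rax: int, value: int) -> int:
    """A 32-bit write clears the upper 32 bits of the 64-bit register."""
    return value & 0xFFFFFFFF


def write_ax(rax: int, value: int) -> int:
    """A 16-bit write preserves bits 16..63."""
    return (rax & (MASK64 ^ 0xFFFF)) | (value & 0xFFFF)


def write_al(rax: int, value: int) -> int:
    """An 8-bit low write preserves bits 8..63."""
    return (rax & (MASK64 ^ 0xFF)) | (value & 0xFF)


def write_ah(rax: int, value: int) -> int:
    """An 8-bit high write replaces only bits 8..15."""
    return (rax & (MASK64 ^ 0xFF00)) | ((value & 0xFF) << 8)


rax = 0x1122334455667788
print(hex(write_eax(rax, 0xAAAA)))  # 0xaaaa
print(hex(write_ax(rax, 0xAAAA)))   # 0x112233445566aaaa
print(hex(write_al(rax, 0xAA)))     # 0x11223344556677aa
print(hex(write_ah(rax, 0xAA)))     # 0x112233445566aa88
```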

Memory Model and Stack

Memory segments in a process: - Text/Code segment — executable instructions; read-only; shared - Data segment — initialized global and static variables - BSS — uninitialized global and static variables; zeroed at startup - Heap — dynamically allocated memory; grows up; malloc/new - Stack — local variables, function call information; grows down; push/pop - Memory-mapped region — DLLs, mapped files
Stack frame mechanics, critical for understanding function calls: - CALL instruction — pushes return address (EIP/RIP value of next instruction) onto stack; jumps to function - Function prologue — PUSH EBP; MOV EBP, ESP; SUB ESP, [local var space] - Local variables — at negative offsets from EBP (EBP-4, EBP-8, etc.) - Parameters (32-bit cdecl) — at positive offsets from EBP (EBP+8, EBP+12, etc.) - Function epilogue — MOV ESP, EBP; POP EBP; RET - RET instruction — pops return address from stack; jumps to it
Calling conventions, defining how parameters are passed and who cleans up the stack: - cdecl (C default) — parameters pushed right-to-left; caller cleans stack - stdcall (WinAPI) — parameters pushed right-to-left; callee cleans stack - fastcall — first two params in ECX, EDX; rest on stack - x64 Windows — first four params in RCX, RDX, R8, R9; rest on stack; shadow space - x64 Linux (System V AMD64 ABI) — first six params in RDI, RSI, RDX, RCX, R8, R9
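
To make the EBP-relative offsets concrete, here is a toy model of a 32-bit cdecl call in Python. It is a sketch, not an emulator: the starting ESP/EBP values and the return address are arbitrary placeholders, and memory is just a dictionary of dwords.

```python
class Stack32:
    """Tiny model of a downward-growing 32-bit stack."""

    def __init__(self, esp=0x0012FF00, ebp=0x0012FF40):
        self.mem = {}
        self.esp = esp
        self.ebp = ebp

    def push(self, value):
        self.esp -= 4                      # PUSH decrements ESP first
        self.mem[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4                      # POP increments ESP afterwards
        return value


s = Stack32()
esp_before_call = s.esp
caller_ebp = s.ebp

# Caller: func(1, 2) under cdecl pushes arguments right-to-left.
s.push(2)
s.push(1)
RETURN_ADDRESS = 0x00401005                # placeholder address
s.push(RETURN_ADDRESS)                     # effect of CALL

# Callee prologue: PUSH EBP; MOV EBP, ESP; SUB ESP, 8
s.push(s.ebp)
s.ebp = s.esp
s.esp -= 8
s.mem[s.ebp - 4] = 0xDEADBEEF              # first local variable

first_arg = s.mem[s.ebp + 8]
second_arg = s.mem[s.ebp + 12]
saved_return = s.mem[s.ebp + 4]
print(first_arg, second_arg, hex(saved_return))  # 1 2 0x401005

# Callee epilogue: MOV ESP, EBP; POP EBP; RET
s.esp = s.ebp
s.ebp = s.pop()
returned_to = s.pop()
s.esp += 8                                 # cdecl: caller removes the two arguments
```

After the final adjustment ESP is back where it was before the call, which is the invariant a calling convention exists to guarantee.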

Core x86 Assembly Instructions

Data movement: - MOV dest, src — copy data (registers, memory, immediate) - PUSH val — decrements ESP, writes val to [ESP] - POP dest — reads [ESP] to dest, increments ESP - LEA dest, [addr] — load effective address; often used for address calculations - XCHG a, b — exchange values - MOVZX / MOVSX — zero-extend / sign-extend smaller to larger register
Arithmetic: - ADD dest, src — addition; sets flags - SUB dest, src — subtraction; sets flags - INC dest — increment by 1 - DEC dest — decrement by 1 - MUL / IMUL — unsigned / signed multiply; result in EDX:EAX (32-bit) - DIV / IDIV — unsigned / signed divide; quotient in EAX, remainder in EDX - NEG dest — negate (two's complement)
Bitwise: - AND, OR, XOR, NOT — bitwise operations - SHL / SHR — shift left / right logical - SAR — shift arithmetic right (preserves sign bit) - ROL / ROR — rotate left / right - TEST dest, src — AND without modifying dest; sets flags (used before JZ/JNZ) - CMP dest, src — SUB without modifying dest; sets flags (used before conditional jumps)
Control flow: - JMP label — unconditional jump - JE / JZ — jump if equal / zero flag set - JNE / JNZ — jump if not equal / zero flag clear - JG / JGE / JA / JAE — jump if greater/greater-equal (signed/unsigned) - JL / JLE / JB / JBE — jump if less/less-equal (signed/unsigned) - JC / JNC — jump if carry / no carry - CALL addr — push return address, jump to function - RET [n] — return from function; optional stack cleanup - LOOP — decrements ECX; jumps if ECX not zero
String operations: - MOVS — move string (MOVSB byte, MOVSW word, MOVSD dword) - CMPS — compare strings - SCAS — scan string for value in AL/AX/EAX - STOS — store value to string (fill memory) - REP prefix — repeat string operation ECX times
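
The difference between the signed (JG/JL) and unsigned (JA/JB) jumps comes down to which flags CMP sets. The following sketch computes the four relevant flags for `CMP dest, src` and evaluates two jump conditions; the function names are illustrative.

```python
def cmp_flags(dest: int, src: int, bits: int = 32) -> dict:
    """Flags produced by CMP dest, src (a SUB that discards its result)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    dest &= mask
    src &= mask
    result = (dest - src) & mask
    return {
        "ZF": result == 0,
        "CF": dest < src,                                    # unsigned borrow
        "SF": bool(result & sign),
        "OF": bool((dest ^ src) & (dest ^ result) & sign),   # signed overflow
    }


def ja(flags: dict) -> bool:
    """JA: unsigned above (CF = 0 and ZF = 0)."""
    return not flags["CF"] and not flags["ZF"]


def jg(flags: dict) -> bool:
    """JG: signed greater (ZF = 0 and SF = OF)."""
    return not flags["ZF"] and flags["SF"] == flags["OF"]


# 0xFFFFFFFF is 4294967295 unsigned but -1 signed.
flags = cmp_flags(0xFFFFFFFF, 1)
print(ja(flags), jg(flags))  # True False
```

Seeing JA rather than JG after a comparison is therefore a hint that the compiler treated the operands as unsigned.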

Resources

"x86 Assembly Language and C Fundamentals" by Joseph Cavanagh (book)
x86 instruction reference (felixcloutier.com/x86/, free)
"Computer Organization and Design" (Patterson & Hennessy, book)
nasm.us documentation (free)
godbolt.org (compile C code to assembly online, free)

Stage 01

Executable File Formats

Malware is packaged as PE files on Windows and ELF files on Linux. Understanding the container format is essential to understanding what the malware is and how the operating system will load it.

PE (Portable Executable) Format — Windows

Overview — container format for .exe, .dll, .sys, .drv, .ocx files
DOS Header (IMAGE_DOS_HEADER): - e_magic — "MZ" (bytes 4D 5A on disk; 0x5A4D when read as a little-endian WORD); Mark Zbikowski's initials - e_lfanew — offset to PE header
PE Header (IMAGE_NT_HEADERS): - Signature — "PE\0\0" (bytes 50 45 00 00 on disk; 0x00004550 when read as a little-endian DWORD) - COFF File Header (IMAGE_FILE_HEADER): Machine type, NumberOfSections, TimeDateStamp, Characteristics (DLL flag, executable flag) - Optional Header (IMAGE_OPTIONAL_HEADER): - Magic — 0x10B (PE32) or 0x20B (PE32+/64-bit) - AddressOfEntryPoint — RVA where execution begins - ImageBase — preferred load address (0x400000 for exe, 0x10000000 for DLL) - SizeOfImage — virtual size of loaded image - DataDirectory — array of 16 entries; key entries: - Import Table (1) — DLLs and functions this file imports - Export Table (0) — functions this file exports (DLLs) - Resources (2) — embedded resources (icons, strings, version info) - Base Relocations (5) — patching hardcoded addresses when not loaded at ImageBase - TLS (9) — Thread Local Storage; TLS callbacks run before entry point (anti-analysis technique) - .NET Header (14) — managed code metadata
Section Headers, describing each section: - .text — executable code; should have Execute, Read attributes - .data — initialized global variables; Read, Write - .rdata — read-only data (strings, constants, import/export tables) - .rsrc — resource section (icons, strings, embedded files) - .reloc — relocation table - Non-standard section names — packed malware often has unusual sections (.packed, UPX0, etc.)
Import Address Table (IAT), how Windows resolves API calls: - Import Directory Table — list of DLLs being imported - For each DLL: list of function names or ordinals - At load time: Windows fills IAT with actual addresses - Malware analysis: IAT reveals what APIs malware uses without running it
Export Table, listing functions this module exports: - Name, ordinal, RVA of function - DLLs use this to expose functions to other modules
Virtual Address vs Relative Virtual Address (RVA) vs Raw Offset: - ImageBase + RVA = Virtual Address (VA) in memory - Raw offset = position in file on disk - RVA to raw conversion: RVA - section VirtualAddress + section PointerToRawData
PE analysis tools: - PEview — free; visual PE structure viewer - PE-bear — free; comprehensive PE editor and analyzer - pestudio — free; malware triage; import analysis, strings, entropy, AV scanning - CFF Explorer — free; PE editing - dumpbin (MSVC) — command-line PE analysis
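
The header layout and the RVA-to-raw-offset formula above can be exercised with nothing but the standard library. The sketch below parses only the fields named in this section and runs against a synthetic one-section header built in the script itself, so no real sample is needed; for real work a maintained parser such as pefile is the better choice.

```python
import struct


def parse_pe_headers(data: bytes) -> dict:
    """Parse DOS header, PE signature, COFF header, and section table."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ magic")
    # e_lfanew is stored at offset 0x3C of IMAGE_DOS_HEADER.
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER (20 bytes) follows the 4-byte signature.
    (machine, num_sections, _timestamp, _symtab, _nsyms,
     opt_size, _characteristics) = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    opt_off = e_lfanew + 24
    opt_magic = struct.unpack_from("<H", data, opt_off)[0]
    entry_rva = struct.unpack_from("<I", data, opt_off + 16)[0]
    # The section table starts right after the optional header; 40 bytes per entry.
    sections = []
    table_off = opt_off + opt_size
    for i in range(num_sections):
        name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", data, table_off + 40 * i)
        sections.append({
            "name": name.rstrip(b"\x00").decode("ascii", "replace"),
            "virtual_size": vsize, "virtual_address": vaddr,
            "raw_size": raw_size, "raw_pointer": raw_ptr,
        })
    return {"machine": machine, "optional_magic": opt_magic,
            "entry_rva": entry_rva, "sections": sections}


def rva_to_offset(rva: int, sections: list):
    """RVA - section VirtualAddress + section PointerToRawData, or None."""
    for s in sections:
        # Only bytes within SizeOfRawData are actually backed by the file.
        if s["virtual_address"] <= rva < s["virtual_address"] + s["raw_size"]:
            return rva - s["virtual_address"] + s["raw_pointer"]
    return None


# Synthetic PE32+ header with one .text section, built only for this demo.
dos = bytearray(0x40)
dos[0:2] = b"MZ"
struct.pack_into("<I", dos, 0x3C, 0x40)
coff = struct.pack("<HHIIIHH", 0x8664, 1, 0, 0, 0, 0xF0, 0x22)
optional = bytearray(0xF0)
struct.pack_into("<H", optional, 0, 0x20B)    # Magic: PE32+
struct.pack_into("<I", optional, 16, 0x1000)  # AddressOfEntryPoint
text = struct.pack("<8sIIII", b".text", 0x200, 0x1000, 0x200, 0x400) + bytes(16)
sample = bytes(dos) + b"PE\x00\x00" + coff + bytes(optional) + text

info = parse_pe_headers(sample)
print(hex(info["optional_magic"]), info["sections"][0]["name"])  # 0x20b .text
print(hex(rva_to_offset(info["entry_rva"], info["sections"])))   # 0x400
```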

ELF (Executable and Linkable Format) — Linux

ELF header — magic (7F 45 4C 46 = byte 0x7F followed by "ELF"), class (32/64-bit), data encoding (endianness), OS ABI, ELF type (exec/shared/core), machine architecture, entry point, program header offset, section header offset
Program headers — describe segments for runtime loading (loadable, dynamic linking info, stack permissions)
Section headers — describe sections for linking (symbol table, string tables, relocation sections)
Key sections: - .text — executable code - .data — initialized data - .bss — uninitialized data - .rodata — read-only data (strings) - .plt — Procedure Linkage Table (lazy binding) - .got — Global Offset Table (addresses resolved at runtime) - .dynsym — dynamic symbol table - .dynamic — dynamic linking information
PLT/GOT mechanism, lazy binding for imported functions: - First call: PLT stub jumps to GOT entry pointing to PLT resolver → resolves function → fills GOT entry - Subsequent calls: PLT stub jumps directly to resolved GOT entry - GOT overwrite — classic exploitation target; overwrite GOT entry to redirect control flow
ELF analysis tools: - readelf — display ELF structure - objdump — disassemble and analyze - nm — list symbols - ldd — shared library dependencies - file — identify file type and architecture - strings — extract strings
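
For comparison with `readelf -h`, the ELF header fields listed above can be read directly with `struct`. This sketch decodes only a handful of type and machine constants, and it runs on a synthetic 64-bit header assembled in the script rather than on a real binary.

```python
import struct

ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
ELF_MACHINES = {3: "x86", 8: "MIPS", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def parse_elf_header(data: bytes) -> dict:
    """Decode class, endianness, type, machine, and key offsets."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    ei_class, ei_data = data[4], data[5]
    is_64 = ei_class == 2                    # 1 = ELFCLASS32, 2 = ELFCLASS64
    endian = "<" if ei_data == 1 else ">"    # 1 = little-endian, 2 = big-endian
    # After the 16-byte e_ident: e_type, e_machine, e_version,
    # then e_entry, e_phoff, e_shoff (4 or 8 bytes each, by class).
    fmt = endian + ("HHIQQQ" if is_64 else "HHIIII")
    e_type, e_machine, _version, e_entry, e_phoff, e_shoff = struct.unpack_from(
        fmt, data, 16)
    return {
        "class": "ELF64" if is_64 else "ELF32",
        "endianness": "little" if endian == "<" else "big",
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": ELF_MACHINES.get(e_machine, hex(e_machine)),
        "entry": e_entry,
        "phoff": e_phoff,
        "shoff": e_shoff,
    }


# Synthetic header: 64-bit, little-endian, shared object (PIE), x86-64.
ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
body = struct.pack("<HHIQQQ", 3, 62, 1, 0x1040, 64, 0x3000)
header = parse_elf_header(ident + body)
print(header["class"], header["type"], header["machine"], hex(header["entry"]))
# ELF64 DYN x86-64 0x1040
```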

Packing and Obfuscation Detection

Packing — compress or encrypt the executable; unpacker stub decrypts at runtime - Signs: non-standard section names; high entropy sections (>7.0); few imports (just LoadLibrary, GetProcAddress); small Import Directory - UPX — most common packer; detectable by "UPX" strings; unpack with upx -d - Custom packers — common in sophisticated malware; require dynamic unpacking
Entropy analysis: - Normal code: ~6.0 entropy - Compressed/encrypted: ~7.5–8.0 entropy - Tools: pestudio, binwalk, ent
Import analysis for packing: - Very few imports suggest packing (malware imports needed DLLs dynamically) - LoadLibrary + GetProcAddress = dynamic import resolution - VirtualAlloc + VirtualProtect = unpacking to memory then executing
Detecting packing: - High entropy sections - Entry point outside standard .text section - Few imports - Small Import Directory - Non-standard section names - UPX signatures (UPX0, UPX1)
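
The entropy figures above are Shannon entropy measured in bits per byte, which ranges from 0 (one repeated value) to 8 (uniformly random). A minimal sketch of the calculation follows; the 7.0 cut-off is the heuristic quoted in this section, not a hard rule.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_packed(section_bytes: bytes, threshold: float = 7.0) -> bool:
    """Heuristic only: high entropy suggests compression or encryption."""
    return shannon_entropy(section_bytes) > threshold


padding = b"\x00" * 4096
random_like = os.urandom(65536)   # stands in for an encrypted section

print(round(shannon_entropy(padding), 2))   # 0.0
print(looks_packed(random_like))            # True
```

Compressed resources such as images also score high, so entropy is best read alongside the import and section-name indicators rather than on its own.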

Resources

Microsoft PE format documentation (free)
"Practical Malware Analysis" by Sikorski and Honig (essential book)
PE-bear documentation (free)
"Learning Malware Analysis" by Monnappa K A (book)

Stage 02

Security Fundamentals & Malware Ecosystem

Understanding what malware does requires understanding what it is attacking and why.

Security Concepts

CIA Triad, defense in depth, least privilege
Windows security model — access tokens, integrity levels, privileges, UAC
Authentication — NTLM, Kerberos, Pass-the-Hash implications
Network protocols — how malware uses HTTP/S, DNS, SMTP, custom protocols for C2
Encryption — symmetric and asymmetric; malware uses both for C2 and ransomware

Malware Taxonomy — Deep

Viruses — attach to legitimate files; self-replicating; file infectors
Worms — self-propagating without user interaction; network worms (SMB), email worms
Trojans — appear legitimate; payload activates after installation; no self-replication
RATs (Remote Access Trojans) — full remote control; keylogging, screenshot, shell
Backdoors — hidden persistent access channel
Keyloggers — capturing keystrokes; credential theft
Spyware — covert data collection; browser history, screenshots, files
Adware — displaying ads; usually bundled with legitimate software
Rootkits: - User-mode rootkits — IAT and inline API hooking, DLL injection, process hiding via hooked APIs - Kernel-mode rootkits — SSDT hooking; modify kernel data structures (DKOM); hard to detect - Bootkits — infect MBR/VBR or UEFI boot components; run before the OS; among the most persistent
Ransomware — encrypt files; demand payment; double extortion (also exfiltrate) - Encryption implementation — symmetric key encrypted by attacker public key - Ransomware families — LockBit, ALPHV/BlackCat, Cl0p, Royal, Black Basta
Banking Trojans — steal financial credentials; form grabbing, man-in-browser
Botnets — networks of compromised machines; DDoS, spam, credential stuffing
Cryptominers — unauthorized CPU/GPU use; Monero (XMR) most common; low visibility goal
Fileless malware — runs primarily in memory; PowerShell, WMI, LOLBins; leaves few or no file artifacts on disk
Droppers and loaders — deliver and execute payload; first stage; often minimal capability
Stagers — small shellcode that downloads and executes main payload
Commodity malware — widely available; sold as-a-service; Emotet, IcedID, QBot
APT malware — nation-state; custom-developed; highly capable; targeted

Common Malware Behaviors

Persistence mechanisms — registry run keys, services, scheduled tasks, DLL hijacking, WMI event subscriptions, COM hijacking, startup folder, BITS jobs
Defense evasion — AMSI bypass, ETW patching, AV/EDR evasion, process hollowing, process injection
Credential access — LSASS dumping, Mimikatz-like functionality, browser credential theft, keylogging
Discovery — system enumeration, AD discovery, file system enumeration, network scanning
Lateral movement — PsExec, WMI, SMB, RDP, DCOM
C2 channels — HTTP/S (most common), DNS (covert), ICMP, custom protocols, social media
Exfiltration — HTTPS, DNS, cloud services (Dropbox, OneDrive, Google Drive)

Malware Development Basics (for analysis context)

C/C++ — low-level Windows API access; most sophisticated malware
PowerShell — LOLBin for fileless malware; easy to obfuscate; logs to Event Log
.NET/C# — increasingly common; reflection for in-memory loading; observable via CLR
Python — interpreted; often script-based malware; py2exe compilation
Go — statically compiled; difficult to analyze; growing in offensive tooling
JavaScript/VBScript — initial access; email attachment execution; leverages WSH

Resources

MITRE ATT&CK website (free)
VirusTotal (free tier)
MalwareBazaar (free)
Any.run (free tier)
VirusBay community

Stage 03

Static Analysis

Static analysis examines malware without executing it. It is safer than detonation and can reveal code paths that dynamic analysis never triggers.

Initial Triage

File identification: - file command (Linux) / TrID (Windows) — file type without relying on extension - MIME type analysis — actual format vs claimed extension - Magic bytes — first bytes identify format (MZ=PE, 7F454C46=ELF, 504B0304=ZIP, 255044462D=PDF)
Hash calculation — MD5, SHA-1, SHA-256: - md5sum / sha256sum (Linux) - Get-FileHash (PowerShell) - CertUtil -hashfile (Windows CMD) - Check against VirusTotal, MalwareBazaar, Hybrid Analysis
Strings extraction: - strings -n 8 file.exe — minimum 8 character strings - strings -el file.exe — Unicode strings (UTF-16 LE) - Interesting strings: URLs, IPs, domain names, file paths, registry keys, error messages, API function names, compiler artifacts, PDB paths (reveal dev environment), mutex names - floss (FireEye/Mandiant FLOSS) — extract obfuscated strings; deobfuscates stack strings
PEiD / Detect-It-Easy (DIE) — packer and compiler identification
Entropy analysis — sections with entropy > 7.0 suggest packing or encryption
pestudio — comprehensive triage: - Imports, strings, entropy, sections, signatures, VirusTotal integration
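
The first triage steps (identify by magic bytes, hash, pull strings) are easy to script. The sketch below uses only the standard library and the magic values listed above; the sample bytes are fabricated for the demonstration, and the function names are illustrative.

```python
import hashlib
import re

MAGIC = [
    (b"MZ", "PE"),
    (b"\x7fELF", "ELF"),
    (b"PK\x03\x04", "ZIP"),
    (b"%PDF-", "PDF"),
]


def identify(data: bytes) -> str:
    """Identify the format from leading magic bytes, ignoring any extension."""
    for magic, name in MAGIC:
        if data.startswith(magic):
            return name
    return "unknown"


def hashes(data: bytes) -> dict:
    """MD5, SHA-1, and SHA-256 digests for lookups in sample repositories."""
    return {alg: hashlib.new(alg, data).hexdigest()
            for alg in ("md5", "sha1", "sha256")}


def ascii_strings(data: bytes, min_len: int = 8) -> list:
    """Printable ASCII runs, similar in spirit to `strings -n 8`."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)


def utf16le_strings(data: bytes, min_len: int = 8) -> list:
    """Printable UTF-16 LE runs, similar in spirit to `strings -el`."""
    found = re.findall(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len, data)
    return [m.decode("utf-16-le") for m in found]


# Fabricated bytes: MZ stub, one ASCII URL, one UTF-16 LE mutex name.
sample = (b"MZ\x90\x00" + b"http://c2.example.com/gate.php" + b"\x00\x01"
          + "Global\\mutex".encode("utf-16-le"))

print(identify(sample))          # PE
print(ascii_strings(sample))     # [b'http://c2.example.com/gate.php']
print(utf16le_strings(sample))   # ['Global\\mutex']
print(hashes(sample)["sha256"])
```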

Disassembly and Decompilation

Ghidra (Free — NSA Open Source) — Installation and setup — Java dependency; project management; Navigation: Symbol Tree (functions, data, imports, exports), Program Tree (sections/segments), Listing window (disassembly view with addresses, opcodes, mnemonics), Decompiler window (C-like pseudocode, extremely valuable), Function graph (visual control flow graph per function); Key operations (default key bindings): Auto analysis (runs on initial import; creates functions, data types), Rename variables and functions (L key; improves readability), Retype variables (Ctrl+L in the decompiler; applying correct data types), Define data (T key to choose a data type; interpreting raw bytes as specific types), Disassemble (D key; turning raw bytes into code), Create function (F key at start of code), Apply function signatures (improves decompiler output), Cross-references (XREFs) (what calls this function? what is this string used by?), Mark up and comments (adding analysis notes in listing); Scripting — Python (Jython) or Java scripts; automating analysis tasks; Ghidra for malware analysis: Find entry point (DllMain, WinMain, TLS callbacks), Identify imported APIs (what capabilities does the malware have?), Analyze strings with XREF (where are suspicious strings used?), Identify obfuscation (string decryption routines, anti-disassembly techniques), Reconstruct configuration parsing (C2 addresses often in encrypted config)
IDA Pro (Commercial — Industry Standard) — Editions: Freeware (limited, IDA 8.4), Pro (commercial, most features), HexRays Decompiler; Navigation: Functions window (all identified functions), Names window (all named symbols), Imports/Exports (API calls, exported functions), Graph view (control flow graph, default view), Text view (raw disassembly), Decompiler view (requires HexRays plugin, pseudocode); Key operations — same as Ghidra but with IDA-specific shortcuts: N (rename), Y (retype), / (comment), ; (repeatable comment), x (cross references), a (define string), d (define data), c (define code), p (create function), Escape (return to previous position); FLIRT signatures — function pattern matching to identify library code; IDAPython — scripting IDA with Python; automating analysis, batch processing; Plugins — findcrypt (find crypto constants), YARA scanner, idat automation
Binary Ninja (Commercial with Free Cloud Version) — Modern UI; Python scripting; API focused; Medium-level IL (MLIL) — intermediate representation useful for analysis; Binary Ninja Cloud — free online analysis; Good for plugin development and automated analysis
Radare2 / Cutter — Radare2 — powerful open-source framework; steep learning curve; Cutter — graphical frontend for radare2; more approachable; r2pipe — Python scripting interface

Import Analysis — Critical Skill

What APIs reveal about malware capabilities: - CreateFile, ReadFile, WriteFile — file operations - RegOpenKey(Ex), RegSetValue(Ex) — registry modifications via advapi32 (persistence) - VirtualAlloc, VirtualProtect — memory allocation and permission changes (injection/unpacking) - CreateRemoteThread — process injection - OpenProcess — accessing other processes (injection, credential dumping) - WriteProcessMemory — injecting code into processes - LoadLibrary, GetProcAddress — dynamic import resolution (packing indicator) - CreateService, StartService — service installation (persistence) - Task Scheduler COM interfaces (ITaskService) or schtasks.exe invocation — persistence via task scheduler - WinExec, ShellExecute, CreateProcess — executing commands - InternetOpenUrl, HttpSendRequest, WSAConnect — network communication - NtUnmapViewOfSection, NtCreateSection, NtMapViewOfSection — process hollowing and section-mapping injection - IsDebuggerPresent, CheckRemoteDebuggerPresent — anti-debugging - GetTickCount, QueryPerformanceCounter — timing-based anti-sandbox
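
Reading an import table becomes faster when API combinations are mapped to capabilities automatically. The sketch below encodes a few of the pairings from the list above as rules; the rule set, labels, and the A/W suffix handling are simplifications chosen for this example, and the import list is hard-coded rather than read from a real PE.

```python
RULES = [
    # (capability label, "all" or "any", API names from the list above)
    ("process injection", "all",
     {"OpenProcess", "WriteProcessMemory", "CreateRemoteThread"}),
    ("dynamic import resolution (possible packing)", "all",
     {"LoadLibrary", "GetProcAddress"}),
    ("service persistence", "all", {"CreateService", "StartService"}),
    ("registry modification", "any", {"RegSetValue"}),
    ("anti-debugging", "any",
     {"IsDebuggerPresent", "CheckRemoteDebuggerPresent"}),
    ("timing-based anti-sandbox", "any",
     {"GetTickCount", "QueryPerformanceCounter"}),
]

KNOWN = set().union(*(apis for _, _, apis in RULES))


def normalize(name: str) -> str:
    """Drop a trailing ANSI/Unicode suffix (LoadLibraryA -> LoadLibrary)."""
    if name[-1:] in ("A", "W") and name[:-1] in KNOWN:
        return name[:-1]
    return name


def capabilities(imports: list) -> list:
    """Return the labels of every rule satisfied by the import list."""
    names = {normalize(n) for n in imports}
    found = []
    for label, mode, apis in RULES:
        hits = apis & names
        if (mode == "all" and hits == apis) or (mode == "any" and hits):
            found.append(label)
    return found


imports = ["OpenProcess", "WriteProcessMemory", "CreateRemoteThread",
           "LoadLibraryA", "GetProcAddress", "IsDebuggerPresent"]
for label in capabilities(imports):
    print(label)
```

A match is a lead to follow in the disassembler, not a verdict: plenty of legitimate software imports the same functions.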

Strings Analysis

Network indicators — URLs, IP addresses, domain names, User-Agent strings
File system indicators — file paths, filenames written/read/created
Registry indicators — registry keys for persistence or configuration
Mutex names — used to prevent multiple instances; unique identifier per malware family
Error messages — often reveal function names, internal logic
Encryption keys/IVs — sometimes hardcoded (weak OPSEC)
PDB paths — debug symbols path; reveals developer machine info
Configuration structures — base64-encoded or hex-encoded strings may be config data
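
Once strings are extracted, the indicator categories above can be pulled out with regular expressions. The patterns below are deliberately simple sketches and will miss or over-match edge cases; the input strings are invented examples.

```python
import re

PATTERNS = {
    "urls": re.compile(r"https?://[^\s\"'<>]+", re.I),
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"),
    "registry": re.compile(r"\b(?:HKLM|HKCU|HKEY_[A-Z_]+)\\[^\s\"']+", re.I),
    "pdb_paths": re.compile(r"[A-Za-z]:\\[^\s\"']+\.pdb", re.I),
}


def extract_iocs(strings: list) -> dict:
    """Group regex matches by indicator type, de-duplicated and sorted."""
    results = {name: set() for name in PATTERNS}
    for s in strings:
        for name, pattern in PATTERNS.items():
            results[name].update(pattern.findall(s))
    return {name: sorted(values) for name, values in results.items()}


strings = [
    "POST http://c2.example.com/gate.php",
    "fallback=192.0.2.10",
    "HKCU\\Software\\Microsoft\\Windows\\CurrentVersion\\Run",
    "C:\\Users\\dev\\project\\Release\\loader.pdb",
    "This program cannot be run in DOS mode.",
]

for kind, values in extract_iocs(strings).items():
    print(kind, values)
```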

Signature and YARA

YARA — pattern-matching language for malware identification:
```
rule Malware_Family_Name {
    meta:
        author = "Analyst"
        description = "Detects XYZ malware family"
    strings:
        $s1 = "http://c2.example.com" nocase
        $s2 = { 68 65 6C 6C 6F } // hex bytes
        $s3 = /mutex_[0-9a-f]{8}/
    condition:
        filesize < 1MB and ($s1 or $s2) and $s3
}
```
Writing YARA rules from analysis findings
Testing — yara rule.yar file.exe; yaraify.com for testing
YARA rule sources — GitHub repositories: awesome-yara, Next-Gen-Signatures
Sigma — detection rules for SIEM/EDR (YARA for logs); convert malware behavior to detection

Resources

Practical Malware Analysis (book)
Ghidra documentation (free)
openSecurityTraining2 (free Ghidra course)
Malware Unicorn workshops (free)
OALabs YouTube channel (free)
hasherezade blog (free)
VirusTotal community

Stage 04

Dynamic Analysis

Running malware in a controlled environment to observe its behavior: file system changes, registry modifications, network connections, process creations, and API calls.

Lab Setup — Safety First

Isolated network — never connect malware analysis VM to production network or internet without deliberate decision
Dedicated hardware or heavily isolated VMs — VMware Workstation Pro or VirtualBox
Snapshot before detonation — restore to clean state between samples
Fake internet infrastructure — FakeNet-NG, INetSim intercept and simulate network services
Remove VMware Tools (or similar guest additions) — malware looks for these artifacts to detect a VM; removing them reduces evasion
Modified environment — change volume serial number, CPUID, MAC address; reduce anti-sandbox triggers
INetSim — simulates DNS, HTTP, FTP, SMTP so malware can "connect" without reaching real C2

Sandbox Analysis

Automated sandboxes — fast triage before manual analysis: - Any.run (interactive, free tier) — live system interaction; real-time behavior visualization - Cuckoo Sandbox — open-source; self-hosted; highly configurable; the original project is no longer maintained, and CAPEv2 is an actively developed fork - Joe Sandbox — commercial; deep behavioral analysis; supports multiple platforms - Hybrid Analysis (by CrowdStrike) — free; Falcon Sandbox backend - VirusTotal Sandbox — Jujubox, Tencent HABO and others - Intezer — code similarity and family classification
What sandboxes provide: - Process tree — parent-child process relationships - File system activity — files created, modified, deleted - Registry activity — keys created, modified, deleted - Network activity — DNS queries, HTTP requests, IP connections - API call trace — sequence of Windows API calls - Screenshots — what the malware displayed - Memory dumps — extracted from running process - IOC extraction — IPs, domains, hashes, paths, mutexes

Dynamic Analysis Tools

Process Monitor (Procmon) — real-time file system, registry, network, process/thread activity: - Filters — by Process Name, Path, Result; essential to reduce noise - Event classes — File System, Registry, Network, Process/Thread - Stack trace — see call stack for each event (identify injecting process) - Save as CSV or PML for later analysis
Process Explorer — enhanced Task Manager: - Process tree visualization - DLLs loaded per process - Handles per process - VirusTotal integration per process/DLL - Verify signatures — highlighting unsigned code in signed processes - String search across loaded DLLs
Process Hacker / System Informer — open-source; similar to Process Explorer with more detail
API Monitor — intercept and log Windows API calls: - Monitor specific API categories (file operations, registry, network, crypto) - View parameters and return values - Useful for understanding what functions are called with what arguments
WinAPIOverride — API hooking and logging
Frida — dynamic instrumentation toolkit: - JavaScript API hooking; modify function behavior at runtime - Works on Windows, Linux, macOS, iOS, Android - Frida-trace — automatic hooking of functions by name or pattern - Extremely powerful for bypassing anti-analysis and extracting data
x64dbg (Windows, free) — modern debugger for x64 and x32: - Breakpoints — software (CC), hardware (DR0-DR3), memory (on access/write) - Single-step — step into (F7) vs step over (F8) - Run to cursor (F4); execute until return (Ctrl+F9) - Memory regions view — permissions, module association - Plugins — ScyllaHide (anti-anti-debug), xAnalyzer, x64dbgpy (Python) - Conditional breakpoints — break on specific register values or memory contents
WinDbg (Microsoft) — kernel debugging, crash dump analysis: - Mandatory for kernel-mode analysis, rootkits, driver analysis - Commands: !process, !thread, !drvobj, !pte, kb (stack trace), lm (list modules) - dt (display type) — parsing data structures from symbols - Memory analysis commands: dc, dd, db, dq - Crash dump analysis — !analyze -v for automatic analysis
GDB (Linux) — GNU debugger; supports x86, x64, ARM, MIPS: - gdb ./malware; run; break *address; step; next; info registers; x/10x $rsp - GEF / pwndbg / PEDA — GDB enhancement plugins for security research
LLDB — macOS/iOS debugger; similar to GDB interface
FakeNet-NG — intercepts network calls; provides fake services: - DNS server — responds to all queries with configured IP - HTTP/HTTPS server — logs requests; serves fake responses - FTP, SMTP, IRC simulators - Log file — full captured traffic with decoded protocols
Wireshark / tcpdump — capture actual traffic when allowing controlled internet access
Burp Suite — intercepting HTTP/HTTPS proxy for web-based C2 analysis

Bypassing Anti-Analysis Techniques

Anti-debugging detection and bypass: - IsDebuggerPresent — patch return to 0; or use ScyllaHide plugin - CheckRemoteDebuggerPresent — same approach - NtQueryInformationProcess — Process Debug Port query; return null - Heap flags check — PEB.NtGlobalFlag (0x70 when debugging); patch - Timing checks — RDTSC, GetTickCount, QueryPerformanceCounter; patch to return constant - Parent process check — malware checks if parent is explorer.exe; spoof PPID
Anti-VM detection and bypass: - CPUID — check for VMware/VirtualBox signatures; patch or modify CPUID - Registry checks — VMware tools keys; remove or rename - File checks — VMware/VirtualBox driver files; hide or remove - MAC address — VMware/VirtualBox OUI prefixes; change MAC - Timing — VM is slower; mitigate by adjusting timing functions - Sandbox behavior — no user activity; click simulator; minimize obvious artifacts
Code obfuscation: - Junk code — NOPs, irrelevant instructions; trace through carefully - Opaque predicates — always-true or always-false conditions; static analysis determines which branch - Anti-disassembly — crafted bytes that confuse disassemblers; use hex view + manual analysis - Encrypted/packed code — requires dynamic unpacking before static analysis

Unpacking

Generic unpacking approach: 1. Locate the original entry point (OEP) and set a breakpoint on it 2. Run until the OEP is reached (the stub has decrypted the original code) 3. Dump the process from memory 4. Fix the IAT (Import Address Table)
OEP finding techniques: - Breakpoint on VirtualAlloc — often called when allocating space for unpacked code - Hardware execution breakpoint on the first byte of the suspected OEP once the stub has written it - ESP trick — after the stub saves registers (for example with PUSHAD), set a hardware access breakpoint on the saved stack location; it fires when the stub restores the registers just before jumping to the OEP
Scylla — IAT fixing and process dumping tool; commonly used with x64dbg
Dump + Fix approach: - Dump the unpacked process with PE-sieve or Process Hacker - Fix IAT with Scylla or manually - Load reconstructed PE into Ghidra for static analysis

Resources

Any.run (free tier)
Cuckoo Sandbox documentation (free)
x64dbg documentation (free)
"Practical Malware Analysis" chapter on dynamic analysis
OALabs YouTube channel (free)
hasherezade workshops (free)

Stage 05

Advanced Analysis Topics

Going beyond basic analysis into areas that distinguish senior malware analysts: network protocols, rootkits, scripting, and specialized platforms.

Custom Network Protocol Analysis

Identifying custom protocols — unusual port usage; non-standard data patterns; regular interval traffic
Protocol reversal process: 1. Capture traffic in Wireshark during sandbox run 2. Follow TCP/UDP stream — examine raw bytes 3. Identify fields — fixed headers, length fields, type fields, data 4. Correlate with code — find parsing and construction code in disassembly 5. Document structure — define in Wireshark dissector or code comment
C2 protocol categories: - HTTP-based — blends with legitimate traffic; URL patterns, POST bodies, cookie abuse - HTTPS — encrypted; TLS interception possible in sandbox - DNS — commands in subdomain queries; responses in A/TXT records - Custom TCP — raw binary protocol; XOR or simple encryption common - Covert channels — ICMP, SMTP, SMB
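
Step 5 of the reversal process (documenting the structure) often ends up as a small parser. The sketch below assumes an entirely hypothetical custom TCP protocol: a 2-byte magic, a 1-byte message type, a 2-byte big-endian length, and a payload XORed with a single-byte key. None of these values come from a real family; they only illustrate the fixed-header, length-field, type-field pattern described above.

```python
import struct

# Hypothetical wire format: magic (2 bytes) | type (1) | length (2, big-endian) | payload
HEADER = struct.Struct(">2sBH")
MAGIC = b"\xC2\xC2"      # assumed marker
XOR_KEY = 0x5A           # assumed single-byte key


def xor_bytes(data: bytes, key: int) -> bytes:
    """Single-byte XOR; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


def build_frame(msg_type: int, payload: bytes) -> bytes:
    """Encode one message the way the hypothetical implant would."""
    encoded = xor_bytes(payload, XOR_KEY)
    return HEADER.pack(MAGIC, msg_type, len(encoded)) + encoded


def parse_frames(stream: bytes) -> list:
    """Split a reassembled TCP stream into (type, decoded payload) tuples."""
    frames = []
    offset = 0
    while offset + HEADER.size <= len(stream):
        magic, msg_type, length = HEADER.unpack_from(stream, offset)
        if magic != MAGIC:
            raise ValueError(f"unexpected magic at offset {offset}")
        start = offset + HEADER.size
        end = start + length
        if end > len(stream):
            break                      # truncated capture; stop cleanly
        frames.append((msg_type, xor_bytes(stream[start:end], XOR_KEY)))
        offset = end
    return frames


stream = build_frame(1, b"whoami") + build_frame(2, b"sleep 60")
for msg_type, payload in parse_frames(stream):
    print(msg_type, payload)
```

The same field layout translates directly into a Wireshark dissector once it has been validated against captured traffic.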

Rootkit Analysis

User-mode rootkit techniques: - API hooking — IAT hooking, inline hooking (patching first bytes of function) - DLL injection — loaded into every process via AppInit_DLLs or global hooks - Process hiding — filtering results of NtQuerySystemInformation
Kernel-mode rootkit techniques: - SSDT hooking — modifying System Service Descriptor Table entries - DKOM — modifying kernel data structures (EPROCESS list, driver list) - IRP hooking — intercepting driver communication - Filter drivers — filesystem/network filter drivers
Detection approach: - WinDbg cross-view analysis — compare kernel list with other artifacts - Volatility psscan vs pslist — find DKOM-hidden processes - API hooking detection — compare IAT entries against actual module exports
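
Cross-view analysis boils down to a set difference between two views of the same system. The sketch below assumes the output of the two process listings has already been exported into lists of dictionaries; that record format and the sample data are inventions for this example, not Volatility's own output format.

```python
def hidden_processes(pslist: list, psscan: list) -> list:
    """Processes found by pool scanning but absent from the linked list walk."""
    listed_pids = {proc["pid"] for proc in pslist}
    return [proc for proc in psscan if proc["pid"] not in listed_pids]


# Invented sample data standing in for exported plugin results.
pslist = [
    {"pid": 4, "name": "System"},
    {"pid": 612, "name": "lsass.exe"},
    {"pid": 1880, "name": "explorer.exe"},
]
psscan = pslist + [{"pid": 4242, "name": "svch0st.exe"}]

for proc in hidden_processes(pslist, psscan):
    print(proc["pid"], proc["name"])   # 4242 svch0st.exe
```

Scanning also recovers processes that have already exited, so each hit needs a check of its exit time and threads before it is called a DKOM-hidden process.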

Scripting and Automation

IDAPython / Ghidra Python scripting: - Automating renaming based on patterns - Extracting and decoding strings automatically - Generating YARA rules from found patterns - Commenting API call patterns
Python for malware analysis: - pefile — PE format parsing - capstone — disassembly engine; Python bindings - frida — dynamic instrumentation from Python - yara-python — YARA integration - dpkt — PCAP parsing - volatility3 — memory analysis automation

Malware Families Research

Tracking families — VX Underground, MalwareBazaar, theZoo (GitHub), URLhaus
Configuration extractors — scripted decryption of embedded C2 configs
Unpacking scripts — family-specific unpackers for common packers
Family similarity — code reuse across campaigns; shared crypto implementations
Attribution challenges — false flags, shared tooling, code sharing
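
A configuration extractor can start out very small. The sketch below brute-forces a single-byte XOR key over an embedded blob and accepts the first key whose output contains a known plaintext marker. Real families usually use stronger schemes (RC4, AES, custom routines), so treat this as the simplest possible instance; the blob and marker are fabricated.

```python
def find_xor_key(blob: bytes, marker: bytes = b"http"):
    """Try every non-zero single-byte key; return (key, decoded) or None."""
    for key in range(1, 256):
        decoded = bytes(b ^ key for b in blob)
        if marker in decoded:
            return key, decoded
    return None


# Fabricated "encrypted config": a C2 URL XORed with 0x37.
plaintext = b"\x00\x01http://c2.example.com:8080\x00"
blob = bytes(b ^ 0x37 for b in plaintext)

result = find_xor_key(blob)
if result is not None:
    key, decoded = result
    print(hex(key), decoded)
```

Choosing a marker that must appear in a valid config (a URL scheme, a campaign tag, a port delimiter) keeps false positives down when the script is run across many samples.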

Resources

"Rootkits and Bootkits" (Matrosov, Rodionov, Bratus) book
SANS FOR610 Malware Analysis course
OpenAnalysis Labs (OALabs) YouTube (free)
hasherezade GitHub (free tools and tutorials)

Stage 06

Threat Intelligence Integration

Malware analysis output feeds threat intelligence. Analysts who can produce actionable intelligence from technical findings have significantly more impact.

IOC Extraction and Formatting

IOC types from malware analysis: - File hashes — MD5, SHA-1, SHA-256 of malware samples and dropped files - Network indicators — C2 IPs, domains, URLs, User-Agent strings, URI patterns - Host indicators — registry keys, file paths, mutex names, service names, scheduled task names - Behavioral indicators — command-line patterns, parent-child process relationships, API sequences
STIX (Structured Threat Information eXpression) — format for sharing threat intelligence
TAXII — transport protocol for sharing STIX data
MISP — Malware Information Sharing Platform; structured IOC storage and sharing
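
To show what a shareable indicator looks like, the sketch below builds STIX 2.1 indicator objects and wraps them in a bundle using only the standard library. The indicator values are placeholders, and a production pipeline would more likely use a dedicated STIX library that validates objects against the specification.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone


def stix_indicator(name: str, pattern: str, created: datetime = None) -> dict:
    """Build a minimal STIX 2.1 indicator with the required properties."""
    moment = created or datetime.now(timezone.utc)
    timestamp = moment.strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": timestamp,
        "modified": timestamp,
        "name": name,
        "pattern": pattern,
        "pattern_type": "stix",
        "valid_from": timestamp,
    }


def stix_bundle(objects: list) -> dict:
    """Wrap STIX objects in a bundle for transport (for example over TAXII)."""
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}


# Placeholder values; a real report would use hashes and hosts from the analysis.
sample_sha256 = hashlib.sha256(b"placeholder sample bytes").hexdigest()
indicators = [
    stix_indicator("Dropper hash", f"[file:hashes.'SHA-256' = '{sample_sha256}']"),
    stix_indicator("C2 domain", "[domain-name:value = 'c2.example.com']"),
    stix_indicator("C2 address", "[ipv4-addr:value = '192.0.2.10']"),
]
bundle = stix_bundle(indicators)
print(json.dumps(bundle, indent=2))
```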

YARA Rule Development

Moving beyond basic YARA to robust, low-false-positive rules: - Condition logic — filesize, import counts, section counts - PE module — pe.number_of_sections, pe.imphash(), pe.exports() - Combining string matches with structural features - Testing rules against clean and dirty samples
Rule distribution — VirusTotal hunting, MISP events, threat intel sharing
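
The value that `pe.imphash()` matches against is a hash over the import table, which is why it clusters samples built from the same code. The sketch below follows the commonly described scheme (lowercase `library.function` pairs in import order, joined by commas, then MD5) and is an approximation: it does not resolve ordinal-only imports, and real rules should compare against values produced by established tooling.

```python
import hashlib


def imphash_like(imports: list) -> str:
    """imports: (dll_name, function_name) tuples in import-table order."""
    parts = []
    for dll, func in imports:
        lib = dll.lower()
        # Strip the usual module extensions before joining with the function name.
        for ext in (".dll", ".ocx", ".sys"):
            if lib.endswith(ext):
                lib = lib[: -len(ext)]
                break
        parts.append(f"{lib}.{func.lower()}")
    return hashlib.md5(",".join(parts).encode("ascii")).hexdigest()


imports_a = [("KERNEL32.dll", "LoadLibraryA"), ("KERNEL32.dll", "GetProcAddress")]
imports_b = [("kernel32.DLL", "loadlibrarya"), ("kernel32.DLL", "getprocaddress")]

print(imphash_like(imports_a))
print(imphash_like(imports_a) == imphash_like(imports_b))  # True: case-insensitive
```

Because the hash depends on import order, two builds of the same source linked differently can still diverge, so it works best combined with the string and structural conditions listed above.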

Detection Engineering from Analysis

Sigma rules for SIEM detection from behavioral analysis: - Process creation rules from observed command lines - Registry modification rules from persistence mechanisms - Network detection rules from C2 communication patterns
EDR rule writing — CrowdStrike custom IOAs, SentinelOne custom rules
ATT&CK mapping — documenting techniques observed in each malware sample

Reporting

Malware analysis report structure: - Executive Summary — malware name/family, risk level, key capabilities, IOCs - Technical Analysis — static findings, dynamic behavior, network communication, persistence - Detection and Response — YARA rules, EDR signatures, network signatures, remediation - IOC Appendix — all indicators with confidence levels
Audience calibration — internal security team needs different detail than executive leadership
Attribution language — careful hedging; "consistent with," "possibly," "likely"

Resources

MITRE ATT&CK (free)
MISP documentation (free)
Sigma rules GitHub (free)
Recorded Future community (free blog)
Mandiant threat intelligence blog (free)

FAQ

Common questions

How long does it take to become a Malware Analyst?

3–4 years optimistic at 20–25 hours/week, 4–5 years realistic. Reverse engineering demands deep operating system internals knowledge (Windows kernel, PE format, memory management), assembly fluency (x86-64 minimum, ARM increasingly), and obsessive practice with sample malware. There's no shortcut — the role is built on accumulated reading hours of disassembled code. Pure self-taught paths exist but typically take longer than security-engineer-to-malware-analyst transitions.

Which certifications matter for malware analysis?

GREM (GIAC Reverse Engineering Malware) is the canonical cert. GCFA for forensics depth. OSEE for advanced exploitation work. SANS courses are expensive but content is genuinely the gold standard. Many roles require security clearance — Fort Meade, Herndon VA, and the DC corridor concentrate government and government-contractor opportunities. Clearance significantly expands the job market.

Do I need a CS or computer engineering degree?

Helpful but not required for corporate roles. Federal and government-contractor roles often require a bachelor's plus clearance. The technical bar is high — assembly literacy, OS internals, and reverse engineering tooling (IDA Pro, Ghidra, x64dbg) — which favors candidates with formal CS exposure but doesn't strictly require it. Self-taught paths through CTF reverse engineering, Hack The Box challenges, and Flare-On competitions produce competitive candidates.

What separates a hired Malware Analyst?

Public reverse engineering writeups. Analyses of samples from MalwareBazaar, Flare-On challenge solutions, and Hack The Box reverse engineering writeups are documented technical work that signals capability beyond theoretical knowledge. IDA Pro and Ghidra appear as required or preferred in nearly every reverse engineering posting. Bonus: contributions to Volatility plugins, YARA rules for malware families, and detection engineering work tied to threat intel.
