Roadmap
Malware Analyst / Reverse Engineer
The specialist who dissects malicious software to understand how it works, what it does, and how to detect and defend against it. Uses static analysis (reading disassembled code), dynamic analysis (running malware in controlled environments), and reverse engineering (reconstructing intent from binary instructions).
OPTIMISTIC 3–4 years · REALISTIC 4–5 years
Stage 00
Computer Architecture & Assembly Language
Malware analysis requires reading compiled machine code as if it were source code. This is impossible without understanding CPU architecture, registers, memory, and assembly instructions.
CPU Architecture Fundamentals
- Von Neumann architecture — fetch → decode → execute cycle
- CPU components — ALU (Arithmetic Logic Unit), registers, control unit, cache hierarchy (L1/L2/L3)
- Memory hierarchy — registers (fastest) → cache → RAM → storage (slowest)
- Instruction pipelining — overlapping the fetch, decode, and execute stages of consecutive instructions; hazards (data, control, structural) force stalls
- Endianness — little-endian (x86/x64, ARM mostly) vs big-endian (network byte order, some MIPS) - Little-endian: least significant byte at lowest address - Example: 0x12345678 stored as 78 56 34 12 in memory - Network traffic uses big-endian (network byte order)
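The byte-order example above can be checked with Python's standard `struct` module. This is a minimal sketch; the value 0x12345678 comes from the text, and the variable names are only illustrative.

```python
import struct

value = 0x12345678

# "<I" packs an unsigned 32-bit integer little-endian (x86/x64 memory order).
little = struct.pack("<I", value)
# ">I" packs the same integer big-endian (network byte order).
big = struct.pack(">I", value)

print(little.hex(" "))  # 78 56 34 12
print(big.hex(" "))     # 12 34 56 78

# Reading little-endian bytes with the wrong byte order yields a different number,
# which is a common mistake when decoding fields copied out of a hex dump.
misread = struct.unpack(">I", little)[0]
print(hex(misread))     # 0x78563412
```

When a port number or length field in a memory dump looks implausible, trying the other byte order is usually the first check.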
Number Systems for Assembly
- Binary — base-2; bit operations: AND, OR, XOR, NOT, SHL, SHR - XOR a, a = 0 (zeroing a register without encoding a null byte) - AND for masking bits; OR for setting bits; NOT for bit inversion
- Hexadecimal — base-16; two hex digits per byte; memory addresses in hex - 0x41 = 'A' in ASCII; 0x90 = NOP (no-operation) instruction
- Two's complement — signed integer representation - Negative numbers: invert all bits, add 1 - -1 in 32-bit = 0xFFFFFFFF; -128 in 8-bit = 0x80
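The two's complement and bit-operation rules above can be reproduced in a few lines. The helper names below are assumptions made for this sketch, not part of any library.

```python
def to_twos_complement(value, bits):
    """Encode a signed integer as its unsigned two's-complement bit pattern."""
    return value & ((1 << bits) - 1)


def from_twos_complement(pattern, bits):
    """Decode an unsigned bit pattern back into a signed integer."""
    sign_bit = 1 << (bits - 1)
    return (pattern ^ sign_bit) - sign_bit


print(hex(to_twos_complement(-1, 32)))   # 0xffffffff
print(hex(to_twos_complement(-128, 8)))  # 0x80
print(from_twos_complement(0x80, 8))     # -128

# "Invert all bits, add 1" gives the same pattern as the direct encoding.
inverted_plus_one = ((5 ^ 0xFF) + 1) & 0xFF
print(hex(inverted_plus_one))            # 0xfb

# AND masks bits, OR sets bits, XOR of a value with itself is zero.
print(hex(0xABCD & 0x00FF))              # 0xcd
print(hex(0x10 | 0x01))                  # 0x11
print(0x41 ^ 0x41)                       # 0
```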
x86 and x64 Registers — Memorize These
- General purpose (x86 32-bit): - EAX — Accumulator; return values from functions - EBX — Base register; often used for addressing - ECX — Counter; loop counter, function argument on some calling conventions - EDX — Data; I/O port addressing, high-order 32 bits of multiplication results - ESI — Source Index; source pointer in string operations - EDI — Destination Index; destination pointer in string operations - ESP — Stack Pointer; points to current top of stack - EBP — Base Pointer; points to current stack frame base
- x64 extensions (64-bit): - RAX, RBX, RCX, RDX, RSI, RDI, RSP, RBP — 64-bit versions - R8–R15 — additional 64-bit registers - 32-bit sub-registers: EAX, EBX, ECX, EDX, etc. (zero-extend upper 32 bits when written) - 16-bit sub-registers: AX, BX, CX, DX, SI, DI, SP, BP - 8-bit sub-registers: AL/AH (low/high byte of AX), BL/BH, CL/CH, DL/DH
- Instruction Pointer: - EIP (x86) / RIP (x64) — points to next instruction to execute; key exploit target
- Flags register (EFLAGS/RFLAGS): - CF — Carry Flag; arithmetic carry/borrow - ZF — Zero Flag; result was zero; used by conditional jumps - SF — Sign Flag; result was negative - OF — Overflow Flag; signed arithmetic overflow - PF — Parity Flag; even number of 1 bits in result
- Segment registers — CS (code), DS (data), SS (stack), ES/FS/GS (extra segments) - FS segment — 32-bit Windows: points to Thread Information Block (TIB/TEB); critical for exploitation and anti-debugging; Linux x86-64: thread-local storage - GS segment — 64-bit Windows: points to the TEB; 32-bit Linux: thread-local storage
- Control registers — CR0 (protected mode enable), CR3 (page directory base), CR4
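The sub-register rules above (32-bit writes zero the upper half of the 64-bit register, while 16-bit and 8-bit writes leave the other bits alone) are easy to misread in disassembly. The following sketch models them with plain integers; the function names are hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def write_eax(rax, value):
    """32-bit write: the upper 32 bits of RAX become zero."""
    return value & 0xFFFFFFFF


def write_ax(rax, value):
    """16-bit write: bits 16..63 of RAX are preserved."""
    return (rax & ~0xFFFF & MASK64) | (value & 0xFFFF)


def write_al(rax, value):
    """8-bit write to the low byte: bits 8..63 are preserved."""
    return (rax & ~0xFF & MASK64) | (value & 0xFF)


def write_ah(rax, value):
    """8-bit write to bits 8..15: every other bit is preserved."""
    return (rax & ~0xFF00 & MASK64) | ((value & 0xFF) << 8)


rax = 0x1122334455667788
print(f"{write_eax(rax, 0xDEADBEEF):#018x}")  # 0x00000000deadbeef
print(f"{write_ax(rax, 0xBEEF):#018x}")       # 0x112233445566beef
print(f"{write_al(rax, 0xAA):#018x}")         # 0x11223344556677aa
print(f"{write_ah(rax, 0xAA):#018x}")         # 0x112233445566aa88
```

This is why `MOV EAX, 0` and `XOR EAX, EAX` clear all of RAX, whereas `MOV AL, 0` does not.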
Memory Model and Stack
- Memory segments in a process: - Text/Code segment — executable instructions; read-only; shared - Data segment — initialized global and static variables - BSS — uninitialized global and static variables; zeroed at startup - Heap — dynamically allocated memory; grows up; malloc/new - Stack — local variables, function call information; grows down; push/pop - Memory-mapped region — DLLs, mapped files
- Stack frame mechanics, critical for understanding function calls: - CALL instruction — pushes return address (EIP/RIP value of next instruction) onto stack; jumps to function - Function prologue — PUSH EBP; MOV EBP, ESP; SUB ESP, [local var space] - Local variables — at negative offsets from EBP (EBP-4, EBP-8, etc.) - Parameters (32-bit cdecl) — at positive offsets from EBP (EBP+8, EBP+12, etc.) - Function epilogue — MOV ESP, EBP; POP EBP; RET - RET instruction — pops return address from stack; jumps to it
- Calling conventions, defining how parameters are passed and who cleans up the stack: - cdecl (C default) — parameters pushed right-to-left; caller cleans stack - stdcall (WinAPI) — parameters pushed right-to-left; callee cleans stack - fastcall — first two params in ECX, EDX; rest on stack - x64 Windows — first four params in RCX, RDX, R8, R9; rest on stack; shadow space - x64 Linux (System V AMD64 ABI) — first six params in RDI, RSI, RDX, RCX, R8, R9
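To make the frame offsets concrete, here is a toy model of a 32-bit cdecl call. It is a sketch only: the `Stack32` class, the addresses, and the argument values are invented for illustration, and real frames also depend on compiler options such as frame-pointer omission.

```python
class Stack32:
    """Toy model of a downward-growing 32-bit stack; addresses are illustrative."""

    def __init__(self, top=0x0012FF00):
        self.mem = {}   # address -> 32-bit value
        self.esp = top
        self.ebp = 0

    def push(self, value):
        self.esp -= 4                        # PUSH: decrement ESP, then store
        self.mem[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.mem[self.esp]           # POP: load, then increment ESP
        self.esp += 4
        return value


def enter_cdecl(stack, args, return_address, local_bytes):
    """Caller pushes args right-to-left, CALL pushes the return address,
    then the callee runs PUSH EBP; MOV EBP, ESP; SUB ESP, local_bytes."""
    for arg in reversed(args):
        stack.push(arg)
    stack.push(return_address)
    stack.push(stack.ebp)
    stack.ebp = stack.esp
    stack.esp -= local_bytes


def leave_cdecl(stack, arg_count):
    """Callee runs MOV ESP, EBP; POP EBP; RET, then the caller removes the args."""
    stack.esp = stack.ebp
    stack.ebp = stack.pop()
    return_address = stack.pop()
    stack.esp += 4 * arg_count               # cdecl: caller cleans the stack
    return return_address


stack = Stack32()
start_esp = stack.esp
enter_cdecl(stack, args=[10, 20], return_address=0x00401005, local_bytes=8)
print(hex(stack.mem[stack.ebp + 4]))   # saved return address
print(stack.mem[stack.ebp + 8])        # first parameter
print(stack.mem[stack.ebp + 12])       # second parameter
returned_to = leave_cdecl(stack, arg_count=2)
```

After `leave_cdecl`, ESP is back where it started, which is the invariant a stack-based overflow breaks when it overwrites the slot at EBP+4.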
Core x86 Assembly Instructions
- Data movement: - MOV dest, src — copy data (registers, memory, immediate) - PUSH val — decrements ESP, writes val to [ESP] - POP dest — reads [ESP] to dest, increments ESP - LEA dest, [addr] — load effective address; often used for address calculations - XCHG a, b — exchange values - MOVZX / MOVSX — zero-extend / sign-extend smaller to larger register
- Arithmetic: - ADD dest, src — addition; sets flags - SUB dest, src — subtraction; sets flags - INC dest — increment by 1 - DEC dest — decrement by 1 - MUL / IMUL — unsigned / signed multiply; result in EDX:EAX (32-bit) - DIV / IDIV — unsigned / signed divide; quotient in EAX, remainder in EDX - NEG dest — negate (two's complement)
- Bitwise: - AND, OR, XOR, NOT — bitwise operations - SHL / SHR — shift left / right logical - SAR — shift arithmetic right (preserves sign bit) - ROL / ROR — rotate left / right - TEST dest, src — AND without modifying dest; sets flags (used before JZ/JNZ) - CMP dest, src — SUB without modifying dest; sets flags (used before conditional jumps)
- Control flow: - JMP label — unconditional jump - JE / JZ — jump if equal / zero flag set - JNE / JNZ — jump if not equal / zero flag clear - JG / JGE / JA / JAE — jump if greater/greater-equal (signed/unsigned) - JL / JLE / JB / JBE — jump if less/less-equal (signed/unsigned) - JC / JNC — jump if carry / no carry - CALL addr — push return address, jump to function - RET [n] — return from function; optional stack cleanup - LOOP — decrements ECX; jumps if ECX not zero
- String operations: - MOVS — move string (MOVSB byte, MOVSW word, MOVSD dword) - CMPS — compare strings - SCAS — scan string for value in AL/AX/EAX - STOS — store value to string (fill memory) - REP prefix — repeat string operation ECX times
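The relationship between CMP, the flags, and the signed versus unsigned jumps is worth working through once by hand. The sketch below computes the flags for a 32-bit `CMP dest, src` and evaluates a few jump conditions; the function names are assumptions for this example, and only a subset of the Jcc family is covered.

```python
def _sign(value):
    """Return the sign bit (bit 31) of a 32-bit value."""
    return (value >> 31) & 1


def cmp32(dest, src):
    """Flags produced by CMP dest, src (a SUB whose result is discarded)."""
    dest &= 0xFFFFFFFF
    src &= 0xFFFFFFFF
    result = (dest - src) & 0xFFFFFFFF
    return {
        "ZF": int(result == 0),
        "SF": _sign(result),
        "CF": int(dest < src),  # unsigned borrow
        # Signed overflow: operand signs differ and the result sign differs from dest.
        "OF": int(_sign(dest) != _sign(src) and _sign(result) != _sign(dest)),
    }


def jump_taken(mnemonic, flags):
    """Evaluate a conditional jump against a flags dictionary."""
    conditions = {
        "JE": flags["ZF"] == 1,
        "JNE": flags["ZF"] == 0,
        "JB": flags["CF"] == 1,                          # unsigned less
        "JA": flags["CF"] == 0 and flags["ZF"] == 0,     # unsigned greater
        "JL": flags["SF"] != flags["OF"],                # signed less
        "JG": flags["ZF"] == 0 and flags["SF"] == flags["OF"],  # signed greater
    }
    return conditions[mnemonic]


# 0xFFFFFFFF is -1 when signed but the largest value when unsigned.
flags = cmp32(0xFFFFFFFF, 1)
print(flags)
print("JL taken:", jump_taken("JL", flags))  # signed view: -1 < 1
print("JB taken:", jump_taken("JB", flags))  # unsigned view: 0xFFFFFFFF is not below 1
```

Which jump the compiler emitted therefore tells you whether the original C variable was signed or unsigned.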
Resources
- "x86 Assembly Language and C Fundamentals" by Joseph Cavanagh (book)
- x86 instruction reference (felixcloutier.com/x86/, free)
- "Computer Organization and Design" (Patterson & Hennessy, book)
- nasm.us documentation (free)
- godbolt.org (compile C code to assembly online, free)
Stage 01
Executable File Formats
Malware ships as PE files on Windows and ELF files on Linux. Understanding the container format is essential to understanding what the malware is.
PE (Portable Executable) Format — Windows
- Overview — container format for .exe, .dll, .sys, .drv, .ocx files
- DOS Header (IMAGE_DOS_HEADER): - e_magic — "MZ" (bytes 4D 5A, i.e. 0x5A4D read as a little-endian WORD); Mark Zbikowski's initials - e_lfanew — offset to PE header (stored at file offset 0x3C)
- PE Header (IMAGE_NT_HEADERS): - Signature — "PE\0\0" (bytes 50 45 00 00, i.e. 0x00004550 read as a little-endian DWORD) - COFF File Header (IMAGE_FILE_HEADER): Machine type, NumberOfSections, TimeDateStamp, Characteristics (DLL flag, executable flag) - Optional Header (IMAGE_OPTIONAL_HEADER): - Magic — 0x10B (PE32) or 0x20B (PE32+/64-bit) - AddressOfEntryPoint — RVA where execution begins - ImageBase — preferred load address (32-bit linker defaults: 0x400000 for exe, 0x10000000 for DLL) - SizeOfImage — virtual size of loaded image - DataDirectory — array of 16 entries; key entries: - Import Table (1) — DLLs and functions this file imports - Export Table (0) — functions this file exports (DLLs) - Resources (2) — embedded resources (icons, strings, version info) - Base Relocations (5) — patching hardcoded addresses when not loaded at ImageBase - TLS (9) — Thread Local Storage; TLS callbacks run before entry point (anti-analysis technique) - .NET Header (14) — managed code metadata
- Section Headers, describing each section: - .text — executable code; should have Execute, Read attributes - .data — initialized global variables; Read, Write - .rdata — read-only data (strings, constants, import/export tables) - .rsrc — resource section (icons, strings, embedded files) - .reloc — relocation table - Non-standard section names — packed malware often has unusual sections (.packed, UPX0, etc.)
- Import Address Table (IAT), how Windows resolves API calls: - Import Directory Table — list of DLLs being imported - For each DLL: list of function names or ordinals - At load time: Windows fills IAT with actual addresses - Malware analysis: IAT reveals what APIs malware uses without running it
- Export Table, listing functions this module exports: - Name, ordinal, RVA of function - DLLs use this to expose functions to other modules
- Virtual Address vs Relative Virtual Address (RVA) vs Raw Offset: - ImageBase + RVA = Virtual Address (VA) in memory - Raw offset = position in file on disk - RVA to raw conversion: RVA - section VirtualAddress + section PointerToRawData
- PE analysis tools: - PEview — free; visual PE structure viewer - PE-bear — free; comprehensive PE editor and analyzer - pestudio — free; malware triage; import analysis, strings, entropy, AV scanning - CFF Explorer — free; PE editing - dumpbin (MSVC) — command-line PE analysis
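The header fields and the RVA-to-raw formula above can be exercised with nothing but the standard library. This is a minimal sketch, not a replacement for pefile: `parse_pe_headers`, `rva_to_raw`, and `build_sample_header` are hypothetical helpers, and the sample buffer is a synthetic header built in memory rather than a real executable.

```python
import struct


def parse_pe_headers(data):
    """Read a few key fields from the DOS, COFF, and optional headers."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    coff = e_lfanew + 4
    machine, sections, timestamp, _, _, _, characteristics = struct.unpack_from(
        "<HHIIIHH", data, coff)
    opt = coff + 20                      # the COFF file header is 20 bytes
    (magic,) = struct.unpack_from("<H", data, opt)
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)
    if magic == 0x20B:                   # PE32+: 8-byte ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:                                # PE32: 4-byte ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    return {
        "machine": machine,
        "number_of_sections": sections,
        "timestamp": timestamp,
        "characteristics": characteristics,
        "magic": magic,
        "entry_point_rva": entry_rva,
        "image_base": image_base,
    }


def rva_to_raw(rva, sections):
    """sections: iterable of (VirtualAddress, VirtualSize, PointerToRawData)."""
    for virtual_address, virtual_size, raw_pointer in sections:
        if virtual_address <= rva < virtual_address + virtual_size:
            return rva - virtual_address + raw_pointer
    raise ValueError("RVA is not inside any section")


def build_sample_header():
    """Synthetic PE32+ header used only to demonstrate the parser."""
    dos = b"MZ" + b"\x00" * 58 + struct.pack("<I", 0x40)
    coff = struct.pack("<HHIIIHH", 0x8664, 1, 0, 0, 0, 0xF0, 0x0022)
    optional = struct.pack("<HBBIIII", 0x20B, 0, 0, 0, 0, 0, 0x1000)
    optional += struct.pack("<IQ", 0x1000, 0x140000000)
    return dos + b"PE\x00\x00" + coff + optional


info = parse_pe_headers(build_sample_header())
print({name: hex(value) for name, value in info.items()})
# Hypothetical .text section: VirtualAddress 0x1000, size 0x2000, raw data at 0x400.
print(hex(rva_to_raw(0x1234, [(0x1000, 0x2000, 0x400)])))  # 0x634
```

For real samples a maintained parser is the safer choice, since malformed headers are a common anti-analysis trick and this sketch does no bounds checking beyond what `struct` raises.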
ELF (Executable and Linkable Format) — Linux
- ELF header — magic (7F 45 4C 46 = "\x7FELF"), class (32/64-bit), data encoding (endianness), OS ABI, ELF type (exec/shared/core), machine architecture, entry point, program header offset, section header offset
- Program headers — describe segments for runtime loading (loadable, dynamic linking info, stack permissions)
- Section headers — describe sections for linking (symbol table, string tables, relocation sections)
- Key sections: - .text — executable code - .data — initialized data - .bss — uninitialized data - .rodata — read-only data (strings) - .plt — Procedure Linkage Table (lazy binding) - .got — Global Offset Table (addresses resolved at runtime) - .dynsym — dynamic symbol table - .dynamic — dynamic linking information
- PLT/GOT mechanism, lazy binding for imported functions: - First call: PLT stub jumps to GOT entry pointing to PLT resolver → resolves function → fills GOT entry - Subsequent calls: PLT stub jumps directly to resolved GOT entry - GOT overwrite — classic exploitation target; overwrite GOT entry to redirect control flow
- ELF analysis tools: - readelf — display ELF structure - objdump — disassemble and analyze - nm — list symbols - ldd — shared library dependencies - file — identify file type and architecture - strings — extract strings
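The ELF header fields listed above sit at fixed offsets, so a small parser is a useful way to internalize them. The sketch below covers only the identification bytes, type, machine, and entry point; the function name and the lookup tables are assumptions for this example, and the sample bytes are synthetic.

```python
import struct

ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}
MACHINES = {3: "x86", 8: "MIPS", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def parse_elf_header(data):
    """Decode class, byte order, type, machine, and entry point."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    ei_class, ei_data = data[4], data[5]
    endian = "<" if ei_data == 1 else ">"        # 1 = little-endian, 2 = big-endian
    e_type, e_machine, _version = struct.unpack_from(endian + "HHI", data, 16)
    entry_format = "Q" if ei_class == 2 else "I"  # 2 = 64-bit, 1 = 32-bit
    (e_entry,) = struct.unpack_from(endian + entry_format, data, 24)
    return {
        "bits": 64 if ei_class == 2 else 32,
        "endianness": "little" if ei_data == 1 else "big",
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": MACHINES.get(e_machine, hex(e_machine)),
        "entry": e_entry,
    }


# Synthetic 64-bit little-endian header: 16 identification bytes, then
# e_type=DYN, e_machine=x86-64, e_version=1, e_entry=0x1040.
ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + b"\x00" * 8
sample = ident + struct.pack("<HHIQ", 3, 62, 1, 0x1040)
print(parse_elf_header(sample))
```

The output corresponds to what `readelf -h` reports for the same fields on a real binary.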
Packing and Obfuscation Detection
- Packing — compress or encrypt the executable; unpacker stub decrypts at runtime - Signs: non-standard section names; high entropy sections (>7.0); few imports (just LoadLibrary, GetProcAddress); small Import Directory - UPX — most common packer; detectable by "UPX" strings; unpack with upx -d - Custom packers — common in sophisticated malware; require dynamic unpacking
- Entropy analysis: - Normal code: ~6.0 entropy - Compressed/encrypted: ~7.5–8.0 entropy - Tools: pestudio, binwalk, ent
- Import analysis for packing: - Very few imports suggest packing (malware imports needed DLLs dynamically) - LoadLibrary + GetProcAddress = dynamic import resolution - VirtualAlloc + VirtualProtect = unpacking to memory then executing
- Detecting packing: - High entropy sections - Entry point outside standard .text section - Few imports - Small Import Directory - Non-standard section names - UPX signatures (UPX0, UPX1)
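The entropy figures quoted above are Shannon entropy measured in bits per byte. A minimal sketch of the calculation follows; the `looks_packed` helper simply applies the 7.0 threshold from the text and is a heuristic, not a verdict.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(data).values())


def looks_packed(section_bytes, threshold=7.0):
    """Flag sections whose entropy exceeds the threshold used in the text."""
    return shannon_entropy(section_bytes) > threshold


uniform = bytes(range(256)) * 16      # every byte value equally likely
text = b"This program cannot be run in DOS mode. " * 50
padding = b"\x00" * 4096

for label, blob in (("uniform", uniform), ("text", text), ("padding", padding)):
    print(label, round(shannon_entropy(blob), 2), looks_packed(blob))
```

Entropy should be computed per section rather than over the whole file, because a small encrypted payload can be averaged away by large low-entropy regions.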
Resources
- Microsoft PE format documentation (free)
- "Practical Malware Analysis" by Sikorski and Honig (essential book)
- PE-bear documentation (free)
- "Learning Malware Analysis" by Monnappa K A (book)
Stage 02
Security Fundamentals & Malware Ecosystem
Understanding what malware does requires understanding what it is attacking and why.
Security Concepts
- CIA Triad, defense in depth, least privilege
- Windows security model — access tokens, integrity levels, privileges, UAC
- Authentication — NTLM, Kerberos, Pass-the-Hash implications
- Network protocols — how malware uses HTTP/S, DNS, SMTP, custom protocols for C2
- Encryption — symmetric and asymmetric; malware uses both for C2 and ransomware
Malware Taxonomy — Deep
- Viruses — attach to legitimate files; self-replicating; file infectors
- Worms — self-propagating without user interaction; network worms (SMB), email worms
- Trojans — appear legitimate; payload activates after installation; no self-replication
- RATs (Remote Access Trojans) — full remote control; keylogging, screenshot, shell
- Backdoors — hidden persistent access channel
- Keyloggers — capturing keystrokes; credential theft
- Spyware — covert data collection; browser history, screenshots, files
- Adware — displaying ads; usually bundled with legitimate software
- Rootkits: - User-mode rootkits — IAT and inline API hooking, DLL injection, process hiding by filtering API results - Kernel-mode rootkits — SSDT hooking; modify kernel data structures (DKOM); hard to detect - Bootkits — infect MBR/VBR; run before OS; most persistent
- Ransomware — encrypt files; demand payment; double extortion (also exfiltrate) - Encryption implementation — symmetric key encrypted by attacker public key - Ransomware families — LockBit, ALPHV/BlackCat, Cl0p, Royal, Black Basta
- Banking Trojans — steal financial credentials; form grabbing, man-in-browser
- Botnets — networks of compromised machines; DDoS, spam, credential stuffing
- Cryptominers — unauthorized CPU/GPU use; Monero (XMR) most common; low visibility goal
- Fileless malware — runs primarily in memory; PowerShell, WMI, LOLBins; few or no file artifacts on disk (persistence often still leaves registry or WMI traces)
- Droppers and loaders — deliver and execute payload; first stage; often minimal capability
- Stagers — small shellcode that downloads and executes main payload
- Commodity malware — widely available; sold as-a-service; Emotet, IcedID, QBot
- APT malware — nation-state; custom-developed; highly capable; targeted
Common Malware Behaviors
- Persistence mechanisms — registry run keys, services, scheduled tasks, DLL hijacking, WMI event subscriptions, COM hijacking, startup folder, BITS jobs
- Defense evasion — AMSI bypass, ETW patching, AV/EDR evasion, process hollowing, process injection
- Credential access — LSASS dumping, Mimikatz-like functionality, browser credential theft, keylogging
- Discovery — system enumeration, AD discovery, file system enumeration, network scanning
- Lateral movement — PsExec, WMI, SMB, RDP, DCOM
- C2 channels — HTTP/S (most common), DNS (covert), ICMP, custom protocols, social media
- Exfiltration — HTTPS, DNS, cloud services (Dropbox, OneDrive, Google Drive)
Malware Development Basics (for analysis context)
- C/C++ — low-level Windows API access; most sophisticated malware
- PowerShell — LOLBin for fileless malware; easy to obfuscate; logs to Event Log
- .NET/C# — increasingly common; reflection for in-memory loading; observable via CLR
- Python — interpreted; often script-based malware; py2exe compilation
- Go — statically compiled; difficult to analyze; growing in offensive tooling
- JavaScript/VBScript — initial access; email attachment execution; leverages WSH
Resources
- MITRE ATT&CK website (free)
- VirusTotal (free tier)
- MalwareBazaar (free)
- Any.run (free tier)
- VirusBay community
Stage 03
Static Analysis
Static analysis examines malware without executing it. It is safer, more thorough, and reveals code logic that dynamic analysis may not trigger.
Initial Triage
- File identification: - file command (Linux) / TrID (Windows) — file type without relying on extension - MIME type analysis — actual format vs claimed extension - Magic bytes — first bytes identify format (MZ=PE, 7F454C46=ELF, 504B0304=ZIP, 255044462D=PDF)
- Hash calculation — MD5, SHA-1, SHA-256: - md5sum / sha256sum (Linux) - Get-FileHash (PowerShell) - CertUtil -hashfile (Windows CMD) - Check against VirusTotal, MalwareBazaar, Hybrid Analysis
- Strings extraction: - strings -n 8 file.exe — minimum 8 character strings - strings -el file.exe — Unicode strings (UTF-16 LE) - Interesting strings: URLs, IPs, domain names, file paths, registry keys, error messages, API function names, compiler artifacts, PDB paths (reveal dev environment), mutex names - floss (FireEye/Mandiant FLOSS) — extract obfuscated strings; deobfuscates stack strings
- PEiD / Detect-It-Easy (DIE) — packer and compiler identification
- Entropy analysis — sections with entropy > 7.0 suggest packing or encryption
- pestudio — comprehensive triage: - Imports, strings, entropy, sections, signatures, VirusTotal integration
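The first two triage steps, hashing and magic-byte identification, can be scripted with the standard library. This is a minimal sketch: the `triage` function and the `MAGIC_BYTES` table are assumptions for this example and cover only the four formats named above.

```python
import hashlib

# Leading bytes -> format, taken from the magic values listed above.
MAGIC_BYTES = [
    (b"MZ", "PE"),
    (b"\x7fELF", "ELF"),
    (b"PK\x03\x04", "ZIP"),
    (b"%PDF-", "PDF"),
]


def triage(data):
    """Return the detected format and the three hashes used for lookups."""
    detected = next((name for magic, name in MAGIC_BYTES if data.startswith(magic)),
                    "unknown")
    return {
        "type": detected,
        "size": len(data),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


# Stand-in bytes; with a real sample you would read the file in binary mode.
report = triage(b"MZ\x90\x00\x03\x00\x00\x00")
for field, value in report.items():
    print(f"{field:7} {value}")
```

The SHA-256 value is the one to search on VirusTotal or MalwareBazaar; hashing never requires executing or even parsing the sample.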
Disassembly and Decompilation
- Ghidra (free; NSA open source): - Installation and setup — Java dependency; project management - Navigation: Symbol Tree (functions, data, imports, exports), Program Tree (sections/segments), Listing window (disassembly view with addresses, opcodes, mnemonics), Decompiler window (C-like pseudocode, extremely valuable), Function graph (visual control flow graph per function) - Key operations (default keybindings): Auto analysis (runs on initial import; creates functions, data types), Rename variables and functions (L key; improves readability), Retype variables (Ctrl+L; applying correct data types), Define data (T key to choose a data type, B to cycle byte/word/dword/qword; interpreting raw bytes as specific types), Create function (F key at start of code), Apply function signatures (improves decompiler output), Cross-references (XREFs) (what calls this function? what is this string used by?), Mark up and comments (adding analysis notes in listing) - Scripting — Python (Jython) or Java scripts; automating analysis tasks - Ghidra for malware analysis: Find entry point (DllMain, WinMain, TLS callbacks), Identify imported APIs (what capabilities does the malware have?), Analyze strings with XREF (where are suspicious strings used?), Identify obfuscation (string decryption routines, anti-disassembly techniques), Reconstruct configuration parsing (C2 addresses often in encrypted config)
- IDA Pro (Commercial — Industry Standard) — Editions: Freeware (limited, IDA 8.4), Pro (commercial, most features), HexRays Decompiler; Navigation: Functions window (all identified functions), Names window (all named symbols), Imports/Exports (API calls, exported functions), Graph view (control flow graph, default view), Text view (raw disassembly), Decompiler view (requires HexRays plugin, pseudocode); Key operations — same as Ghidra but with IDA-specific shortcuts: N (rename), Y (retype), / (comment), ; (repeatable comment), x (cross references), a (define string), d (define data), c (define code), p (create function), Escape (return to previous position); FLIRT signatures — function pattern matching to identify library code; IDAPython — scripting IDA with Python; automating analysis, batch processing; Plugins — findcrypt (find crypto constants), YARA scanner, idat automation
- Binary Ninja (Commercial with Free Cloud Version) — Modern UI; Python scripting; API focused; Medium-level IL (MLIL) — intermediate representation useful for analysis; Binary Ninja Cloud — free online analysis; Good for plugin development and automated analysis
- Radare2 / Cutter — Radare2 — powerful open-source framework; steep learning curve; Cutter — graphical frontend for radare2; more approachable; r2pipe — Python scripting interface
Import Analysis — Critical Skill
- What APIs reveal about malware capabilities: - CreateFile, ReadFile, WriteFile — file operations - RegOpenKey(Ex), RegSetValue(Ex) (advapi32) — registry modifications (persistence) - VirtualAlloc, VirtualProtect — memory allocation and permission changes (injection/unpacking) - CreateRemoteThread — process injection - OpenProcess — accessing other processes (injection, credential dumping) - WriteProcessMemory — injecting code into processes - LoadLibrary, GetProcAddress — dynamic import resolution (packing indicator) - CreateService, StartService — service installation (persistence) - Task Scheduler COM interfaces (ITaskService) or schtasks.exe — persistence via task scheduler - WinExec, ShellExecute, CreateProcess — executing commands - InternetOpenUrl, HttpSendRequest, WSAConnect — network communication - NtCreateSection, NtMapViewOfSection — process hollowing - IsDebuggerPresent, CheckRemoteDebuggerPresent — anti-debugging - GetTickCount, QueryPerformanceCounter — timing-based anti-sandbox
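Once the import list has been extracted, mapping it to capabilities is a lookup problem. The sketch below encodes a few of the API groupings from the list above; the rule table, the function names, and the sample import list are illustrative assumptions, and a match indicates only that the capability is possible, not that it is used.

```python
import re

# Capability -> APIs that must all be imported for the capability to be flagged.
CAPABILITY_RULES = {
    "process injection": {"OpenProcess", "WriteProcessMemory", "CreateRemoteThread"},
    "dynamic import resolution": {"LoadLibrary", "GetProcAddress"},
    "registry modification": {"RegSetValueEx"},
    "service installation": {"CreateService", "StartService"},
    "anti-debugging": {"IsDebuggerPresent"},
}


def normalize(api_name):
    """Drop the ANSI/Unicode suffix so LoadLibraryA and LoadLibraryW compare equal."""
    return re.sub(r"(?<=[a-z])[AW]$", "", api_name)


def capabilities(imports):
    """Return the sorted capabilities whose required APIs are all present."""
    present = {normalize(name) for name in imports}
    return sorted(capability for capability, required in CAPABILITY_RULES.items()
                  if required <= present)


# Hypothetical import list, as it might be read from an import table.
sample_imports = [
    "OpenProcess", "WriteProcessMemory", "CreateRemoteThread",
    "LoadLibraryA", "GetProcAddress", "RegSetValueExW", "CreateFileW",
]
print(capabilities(sample_imports))
```

Requiring the whole set, rather than any single API, cuts down on noise: `OpenProcess` alone appears in a great deal of benign software.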
Strings Analysis
- Network indicators — URLs, IP addresses, domain names, User-Agent strings
- File system indicators — file paths, filenames written/read/created
- Registry indicators — registry keys for persistence or configuration
- Mutex names — used to prevent multiple instances; unique identifier per malware family
- Error messages — often reveal function names, internal logic
- Encryption keys/IVs — sometimes hardcoded (weak OPSEC)
- PDB paths — debug symbols path; reveals developer machine info
- Configuration structures — base64-encoded or hex-encoded strings may be config data
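A rough equivalent of `strings -n 8` plus `strings -el` fits in a few lines of Python, followed by a pass that pulls out network indicators. This is a sketch under simplifying assumptions: the regular expressions are deliberately loose, the sample bytes are synthetic, and obfuscated or stack-built strings will not be found this way (that is what FLOSS is for).

```python
import re

URL_RE = re.compile(r"https?://[^\s\"'<>]+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_strings(data, min_len=8):
    """Return printable ASCII runs and UTF-16LE runs of at least min_len characters."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return found


def find_network_iocs(strings):
    """Pick URLs and dotted-quad IPv4 candidates out of the extracted strings."""
    joined = "\n".join(strings)
    return {"urls": URL_RE.findall(joined), "ipv4": IPV4_RE.findall(joined)}


# Synthetic buffer: an ASCII URL, a UTF-16LE mutex name, and an ASCII IP address.
sample = (
    b"\x00\x01http://c2.example.com/gate.php\x00\xff"
    + "Global\\mutex_1a2b3c4d".encode("utf-16-le")
    + b"\x00\x00" + b"10.0.0.5\x00abc"
)
strings_found = extract_strings(sample)
print(strings_found)
print(find_network_iocs(strings_found))
```

The IPv4 pattern also matches version numbers such as 1.2.3.4, so candidates need a manual check before they go into a report.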
Signature and YARA
- YARA — pattern-matching language for malware identification:
```
rule Malware_Family_Name {
    meta:
        author = "Analyst"
        description = "Detects XYZ malware family"
    strings:
        $s1 = "http://c2.example.com" nocase
        $s2 = { 68 65 6C 6C 6F } // hex bytes
        $s3 = /mutex_[0-9a-f]{8}/
    condition:
        filesize < 1MB and ($s1 or $s2) and $s3
}
```
- Writing YARA rules from analysis findings
- Testing — yara rule.yar file.exe; YARAify (yaraify.abuse.ch) for scanning and rule testing
- YARA rule sources — GitHub repositories: awesome-yara, Next-Gen-Signatures
- Sigma — detection rules for SIEM/EDR (YARA for logs); convert malware behavior to detection
Resources
- Practical Malware Analysis (book)
- Ghidra documentation (free)
- openSecurityTraining2 (free Ghidra course)
- Malware Unicorn workshops (free)
- OALabs YouTube channel (free)
- hasherezade blog (free)
- VirusTotal community
Stage 04
Dynamic Analysis
Running malware in a controlled environment to observe its behavior: file system changes, registry modifications, network connections, process creations, and API calls.
Lab Setup — Safety First
- Isolated network — never connect a malware analysis VM to a production network or the internet without a deliberate decision
- Dedicated hardware or heavily isolated VMs — VMware Workstation Pro or VirtualBox
- Snapshot before detonation — restore to clean state between samples
- Fake internet infrastructure — FakeNet-NG, INetSim intercept and simulate network services
- Remove VMware Tools (or similar guest additions) — malware looks for these artifacts to detect a VM; removing them reduces evasion
- Modified environment — change volume serial number, CPUID, MAC address; reduce anti-sandbox triggers
- INetSim — simulates DNS, HTTP, FTP, SMTP so malware can "connect" without reaching real C2
Sandbox Analysis
- Automated sandboxes — fast triage before manual analysis: - Any.run (interactive, free tier) — live system interaction; real-time behavior visualization - Cuckoo Sandbox / CAPEv2 — open-source; self-hosted; highly configurable (CAPE is the maintained Cuckoo fork) - Joe Sandbox — commercial; deep behavioral analysis; supports multiple platforms - Hybrid Analysis (by CrowdStrike) — free; Falcon Sandbox backend - VirusTotal Sandbox — Jujubox, Tencent HABO and others - Intezer — code similarity and family classification
- What sandboxes provide: - Process tree — parent-child process relationships - File system activity — files created, modified, deleted - Registry activity — keys created, modified, deleted - Network activity — DNS queries, HTTP requests, IP connections - API call trace — sequence of Windows API calls - Screenshots — what the malware displayed - Memory dumps — extracted from running process - IOC extraction — IPs, domains, hashes, paths, mutexes
Dynamic Analysis Tools
- Process Monitor (Procmon) — real-time file system, registry, network, process/thread activity: - Filters — by Process Name, Path, Result; essential to reduce noise - Event classes — File System, Registry, Network, Process/Thread - Stack trace — see call stack for each event (identify injecting process) - Save as CSV or PML for later analysis
- Process Explorer — enhanced Task Manager: - Process tree visualization - DLLs loaded per process - Handles per process - VirusTotal integration per process/DLL - Verify signatures — highlighting unsigned code in signed processes - String search across loaded DLLs
- Process Hacker / System Informer — open-source; similar to Process Explorer with more detail
- API Monitor — intercept and log Windows API calls: - Monitor specific API categories (file operations, registry, network, crypto) - View parameters and return values - Useful for understanding what functions are called with what arguments
- WinAPIOverride — API hooking and logging
- Frida — dynamic instrumentation toolkit: - JavaScript API hooking; modify function behavior at runtime - Works on Windows, Linux, macOS, iOS, Android - Frida-trace — automatic hooking of functions by name or pattern - Extremely powerful for bypassing anti-analysis and extracting data
- x64dbg (Windows, free) — modern debugger for x64 and x32: - Breakpoints — software (CC), hardware (DR0-DR3), memory (on access/write) - Single-step — step into (F7) vs step over (F8) - Run to cursor (F4); execute until return (Ctrl+F9) - Memory regions view — permissions, module association - Plugins — ScyllaHide (anti-anti-debug), xAnalyzer, x64dbgpy (Python) - Conditional breakpoints — break on specific register values or memory contents
- WinDbg (Microsoft) — kernel debugging, crash dump analysis: - Mandatory for kernel-mode analysis, rootkits, driver analysis - Commands: !process, !thread, !drvobj, !pte, kb (stack trace), lm (list modules) - dt (display type) — parsing data structures from symbols - Memory analysis commands: dc, dd, db, dq - Crash dump analysis — !analyze -v for automatic analysis
- GDB (Linux) — GNU debugger; supports x86, x64, ARM, MIPS: - gdb ./malware; run; break *address; step; next; info registers; x/10x $rsp - GEF / pwndbg / PEDA — GDB enhancement plugins for security research
- LLDB — macOS/iOS debugger; similar to GDB interface
- FakeNet-NG — intercepts network calls; provides fake services: - DNS server — responds to all queries with configured IP - HTTP/HTTPS server — logs requests; serves fake responses - FTP, SMTP, IRC simulators - Log file — full captured traffic with decoded protocols
- Wireshark / tcpdump — capture actual traffic when allowing controlled internet access
- Burp Suite — intercepting HTTP/HTTPS proxy for web-based C2 analysis
Bypassing Anti-Analysis Techniques
- Anti-debugging detection and bypass: - IsDebuggerPresent — patch return to 0; or use ScyllaHide plugin - CheckRemoteDebuggerPresent — same approach - NtQueryInformationProcess — ProcessDebugPort query; hook to report no debug port (0) - PEB flag checks — PEB.NtGlobalFlag (0x70 when the process was created under a debugger) and heap flags; patch - Timing checks — RDTSC, GetTickCount, QueryPerformanceCounter; patch to return constant - Parent process check — malware checks if parent is explorer.exe; spoof PPID
- Anti-VM detection and bypass: - CPUID — check for VMware/VirtualBox signatures; patch or modify CPUID - Registry checks — VMware tools keys; remove or rename - File checks — VMware/VirtualBox driver files; hide or remove - MAC address — VMware/VirtualBox OUI prefixes; change MAC - Timing — VM is slower; mitigate by adjusting timing functions - Sandbox behavior — no user activity; click simulator; minimize obvious artifacts
- Code obfuscation: - Junk code — NOPs, irrelevant instructions; trace through carefully - Opaque predicates — always-true or always-false conditions; static analysis determines which branch - Anti-disassembly — crafted bytes that confuse disassemblers; use hex view + manual analysis - Encrypted/packed code — requires dynamic unpacking before static analysis
Unpacking
- Generic unpacking approach: 1. Locate the original entry point (OEP) and set a breakpoint there 2. Run until the OEP is reached (the stub has decrypted the original code by then) 3. Dump process from memory 4. Fix IAT (Import Address Table)
- OEP finding techniques: - Breakpoint on VirtualAlloc — often called when allocating space for unpacked code - Hardware breakpoint on first byte of entry point after it's written - ESP trick — after the stub's initial PUSHAD, set a hardware breakpoint on access to the saved registers at ESP; it fires at the matching POPAD, just before the jump to OEP
- Scylla — IAT fixing and process dumping tool; commonly used with x64dbg
- Dump + Fix approach: - Dump the unpacked process with PE-sieve or Process Hacker - Fix IAT with Scylla or manually - Load reconstructed PE into Ghidra for static analysis
Resources
- Any.run (free tier)
- Cuckoo Sandbox documentation (free)
- x64dbg documentation (free)
- "Practical Malware Analysis" chapter on dynamic analysis
- OALabs YouTube channel (free)
- hasherezade workshops (free)
Stage 05
Advanced Analysis Topics
Going beyond basic analysis into areas that distinguish senior malware analysts: network protocols, rootkits, scripting, and specialized platforms.
Custom Network Protocol Analysis
- Identifying custom protocols — unusual port usage; non-standard data patterns; regular interval traffic
- Protocol reversal process: 1. Capture traffic in Wireshark during sandbox run 2. Follow TCP/UDP stream — examine raw bytes 3. Identify fields — fixed headers, length fields, type fields, data 4. Correlate with code — find parsing and construction code in disassembly 5. Document structure — define in Wireshark dissector or code comment
- C2 protocol categories: - HTTP-based — blends with legitimate traffic; URL patterns, POST bodies, cookie abuse - HTTPS — encrypted; TLS interception possible in sandbox - DNS — commands in subdomain queries; responses in A/TXT records - Custom TCP — raw binary protocol; XOR or simple encryption common - Covert channels — ICMP, SMTP, SMB
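Once the fields of a custom TCP protocol are identified, documenting them as a small parser keeps the findings testable. The message layout below is entirely hypothetical (a 2-byte magic, 1-byte type, 2-byte big-endian length, then a payload XORed with a single-byte key); it only illustrates the "fixed header, length field, type field, data" pattern described above.

```python
import struct

# Hypothetical wire format: magic (2 bytes), type (1 byte), payload length (2 bytes),
# all in network byte order, followed by the XOR-encoded payload.
HEADER = struct.Struct("!HBH")
MAGIC = 0xBEEF
XOR_KEY = 0x5A


def xor_bytes(data, key=XOR_KEY):
    """Single-byte XOR; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


def build_message(msg_type, payload):
    """Construct a message the way the (imagined) implant would."""
    encoded = xor_bytes(payload)
    return HEADER.pack(MAGIC, msg_type, len(encoded)) + encoded


def parse_message(raw):
    """Validate the header and return (type, decoded payload)."""
    magic, msg_type, length = HEADER.unpack_from(raw)
    if magic != MAGIC:
        raise ValueError("unexpected magic value")
    body = raw[HEADER.size:HEADER.size + length]
    if len(body) != length:
        raise ValueError("truncated payload")
    return msg_type, xor_bytes(body)


wire = build_message(0x01, b"whoami")
print(wire.hex(" "))
print(parse_message(wire))
```

Having both a builder and a parser makes it possible to replay captured traffic and to stand up a fake C2 endpoint in the lab.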
Rootkit Analysis
- User-mode rootkit techniques: - API hooking — IAT hooking, inline hooking (patching first bytes of function) - DLL injection — loaded into every process via AppInit_DLLs or global hooks - Process hiding — filtering results of NtQuerySystemInformation
- Kernel-mode rootkit techniques: - SSDT hooking — modifying System Service Descriptor Table entries - DKOM — modifying kernel data structures (EPROCESS list, driver list) - IRP hooking — intercepting driver communication - Filter drivers — filesystem/network filter drivers
- Detection approach: - WinDbg cross-view analysis — compare kernel list with other artifacts - Volatility psscan vs pslist — find DKOM-hidden processes - API hooking detection — compare IAT entries against actual module exports
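Cross-view detection reduces to a set difference between two views of the same system. The sketch below compares a linked-list view with a pool-scan view, in the spirit of pslist versus psscan; the process records are invented sample data, and the dictionaries are not the actual output format of any tool.

```python
def hidden_processes(linked_list_view, pool_scan_view):
    """Return processes found by scanning memory but absent from the linked list."""
    listed_pids = {process["pid"] for process in linked_list_view}
    return [process for process in pool_scan_view
            if process["pid"] not in listed_pids]


# Hypothetical data standing in for two views of the same memory image.
pslist_view = [
    {"pid": 4, "name": "System"},
    {"pid": 612, "name": "lsass.exe"},
    {"pid": 1480, "name": "explorer.exe"},
]
psscan_view = pslist_view + [{"pid": 3344, "name": "svch0st.exe"}]

for process in hidden_processes(pslist_view, psscan_view):
    print("only visible to the scan:", process["pid"], process["name"])
```

A process that appears only in the scan is a lead, not proof of DKOM: scanning also finds processes that have already exited, so exit times and thread counts should be checked before drawing conclusions.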
Scripting and Automation
- IDAPython / Ghidra scripting (Jython): - Automating renaming based on patterns - Extracting and decoding strings automatically - Generating YARA rules from found patterns - Commenting API call patterns
- Python for malware analysis: - pefile — PE format parsing - capstone — disassembly engine; Python bindings - frida — dynamic instrumentation from Python - yara-python — YARA integration - dpkt — PCAP parsing - volatility3 — memory analysis automation
Malware Families Research
- Tracking families — VX Underground, MalwareBazaar, theZoo (GitHub), URLhaus
- Configuration extractors — scripted decryption of embedded C2 configs
- Unpacking scripts — family-specific unpackers for common packers
- Family similarity — code reuse across campaigns; shared crypto implementations
- Attribution challenges — false flags, shared tooling, code sharing
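A configuration extractor is often no more than "find the blob, undo the encoding, look for a known marker". The following sketch brute-forces a single-byte XOR key and keeps any result containing a URL scheme. The blob, the key, and the C2 address are invented for the example, and real families frequently use RC4, AES, or multi-byte keys instead.

```python
def brute_single_byte_xor(blob, markers=(b"http://", b"https://")):
    """Try every non-zero single-byte key and keep decodings that contain a marker."""
    hits = []
    for key in range(1, 256):
        decoded = bytes(b ^ key for b in blob)
        if any(marker in decoded for marker in markers):
            hits.append((key, decoded))
    return hits


# Build a hypothetical encoded config the way a sample might store it.
plaintext_config = b"https://c2.example.com:443/gate"
encoded_config = bytes(b ^ 0x37 for b in plaintext_config)

for key, decoded in brute_single_byte_xor(encoded_config):
    print(hex(key), decoded.decode("ascii"))
```

The key space here is only 255 values, so the marker check is what does the work; choosing markers that are specific to the family (a known URI path, a campaign ID format) keeps false positives down when scanning whole sections.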
Resources
- "Rootkits and Bootkits" (Matrosov, Rodionov, Bratus) book
- SANS FOR610 Malware Analysis course
- OpenAnalysis Labs (OALabs) YouTube (free)
- hasherezade GitHub (free tools and tutorials)
Stage 06
Threat Intelligence Integration
Malware analysis output feeds threat intelligence. Analysts who can produce actionable intelligence from technical findings have significantly more impact.
IOC Extraction and Formatting
- IOC types from malware analysis: - File hashes — MD5, SHA-1, SHA-256 of malware samples and dropped files - Network indicators — C2 IPs, domains, URLs, User-Agent strings, URI patterns - Host indicators — registry keys, file paths, mutex names, service names, scheduled task names - Behavioral indicators — command-line patterns, parent-child process relationships, API sequences
- STIX (Structured Threat Information eXpression) — format for sharing threat intelligence
- TAXII — transport protocol for sharing STIX data
- MISP — Malware Information Sharing Platform; structured IOC storage and sharing
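To make the path from raw findings to a shareable indicator concrete, the sketch below hashes a sample, pulls IPv4 addresses out of analysis notes, and wraps a pattern in a dictionary shaped like a STIX 2.1 Indicator object. It is a simplified illustration using only the standard library. The helper names are hypothetical, and a production pipeline would more likely use a dedicated STIX library and validate the output.

```python
import hashlib
import re
import uuid
from datetime import datetime, timezone
from typing import Dict, List

# Matches dotted-quad IPv4 addresses with each octet in 0-255.
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)


def file_hashes(data: bytes) -> Dict[str, str]:
    """Compute the three hash types commonly shared as file IOCs."""
    return {
        "MD5": hashlib.md5(data).hexdigest(),
        "SHA-1": hashlib.sha1(data).hexdigest(),
        "SHA-256": hashlib.sha256(data).hexdigest(),
    }


def extract_ipv4(text: str) -> List[str]:
    """Return unique IPv4 addresses found in free text, sorted."""
    return sorted(set(IPV4_RE.findall(text)))


def stix_indicator(pattern: str, name: str) -> dict:
    """Build a dictionary shaped like a STIX 2.1 Indicator (not validated)."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "pattern": pattern,
        "pattern_type": "stix",
        "valid_from": now,
    }


def sha256_pattern(digest: str) -> str:
    """STIX pattern matching a file by its SHA-256 hash."""
    return f"[file:hashes.'SHA-256' = '{digest}']"


def ipv4_pattern(address: str) -> str:
    """STIX pattern matching a single IPv4 address."""
    return f"[ipv4-addr:value = '{address}']"
```

The resulting dictionaries can be serialized with `json.dumps` and reviewed before being pushed to a sharing platform such as MISP or a TAXII server.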
YARA Rule Development
- Moving beyond basic YARA to robust, low-false-positive rules: - Condition logic — filesize, import counts, section counts - PE module — pe.number_of_sections, pe.imphash(), pe.exports() - Combining string matches with structural features - Testing rules against clean and dirty samples
- Rule distribution — VirusTotal hunting, MISP events, threat intel sharing
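One way to combine string matches with structural features is to generate the rule text from strings found during analysis. The sketch below is a hypothetical generator, not a feature of YARA itself. It emits a rule that requires the MZ header, a file size ceiling, a section-count ceiling via the PE module, and a minimum number of string hits. The thresholds and strings are placeholders that would need tuning against clean and malicious sample sets.

```python
from typing import List


def yara_escape(text: str) -> str:
    """Escape backslashes and double quotes for a YARA text string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_yara_rule(
    name: str,
    strings: List[str],
    min_matches: int,
    max_size_kb: int,
    max_sections: int,
) -> str:
    """Return YARA rule text mixing string matches with PE structure checks."""
    if not strings or not 1 <= min_matches <= len(strings):
        raise ValueError("min_matches must be between 1 and the number of strings")
    string_lines = [
        f'        $s{i} = "{yara_escape(s)}" ascii wide'
        for i, s in enumerate(strings)
    ]
    lines = [
        'import "pe"',
        "",
        f"rule {name}",
        "{",
        "    strings:",
        *string_lines,
        "    condition:",
        "        uint16(0) == 0x5A4D and",
        f"        filesize < {max_size_kb}KB and",
        f"        pe.number_of_sections <= {max_sections} and",
        f"        {min_matches} of ($s*)",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder strings; a real rule would use strings recovered from samples.
    print(build_yara_rule("Hypothetical_Loader", ["Global\\HypoMutex", "gate.php"], 2, 500, 6))
```

Generated rules should still be compiled and run against a clean corpus before distribution, since a generator cannot judge how common a string is in benign software.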
Detection Engineering from Analysis
- Sigma rules for SIEM detection from behavioral analysis: - Process creation rules from observed command lines - Registry modification rules from persistence mechanisms - Network detection rules from C2 communication patterns
- EDR rule writing — CrowdStrike custom IOAs, SentinelOne custom rules
- ATT&CK mapping — documenting techniques observed in each malware sample
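The sketch below shows how an observed command line might become a process-creation detection. The rule is written as a Python dictionary that mirrors the field layout of a Sigma rule (Sigma itself is YAML), and the small matcher is a toy stand-in for a real Sigma backend. It handles only a single `selection` with `contains`, `endswith`, and `startswith` modifiers. The rule content is hypothetical and only illustrates the shape.

```python
from typing import Any, Dict

# Hypothetical rule: rundll32 loading a DLL from a user-writable directory.
RULE: Dict[str, Any] = {
    "title": "Rundll32 Loading DLL From User-Writable Path (example)",
    "status": "experimental",
    "logsource": {"category": "process_creation", "product": "windows"},
    "detection": {
        "selection": {
            "Image|endswith": "\\rundll32.exe",
            "CommandLine|contains": ["\\Users\\Public\\", "\\AppData\\Local\\Temp\\"],
        },
        "condition": "selection",
    },
    "tags": ["attack.t1218.011"],  # ATT&CK mapping for Rundll32 proxy execution
    "level": "medium",
}


def _field_matches(value: str, modifier: str, expected: Any) -> bool:
    """Case-insensitive match; a list of expected values is treated as OR."""
    candidates = expected if isinstance(expected, list) else [expected]
    value = value.lower()
    for candidate in candidates:
        needle = str(candidate).lower()
        if modifier == "contains" and needle in value:
            return True
        if modifier == "endswith" and value.endswith(needle):
            return True
        if modifier == "startswith" and value.startswith(needle):
            return True
        if modifier == "" and value == needle:
            return True
    return False


def matches(rule: Dict[str, Any], event: Dict[str, str]) -> bool:
    """Return True when every field in the rule's selection matches the event."""
    for key, expected in rule["detection"]["selection"].items():
        field, _, modifier = key.partition("|")
        if field not in event:
            return False
        if not _field_matches(str(event[field]), modifier, expected):
            return False
    return True
```

Replaying logged events from a sandbox run through a matcher like this is a quick way to check that a draft rule fires on the observed behavior before converting it for a specific SIEM.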
Reporting
- Malware analysis report structure: - Executive Summary — malware name/family, risk level, key capabilities, IOCs - Technical Analysis — static findings, dynamic behavior, network communication, persistence - Detection and Response — YARA rules, EDR signatures, network signatures, remediation - IOC Appendix — all indicators with confidence levels
- Audience calibration — internal security team needs different detail than executive leadership
- Attribution language — careful hedging; "consistent with," "possibly," "likely"
Resources
- MITRE ATT&CK (free)
- MISP documentation (free)
- Sigma rules GitHub (free)
- Recorded Future community (free blog)
- Mandiant threat intelligence blog (free)
FAQ
Common questions
How long does it take to become a Malware Analyst?
3–4 years optimistic at 20–25 hours/week, 4–5 years realistic. Reverse engineering demands deep operating system internals knowledge (Windows kernel, PE format, memory management), assembly fluency (x86-64 at minimum, ARM increasingly), and obsessive practice with malware samples. There's no shortcut: the role is built on accumulated hours spent reading disassembled code. Pure self-taught paths exist but typically take longer than security-engineer-to-malware-analyst transitions.
Which certifications matter for malware analysis?
GREM (GIAC Reverse Engineering Malware) is the canonical cert. GCFA for forensics depth. OSEE for advanced exploitation work. SANS courses are expensive, but the content is genuinely the gold standard. Many roles require a security clearance; Fort Meade, Herndon VA, and the DC corridor concentrate government and government-contractor opportunities. Holding a clearance significantly expands the job market.
Do I need a CS or computer engineering degree?
Helpful but not required for corporate roles. Federal and government-contractor roles often require a bachelor's plus clearance. The technical bar is high — assembly literacy, OS internals, and reverse engineering tooling (IDA Pro, Ghidra, x64dbg) — which favors candidates with formal CS exposure but doesn't strictly require it. Self-taught paths through CTF reverse engineering, Hack The Box challenges, and Flare-On competitions produce competitive candidates.
What sets a hired Malware Analyst apart?
Public reverse engineering writeups. Analyses of MalwareBazaar samples, Flare-On challenge solutions, and Hack The Box reverse engineering writeups are documented technical work that signals capability beyond theoretical knowledge. IDA Pro and Ghidra appear as required or preferred in nearly every reverse engineering posting. Bonus: contributions to Volatility plugins, YARA rules for malware families, and detection engineering work tied to threat intel.