Roadmap
Vulnerability Researcher
The specialist who discovers new vulnerabilities in software, hardware, and protocols through source code analysis, binary reverse engineering, fuzzing, and manual testing. Produces CVEs, bug bounty findings, and security improvements in products before attackers can find and weaponize them.
OPTIMISTIC 4–5 years · REALISTIC 5–6 years
Stage 00
Programming and Computer Science Fundamentals
Vulnerability research requires genuine engineering depth. You are finding flaws in software systems, which requires understanding those systems at the implementation level.
C and C++ — Primary Research Languages
- C fundamentals — all data types, pointers, arrays, structs, memory allocation, file I/O
- Memory model — stack vs heap; static allocation; memory layout of a process
- Pointers — pointer arithmetic; pointer to pointer; function pointers; void pointers
- Memory management: dynamic heap allocation via malloc/calloc/realloc/free; stack allocation with scope-tied lifetime; memory errors that create vulnerabilities including buffer overflow, heap overflow, use-after-free, double-free, integer overflow, off-by-one, format string, and null pointer dereference
- C++ additions relevant to security: virtual functions and vtable for control flow hijack; new/delete heap allocation with same UAF/double-free issues; smart pointers (shared_ptr, unique_ptr) safer but not immune; STL containers bounds checking (operator[] vs .at())
- Compilation and linking: preprocessor, compiler, assembler, linker stages; compiler flags relevant to security (-fstack-protector, -D_FORTIFY_SOURCE, -pie, -fPIE); debug vs release optimization differences; optimization can eliminate security checks
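The integer overflow class listed above is easiest to see with concrete numbers. The following is a minimal sketch that simulates 32-bit unsigned C arithmetic in Python; the element count and element size are hypothetical values chosen for illustration, not taken from any real target.

```python
# Simulate C's 32-bit unsigned arithmetic to show how an allocation size can wrap.
UINT32_MASK = 0xFFFFFFFF


def alloc_size_u32(count: int, elem_size: int) -> int:
    """Return count * elem_size the way a 32-bit unsigned C multiply would."""
    return (count * elem_size) & UINT32_MASK


def is_overflow(count: int, elem_size: int) -> bool:
    """True when the real product does not fit in 32 bits."""
    return count * elem_size > UINT32_MASK


# Hypothetical attacker-controlled element count read from a file header.
count = 0x40000001
elem_size = 4

print(f"true product:     {count * elem_size:#x}")
print(f"wrapped product:  {alloc_size_u32(count, elem_size):#x}")
print(f"overflow:         {is_overflow(count, elem_size)}")
```

If the wrapped value is passed to malloc while the loop that fills the buffer still iterates over the original count, the result is a heap overflow. That is why size calculations feeding allocations deserve attention during review.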
Python — Automation and Tooling
- All fundamentals — data structures, OOP, modules, file I/O
- ctypes — calling C functions from Python; interfacing with native code
- struct module — packing and unpacking binary data
- socket — network programming for protocol fuzzing
- subprocess — executing programs and capturing output
- Fuzzing automation — writing Python harnesses around native targets
- Exploit helper scripts — payload construction, IPC, exploitation automation
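As a small illustration of the struct module mentioned above, this sketch packs and unpacks a binary record header. The record layout (magic, version, length) and the b"VRDX" magic are invented for the example and do not correspond to any real file format.

```python
import struct

# Hypothetical record layout (an assumption for illustration):
#   magic    4 bytes   b"VRDX"
#   version  uint16    little-endian
#   length   uint32    little-endian, size of the payload that follows
HEADER = struct.Struct("<4sHI")


def build_record(version: int, payload: bytes) -> bytes:
    """Serialize a header followed by the payload."""
    return HEADER.pack(b"VRDX", version, len(payload)) + payload


def parse_record(blob: bytes):
    """Return (magic, version, declared_length, payload) from a serialized record."""
    magic, version, length = HEADER.unpack_from(blob, 0)
    payload = blob[HEADER.size:HEADER.size + length]
    return magic, version, length, payload


record = build_record(2, b"hello")
print(record.hex())
print(parse_record(record))
```

The same two calls, pack and unpack_from, cover most payload construction and crash-file inspection tasks. The "<" prefix matters because it selects little-endian byte order and disables alignment padding.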
Assembly Language
- Full x86/x64 assembly knowledge required (see Malware Analyst Stage 0)
- ARM assembly — growing importance for embedded, mobile, and IoT research; ARM registers R0–R12 general purpose, R13 (SP), R14 (LR), R15 (PC); ARM calling convention R0–R3 for first four arguments; Thumb/Thumb-2 mixed 16/32-bit instruction set; AARCH64 (ARM64) with X0–X7 arguments and X0 return
Operating System Internals
- Linux kernel architecture: kernel space vs user space; system calls interface; virtual memory pages and page tables; process management fork/exec/wait/signals; file descriptors; kernel modules as attack surface; /proc, /sys, /dev virtual filesystems
- Windows kernel architecture: Windows NT Executive, HAL, Win32 subsystem; kernel objects (handles, named objects, security descriptors); Windows API (Win32, Native ntdll, kernel-mode); driver model (KMDF, UMDF); IRP processing; Windows memory (PFN database, working set); syscall mechanism (sysenter/syscall, SSDT); kernel exploitation targets (win32k.sys, driver bugs, kernel objects)
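The process memory layout and the /proc filesystem mentioned above come together in /proc/<pid>/maps on Linux. The sketch below parses lines in that format. The sample lines are made up for illustration so the example runs on any platform; on a real Linux system the same parser could be fed the contents of /proc/self/maps.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    path: str


def parse_maps_line(line: str) -> Mapping:
    """Parse one line: address range, perms, offset, dev, inode, optional path."""
    fields = line.split(maxsplit=5)
    start_s, end_s = fields[0].split("-")
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(int(start_s, 16), int(end_s, 16), fields[1], path)


# Illustrative sample data, not captured from a real process.
SAMPLE = """\
55d0c8a00000-55d0c8a02000 r-xp 00000000 08:01 131 /usr/bin/target
7ffd4c1e0000-7ffd4c201000 rw-p 00000000 00:00 0 [stack]
7f12a4000000-7f12a4021000 rw-p 00000000 00:00 0
"""

mappings = [parse_maps_line(line) for line in SAMPLE.splitlines()]
for m in mappings:
    wx = "  <-- writable and executable" if "w" in m.perms and "x" in m.perms else ""
    print(f"{m.start:#x}-{m.end:#x} {m.perms} {m.path}{wx}")
```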
Resources
- "The C Programming Language" (K&R, book)
- "Computer Systems: A Programmer's Perspective" (Bryant & O'Hallaron, book)
- Linux kernel documentation (free)
- "Windows Internals" by Russinovich (book)
Stage 01
Binary Exploitation Fundamentals
Exploit development requires understanding both what vulnerabilities exist and how to turn them into controlled execution. This is the bridge from finding a bug to proving it is exploitable.
Memory Safety — Deep Understanding
- Stack buffer overflow: stack layout (local variables, saved EBP/RBP, return address, arguments); overwriting return address to redirect execution to attacker code; NOP sled padding; classic vulnerable functions gets/strcpy/sprintf
- Heap exploitation: heap metadata (chunk headers, doubly-linked free lists); ptmalloc/glibc heap bins (fast, small, large, unsorted, tcache); heap overflow; use-after-free; double-free with tcache poisoning; heap spraying
- Format string exploitation: %x to read stack values; %s for arbitrary memory read; %n for arbitrary write; GOT reads/overwrites
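The stack layout described above translates directly into a payload layout. This is a sketch under simplifying assumptions: an x86-64 target, a local buffer sitting immediately below the saved RBP, and no stack canary. The buffer size and target address are hypothetical.

```python
import struct


def build_overflow_payload(buf_size: int, ret_addr: int,
                           saved_rbp: int = 0x4242424242424242) -> bytes:
    """Lay out bytes in the order they land on the stack during the overflow."""
    padding = b"A" * buf_size              # fills the local buffer
    rbp = struct.pack("<Q", saved_rbp)     # overwrites the saved RBP
    ret = struct.pack("<Q", ret_addr)      # overwrites the return address
    return padding + rbp + ret


# Hypothetical 64-byte buffer and hypothetical address of attacker-chosen code.
payload = build_overflow_payload(64, 0x401136)
print(len(payload), payload[-8:].hex())
```

In practice the offset to the return address is measured rather than assumed, for example with a cyclic pattern in a debugger, because compilers may reorder or pad locals.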
Modern Exploitation Mitigations
- Stack Canary (Stack Protector): random value between locals and return address; detected on function return; bypass via leak, canary-avoidance write, or format string read
- ASLR (Address Space Layout Randomization): randomizes base addresses per execution; bypass via information leak, partial overwrite, brute force on 32-bit, or format string; Linux /proc/sys/kernel/randomize_va_space settings 0/1/2
- NX / DEP (No-eXecute / Data Execution Prevention): stack and heap non-executable; bypass via ROP or ret2libc
- PIE (Position Independent Executable): randomizes executable base address; combined with ASLR makes all addresses random; bypass via info leak or partial overwrite
- RELRO (Relocation Read-Only): Full RELRO makes the entire GOT read-only; Partial RELRO makes .got read-only but leaves .got.plt writable; bypass Full RELRO via alternative code pointers (vtable, function pointers)
- CFI (Control Flow Integrity): restricts indirect jumps/calls to valid targets; shadow stack (CET), forward-edge CFI, LLVM CFI implementations
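- SafeSEH (Safe Structured Exception Handling): Windows validates exception handlers against a table of registered handlers to prevent SEH overwrite; SEHOP additionally validates the integrity of the SEH chain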
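The "information leak" bypass for ASLR and PIE listed above comes down to simple arithmetic: one leaked pointer minus a known offset gives the randomized base. The sketch below shows that step. All addresses and offsets are hypothetical and would come from the actual library build in a real scenario.

```python
PAGE_MASK = 0xFFF  # ASLR randomizes at page granularity, so the low 12 bits stay fixed


def module_base(leaked_addr: int, symbol_offset: int) -> int:
    """Recover a module's load base from one leaked symbol address."""
    base = leaked_addr - symbol_offset
    if base & PAGE_MASK:
        raise ValueError("base is not page-aligned; wrong offset or wrong leak")
    return base


# Hypothetical values for illustration only.
leaked_puts = 0x7F3A1C280E50
PUTS_OFFSET = 0x80E50
SYSTEM_OFFSET = 0x50D70

base = module_base(leaked_puts, PUTS_OFFSET)
system_addr = base + SYSTEM_OFFSET
print(f"base={base:#x} system={system_addr:#x}")
```

The page-alignment check is a cheap sanity test. The same property explains why a partial overwrite of only the low bytes of a pointer can work without any leak.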
Return-Oriented Programming (ROP)
- Concept: chain together existing code fragments ("gadgets") ending in ret instruction
- Gadget finding: ROPgadget, ropper, pwndbg's rop command
- Common gadget types: pop rdi; ret to load first syscall arg; pop rax; ret for syscall number; syscall; ret to execute; mov [rdi], rsi; ret to write memory; add rsp, N; ret for stack pivot
- ret2libc — calling libc functions via ROP (system, execve, etc.)
- SROP — sigreturn-oriented programming; abusing the rt_sigreturn syscall
- Stack pivoting — moving RSP to attacker-controlled memory (e.g., heap buffer)
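To make the gadget-chaining idea above concrete, here is a sketch of a ret2libc-style chain built with only the standard library. Every address is a hypothetical placeholder; in real work they would come from a gadget finder and from the target binary. pwntools offers p64() and flat() for the same job, but the sketch avoids that dependency.

```python
import struct


def p64(value: int) -> bytes:
    """Pack a 64-bit little-endian value, like pwntools' p64()."""
    return struct.pack("<Q", value)


# Hypothetical addresses for illustration only.
POP_RDI_RET = 0x401263   # gadget: pop rdi; ret
RET = 0x40101A           # gadget: ret
BIN_SH = 0x404060        # address of a "/bin/sh" string
SYSTEM = 0x401050        # address of system()


def build_ret2libc_chain(offset_to_ret: int) -> bytes:
    """Padding up to the return address, then the gadget chain."""
    chain = b"A" * offset_to_ret
    chain += p64(RET)            # extra ret, often needed for 16-byte stack alignment
    chain += p64(POP_RDI_RET)    # load the first argument register
    chain += p64(BIN_SH)         # value popped into rdi
    chain += p64(SYSTEM)         # system("/bin/sh")
    return chain


chain = build_ret2libc_chain(72)
print(len(chain), chain[72:].hex())
```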
Tools
- pwntools (Python) — CTF exploitation framework: process()/remote() for connecting; p32()/p64()/u64() for packing/unpacking; flat() for ROP chains; ELF() for binary analysis; ROP() for automated chains; gdb.attach() for debugging
- pwndbg / GEF / PEDA — GDB enhancement plugins: checksec for protections; vmmap for memory layout; heap visualization; rop gadget search; telescope for pointer inspection
- GDB — foundation debugger; commands: break, run, continue, next, step, info registers, x/Nx format
- ltrace / strace — tracing library and system calls respectively
- checksec — displaying binary security mitigations
Resources
- "Hacking: The Art of Exploitation" by Jon Erickson (book, essential)
- pwn.college (free, best binary exploitation learning)
- exploit.education (free VMs)
- LiveOverflow YouTube (free)
- CTFtime.org
Stage 02
Security Fundamentals
Vulnerability research is security-domain-specific. Understanding the threat landscape, CVE ecosystem, and disclosure processes is required.
CVE Ecosystem
- CVE (Common Vulnerabilities and Exposures) — unique identifier per vulnerability
- NVD (National Vulnerability Database) — CVSS scores, descriptions, references
- CWE (Common Weakness Enumeration) — vulnerability type taxonomy
- CVSS (Common Vulnerability Scoring System) — standardized severity scoring
- CVE lifecycle: discovery → report to vendor → patch development → CVE assignment → coordinated disclosure → public release
Responsible Disclosure
- Principles: notify vendor before public disclosure; give reasonable time to patch
- Standard timelines: Google Project Zero uses 90 days, which many researchers also follow; CERT/CC defaults to 45 days; some programs use 120
- CERT Coordination Center — intermediary for complex disclosures
- Bug bounty programs — managed disclosure with defined rewards
- Full disclosure — immediate public release; controversial; used when vendors are unresponsive
- CVE Numbering Authorities (CNAs) — MITRE plus vendors and other organizations authorized to assign CVEs for products in their scope
- Writing good vulnerability reports: clear title and summary; affected versions; vulnerability type (CWE); step-by-step reproduction; impact assessment; proof-of-concept code or crash file; suggested fix
Attack Surface Analysis
- Threat modeling for research targeting — where are the most interesting attack surfaces?
- Input attack surface: parsing code, file format handlers, network protocol implementations
- Interface attack surface: APIs, IPC mechanisms, kernel interfaces
- Trust boundary attack surface: privilege transitions, sandboxes, hypervisors
- Target selection strategy: high-impact targets (browsers, OS kernels, hypervisors) harder but CVEs matter more; under-researched targets more likely to find low-hanging fruit; targets you deeply understand where domain expertise accelerates research
Resources
- MITRE CVE (free)
- NVD (free)
- Google Project Zero blog (free, excellent research examples)
- Pwn2Own competition archives (free)
Stage 03
Vulnerability Discovery Techniques
Source Code Auditing
- Manual source code review for vulnerability classes: identify external input points; trace data flows to security-sensitive operations; focus on parsing code, integer arithmetic, memory allocation, format strings, command construction; look for missing bounds checks, unchecked return values, type confusion, integer overflows
- Semgrep and CodeQL for automated pattern matching on large codebases
- Differential analysis / patch diffing: comparing patched vs unpatched versions; tools BinDiff (free, open source since 2023), diaphora (free IDA plugin), radiff2 (radare2); process: download old and new, diff in disassembler, understand patch, reverse-engineer vulnerability; CVE North Stars methodology
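The patch-diffing process above works at the binary level with tools like BinDiff, but the idea is easier to see on source text. The sketch below uses Python's difflib on two versions of a toy C function. Both versions are invented for this example and do not come from any real patch.

```python
import difflib

# Toy "unpatched" and "patched" versions of a function (hypothetical code).
OLD = """\
int parse(const uint8_t *buf, size_t len) {
    uint8_t out[64];
    size_t n = buf[0];
    memcpy(out, buf + 1, n);
    return 0;
}
""".splitlines()

NEW = """\
int parse(const uint8_t *buf, size_t len) {
    uint8_t out[64];
    size_t n = buf[0];
    if (n > sizeof(out) || n + 1 > len)
        return -1;
    memcpy(out, buf + 1, n);
    return 0;
}
""".splitlines()


def added_lines(old, new):
    """Return the lines a patch introduced, stripped of diff markers."""
    diff = difflib.unified_diff(old, new, "old/parse.c", "new/parse.c", lineterm="")
    return [l[1:].strip() for l in diff
            if l.startswith("+") and not l.startswith("+++")]


for line in added_lines(OLD, NEW):
    print("added:", line)
```

A newly added bounds check like this one points straight at the bug: the unpatched memcpy trusted a length byte taken from the input.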
Binary Analysis Without Source
- Static binary analysis (see Malware Analyst path Stage 3)
- Binary-level sink identification: calls to dangerous functions (strcat, strcpy, sprintf, system, popen, memcpy without bounds check); integer operations feeding allocation sizes; pointer arithmetic without validation
- Recovering data structures: identifying struct layouts from field access patterns; applying RTTI for C++ classes; symbol servers for partially stripped binaries
Fuzzing — Core Research Technique
- What fuzzing is: automated input generation to find crashes in programs
- Types: black-box (random input, minimal feedback); coverage-guided (AFL/libFuzzer using code coverage); grammar-based (valid-but-adversarial inputs); snapshot fuzzing (memory snapshot reset); differential fuzzing (two implementations compared)
- AFL++ compiling targets with instrumentation: AFL_USE_ASAN=1 CC=afl-clang-fast ./configure && make for ASAN + AFL; afl-clang-fast / afl-clang-lto compiler wrappers; LTO mode (afl-clang-lto) for faster fuzzing with collision-free coverage
- Running AFL++: afl-fuzz -i input_seeds/ -o output/ -t 1000 -- ./target @@; @@ replaced with path to fuzz input file; input seed quality matters; -m none disables memory limit
- Interpreting output: corpus count shows unique paths found; crashes examined with GDB/ASAN; hangs are inputs that trigger the timeout and may indicate DoS
- AFL++ modes: QEMU mode for closed-source binaries; unicorn mode for snippet fuzzing; persistent mode calls target function in loop without re-execution
- Parallel fuzzing — multiple AFL++ instances on multi-core systems
- libFuzzer in-process fuzzing — target function called in a loop by the fuzzer; no process overhead
- Writing a fuzz target: implement the LLVMFuzzerTestOneInput entry function taking const uint8_t *data and size_t size, call the target function with the fuzz data, and return 0; returning -1 tells libFuzzer not to add the input to the corpus, and other return values are reserved
- Compiling: clang -fsanitize=fuzzer,address target.c -o fuzz_target
- Running: ./fuzz_target corpus/ -max_total_time=3600
- Integration with AFL++ corpus sharing
- Harness development is the critical skill: writing the code that connects the fuzzer to the target
- Harness components: initialize the target (libraries, state); feed fuzz input to target API; handle errors gracefully (don't crash the fuzzer on expected errors); clean up state between iterations (for persistent mode)
- Challenges: stateful protocols that must maintain state across fuzz calls; non-determinism from random seeds, timers, and external dependencies; input validation that must be satisfied or skipped to reach deeper code
- Network protocol fuzzing: Boofuzz (Python) framework with stateful session support; custom socket-level fuzzing scripts; handling TLS wrapping around the target protocol
- AddressSanitizer (ASAN) — detects memory errors: heap/stack buffer overflow, UAF, double-free, use-after-return; compile -fsanitize=address; overhead ~2x memory, ~2x CPU
- UndefinedBehaviorSanitizer (UBSan) — detects undefined behavior: integer overflow, null dereference, alignment violations; compile -fsanitize=undefined
- MemorySanitizer (MSan) — detects use of uninitialized memory
- ThreadSanitizer (TSan) — detects race conditions
- Combining sanitizers: ASAN + UBSan is standard fuzzing configuration
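The core loop behind the fuzzing concepts above (mutate an input, run the target, separate expected errors from real faults) fits in a few lines. This is a toy black-box fuzzer against a deliberately buggy Python function. The target, the seed, and the single-byte mutation strategy are all invented for illustration. Real campaigns would use AFL++ or libFuzzer against native code with sanitizers.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Toy target with a deliberate bug: it trusts the leading length byte."""
    if not data:
        raise ValueError("empty input")           # expected, handled error
    n = data[0]
    # BUG: no check that n <= len(data) - 1, so large n over-reads.
    return bytes(data[1 + i] for i in range(n))   # raises IndexError on over-read


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte with a random value."""
    buf = bytearray(seed)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed: bytes, iterations: int, rng_seed: int = 0):
    """Run the target on mutated inputs and collect unexpected exceptions."""
    rng = random.Random(rng_seed)                 # fixed seed keeps runs reproducible
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            parse_record(candidate)
        except ValueError:
            continue                              # graceful rejection, not a finding
        except Exception as exc:                  # anything else counts as a "crash"
            crashes.append((candidate, type(exc).__name__))
    return crashes


crashes = fuzz(b"\x04ABCD", iterations=2000)
print(f"{len(crashes)} crashing inputs; first: {crashes[0] if crashes else None}")
```

The try/except split mirrors the harness guidance above: expected errors are swallowed so that only genuine faults are recorded. Coverage-guided fuzzers add the missing piece, feedback about which mutations reached new code.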
Manual Vulnerability Discovery
- Code paths rarely reached by fuzzers: complex authentication flows; multi-step transaction logic; error handling paths; edge cases in parser state machines
- Hypothesis-driven research: "This parser allocates a buffer based on a size field; if the size field can overflow..."; "This function assumes its caller validates the input; what if called directly?"; "This code reuses a buffer; is it fully cleared between uses?"
- CVE reproduction and variant research: read a disclosed CVE; reproduce the vulnerability; understand the root cause pattern; look for the same pattern elsewhere in the codebase or similar products
Resources
- AFL++ documentation (free)
- "The Fuzzing Book" (free online)
- pwn.college fuzzing module (free)
- FuzzCon presentations (free YouTube)
- Google OSS-Fuzz (free, good examples of production fuzz harnesses)
Stage 04
Web and API Vulnerability Research
Web targets are accessible, high-impact, and available through bug bounty programs. Web vulnerability research is the most accessible entry point to the field.
Advanced Web Vulnerability Classes
- Server-side template injection (SSTI): identifying template engines from behavior and error messages; template-specific payloads for Jinja2, Twig, FreeMarker, Velocity, Smarty; escalation from information disclosure to RCE
- Deserialization attacks: Java gadget chains via ysoserial (CommonsCollections, Spring, JDK); PHP object injection via magic methods; Python serialization gadget chains; .NET deserialization via ysoserial.net
- OAuth/OIDC attacks: account takeover via state parameter CSRF; token theft via open redirect in redirect_uri; bypassing incorrectly implemented proof-of-possession (DPoP)
- Prototype pollution in JavaScript: object merge patterns; escalation to XSS or RCE via property injection
- HTTP request smuggling: CL.TE (Content-Length interpreted by front-end, Transfer-Encoding by back-end); TE.CL (reverse); HTTP/2 downgrade smuggling
- Cache poisoning — injecting malicious content into shared caches via unkeyed headers
- Host header attacks — password reset poisoning, SSRF via Host header
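The CL.TE smuggling case above depends on two servers delimiting the same bytes differently. The sketch below models that disagreement offline, with no network traffic. The two functions are simplified stand-ins for a front-end that honors Content-Length and a back-end that honors Transfer-Encoding: chunked. The host name is a placeholder, and chunk extensions and trailers are assumed absent.

```python
def split_by_content_length(body: bytes, content_length: int):
    """Front-end view: the body is exactly content_length bytes."""
    return body[:content_length], body[content_length:]


def split_by_chunked(body: bytes):
    """Back-end view: the body ends after the zero-size chunk."""
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size = int(body[pos:line_end], 16)   # chunk size is hexadecimal
        pos = line_end + 2
        if size == 0:
            pos += 2                         # final CRLF (no trailers assumed)
            return body[:pos], body[pos:]
        pos += size + 2                      # chunk data plus its trailing CRLF


body = b"0\r\n\r\nG"
request = (b"POST / HTTP/1.1\r\nHost: example.test\r\n"
           b"Content-Length: %d\r\nTransfer-Encoding: chunked\r\n\r\n" % len(body)) + body

_, front_leftover = split_by_content_length(body, len(body))
_, back_leftover = split_by_chunked(body)
print("front-end leftover:", front_leftover)
print("back-end leftover: ", back_leftover)
```

The front-end forwards all six body bytes as one request, while the back-end stops after the terminating chunk and treats the remaining "G" as the start of the next request on the connection. Testing of this kind belongs only on systems where it is authorized.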
Bug Bounty Research Methodology
- Target selection: programs with larger attack surface (main product, not just marketing site); programs with known technology stack (prefer familiar tech); programs with good payouts relative to scope
- Reconnaissance: asset discovery via subfinder, amass, crt.sh; technology fingerprinting with Wappalyzer, WhatWeb, HTTP headers; JavaScript analysis with linkfinder, secretfinder, JSParser; historical data via Wayback Machine
- Systematic coverage: map all endpoints and parameters; test authentication and authorization on every endpoint; focus on unique functionality with less prior testing
- High-value targets within programs: authentication flows (password reset, login, account recovery); payment and billing; admin functionality; file upload; import/export features
Resources
- PortSwigger Web Security Academy (free)
- HackerOne Hacktivity (free, real reports)
- Jason Haddix's methodology posts (free)
- STÖK's YouTube (free)
Stage 05
Reverse Engineering for Vulnerability Research
Binary targets require reverse engineering before fuzzing or manual analysis. Deep RE skills from the Malware Analyst path apply here with a research focus.
From Malware Analyst Path
- Complete Ghidra and IDA Pro proficiency (Stage 3 from Malware Analyst path)
- x86/x64 assembly fluency (Stage 0 from Malware Analyst path)
- PE/ELF format knowledge (Stage 1 from Malware Analyst path)
- Dynamic analysis and debugging (Stage 4 from Malware Analyst path)
RE for Vulnerability Research Specifics
- Identifying vulnerability-prone code patterns in disassembly: size calculations with multiplication before malloc; memcpy/strcpy without explicit bounds check; loop conditions with off-by-one potential; pointer arithmetic without validation; format string calls printf(user_data) pattern
- Protocol reverse engineering for fuzzer development: identifying message framing (length fields, delimiters); mapping message types and handlers; finding state machine transitions
- Vulnerability variant hunting: understanding a patched vulnerability and finding the unfixed version; finding sibling functions with the same pattern; finding related products with ported vulnerable code
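Once message framing has been identified during protocol reverse engineering, a small parser makes it possible to split captures into messages and to seed a fuzzer with well-formed inputs. The sketch assumes a hypothetical type-length-value framing (1-byte type, 2-byte big-endian length, payload). A real protocol would need its own layout.

```python
import struct
from collections import Counter

HEADER = struct.Struct(">BH")  # hypothetical framing: type (u8), length (u16 big-endian)


def iter_messages(stream: bytes):
    """Yield (msg_type, payload) pairs; raise ValueError on malformed framing."""
    offset = 0
    while offset < len(stream):
        if len(stream) - offset < HEADER.size:
            raise ValueError("truncated header")
        msg_type, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        payload = stream[offset:offset + length]
        if len(payload) != length:
            raise ValueError("length field exceeds remaining data")
        offset += length
        yield msg_type, payload


# Illustrative capture, not real traffic.
capture = b"\x01\x00\x05hello" + b"\x02\x00\x00" + b"\x03\x00\x02ok"
messages = list(iter_messages(capture))
print(messages)
print(Counter(t for t, _ in messages))
```

The explicit length check is the kind of validation worth looking for in the target's own parser. Where the target omits it, the length field becomes a natural mutation point for a fuzzer.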
Automated Binary Analysis
- Binary Ninja automation: scripting vulnerability pattern detection; building analysis plugins
- angr (symbolic execution): exploring execution paths symbolically; proving or disproving reachability of dangerous conditions; constraint solving for input generation
- KLEE — symbolic execution on LLVM bitcode
Stage 06
Vulnerability Reporting and Disclosure
A found vulnerability has zero impact without a clear report that enables the vendor to understand and fix it.
Vulnerability Report Structure
- Executive summary — severity, affected product/version, one-sentence description
- Technical description — precise characterization of the vulnerability class and mechanism
- Root cause analysis — the exact code path or design flaw
- Reproduction steps — step-by-step from clean state to triggering the vulnerability
- Proof-of-concept — crash file, exploit script, or reproduction case
- Impact assessment — what an attacker can achieve; confidentiality/integrity/availability
- Affected versions — specific version numbers; when was the issue introduced?
- Suggested remediation — what the vendor should fix and how
- CVSS score — base score calculation with justification
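For the base score calculation mentioned in the last item, the following sketch follows the CVSS v3.1 base metric equations and weights as published by FIRST. It handles base metrics only (no temporal or environmental metrics) and does no input validation, so it is a way to check and justify a score by hand, not a replacement for an official calculator.

```python
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (scaled // 10000 + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score from a vector like 'CVSS:3.1/AV:N/AC:L/...'."""
    m = dict(p.split(":") for p in vector.split("/") if not p.startswith("CVSS:"))
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))
print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N"))
```

Writing out the vector string alongside the score in a report gives the vendor the justification for each metric choice.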
Disclosure Coordination
- Initial contact — use vendor security email or bug bounty platform; encrypted if possible
- Include enough to demonstrate validity; do not include full exploit
- Timeline negotiation — agree on disclosure date; typical 90 days
- Tracking — keep records of all communications with dates
- CVE assignment — vendor or CNA typically assigns; researcher can request via MITRE
- Publication — security advisory, blog post, conference presentation
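Tracking the agreed timeline is easier with the deadline computed rather than remembered. A minimal sketch, assuming the 90-day window mentioned above and a hypothetical report date:

```python
from datetime import date, timedelta


def disclosure_deadline(reported: date, window_days: int = 90) -> date:
    """Date on which the agreed disclosure window ends."""
    return reported + timedelta(days=window_days)


def days_remaining(reported: date, today: date, window_days: int = 90) -> int:
    """Days left before the deadline (negative once it has passed)."""
    return (disclosure_deadline(reported, window_days) - today).days


reported = date(2024, 1, 15)  # hypothetical report date
print(disclosure_deadline(reported))
print(days_remaining(reported, date(2024, 3, 1)))
```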
Conference Presentation
- DEF CON, Black Hat, CCC, OffensiveCon — primary vulnerability research venues
- Format: technical walkthrough of discovery process + vulnerability detail + demo
- Paper writing for academic venues: IEEE S&P, USENIX Security, CCS, NDSS
Resources
- Coordinated Vulnerability Disclosure guidelines (CISA, free)
- Google Project Zero disclosure policy (free)
- CVE assignment process (MITRE, free)
Stage 07
Hands-On Practice & Portfolio
Practice Platforms and Targets
- pwn.college (free) — structured binary exploitation curriculum; kernel, heap, stack, ROP
- exploit.education (free) — VMs for exploitation practice
- CTFtime.org — PWN and RE categories in CTF competitions
- HackTheBox (free/paid) — machines and challenges
- picoCTF — annual CTF with binary exploitation challenges (free)
- VulnHub — vulnerable VMs for exploitation practice (free)
- Bug bounty: HackerOne public programs with defined bug bounties; Google Vulnerability Reward Program (VRP) covers Chrome, Android, GCP; Microsoft Bug Bounty covers Windows, Edge, Azure
Building a Portfolio
- CVE disclosures — finding and responsibly disclosing a real vulnerability is the highest-signal portfolio item; even a low-severity CVE in a minor project demonstrates the full research workflow
- CTF writeups — detailed technical explanations of how exploitation challenges were solved; published on a blog
- Original research blog posts — documenting a research area, a fuzzing campaign, or a class of vulnerabilities
- Public fuzzing harnesses — contributing fuzz targets to OSS-Fuzz or publishing standalone harnesses on GitHub
- Conference talks — presenting at DEF CON, BSides, or local security meetups
- Bug bounty hall of fame appearances — publicly credited findings on programs
OSS-Fuzz Contributions
- Contributing fuzz targets to Google's OSS-Fuzz is a high-signal portfolio action: pick an open-source project in C/C++ that handles untrusted input; write a libFuzzer target; submit to OSS-Fuzz via GitHub PR; a security-relevant bug found in an OSS project this way can lead to a CVE
What to Document on LabList
- CVE research write-ups — full disclosure timeline and technical details
- CTF exploitation writeups — binary exploitation, RE challenges with methodology explained
- Fuzzing campaigns — documented harness development, corpus management, crash triage
- Bug bounty reports — sanitized versions of submitted findings
- Research blog — explaining a vulnerability class, a fuzzer technique, or a target analysis
FAQ
Common questions
How long does it take to become a Vulnerability Researcher?
4–5 years optimistic at 20–25 hours/week, 5–6 years realistic. VR is one of the longest paths in cybersecurity because it demands deep operating systems internals, assembly fluency, fuzzing methodology, and CVE-quality writeups. The fastest paths come from reverse engineering, AppSec, or systems programming backgrounds. Pure self-taught paths exist but typically take longer than security-engineer-to-VR transitions.
Which certifications matter for VR roles?
OSEE (Offensive Security Exploitation Expert) for advanced exploitation work. OSED for exploit development depth. GREM for malware analysis overlap. SANS courses (SEC660, SEC760) are gold standard but expensive. CVE discovery is the primary portfolio signal — certs matter less than published vulnerabilities.
Do I need a CS degree?
Helpful but not strictly required. Federal and clearance-required roles often require a bachelor's plus security clearance. Security clearance adds $65K+ in many government-adjacent roles. Self-taught paths through CTF reverse engineering, public CVE research, and bug bounty progression produce competitive candidates. The technical bar is genuinely high — assembly, fuzzing, exploit development — favoring formal CS exposure but not requiring it.
What separates a hired Vulnerability Researcher?
Published CVEs with documented research methodology. Sample fuzz harnesses on GitHub, conference talks (BSides, DEF CON villages), and bug bounty disclosure history demonstrate capability. Fuzzing expertise (AFL++, libFuzzer, custom harness development) is the fastest-growing technical skill. Other differentiators: exploit primitive knowledge, mitigation bypass familiarity (ASLR, DEP, CFI), and at least one significant publicly-disclosed vulnerability in your name.