Recon & Content Discovery
Map the attack surface: subdomains, directories, parameters, and hidden endpoints with ffuf, gobuster, and more.
An attacker who jumps straight to exploitation without reconnaissance is like a burglar who tries to open random doors on a random street. Effective web application security testing starts with one question: what is actually here? The attack surface of a modern web application is far larger than what you see in a browser - it spans subdomains you have never visited, directories with no links pointing to them, parameters that only appear in JavaScript bundles, and API endpoints documented nowhere but the source code. This lesson teaches you how to find all of it.
Authorized Targets Only - Every Lesson, Every Time
Reconnaissance and enumeration techniques - including subdomain enumeration, content discovery, and parameter mining - generate significant traffic against targets. Running these against a host that you neither own nor have explicit written permission to test is illegal in many jurisdictions and can be logged as an attack. Stick to bug bounty programs, CTF infrastructure, and deliberate lab environments.
Thinking in Attack Surface
The attack surface of a web application is every point where attacker-controlled input enters the system. Before you can test it, you have to map it. A systematic recon phase serves two purposes:
- Breadth - finding targets and entry points you would otherwise miss
- Context - understanding the technology stack, patterns, and likely vulnerability classes before you start probing
A well-executed recon phase often leads to more valid bugs than running automated scanners alone. Automated tools only test what they find; recon decides what they find.
Passive vs Active Recon
Passive recon collects information without touching the target. You query third-party sources, search engines, certificate transparency logs, and archives. The target never sees your IP.
Active recon makes requests directly to the target. Subdomain brute-force, directory enumeration, port scanning - all of these generate traffic the target can log.
Know Your Scope Before Going Active
Bug bounty programs specify what is in scope. *.example.com usually means any subdomain. app.example.com means only that host. Going active against out-of-scope hosts can get you banned and, in some programs, reported to authorities. Read the scope before running anything.
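The scope rule above can be turned into a small guard that runs before any active step. This is a minimal sketch: the function name in_scope and the list-of-strings scope format are assumptions made for illustration, and real programs may define wildcards differently (some include the apex domain, some do not).

```python
from __future__ import annotations


def in_scope(host: str, scope_entries: list[str]) -> bool:
    """Return True if host matches at least one scope entry.

    "*.example.com" is treated as "any subdomain of example.com"
    (the apex example.com itself is NOT matched in this sketch).
    "app.example.com" matches only that exact host.
    """
    host = host.lower().rstrip(".")
    for entry in scope_entries:
        entry = entry.lower().rstrip(".")
        if entry.startswith("*."):
            # entry[1:] is ".example.com", so "evilexample.com" cannot match
            if host.endswith(entry[1:]):
                return True
        elif host == entry:
            return True
    return False
```

Filtering every discovered hostname through a check like this before brute-forcing or fuzzing keeps out-of-scope hosts from slipping into later steps.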
Step 1: Subdomain Enumeration
Modern companies run dozens or hundreds of subdomains - staging environments, internal tools, old forgotten APIs, partner portals. Each is a potential entry point.
Passive: Certificate Transparency Logs
Every TLS certificate issued by a public CA is logged in Certificate Transparency (CT) logs. Tools query these logs to enumerate subdomains without touching the target.
CT logs are extremely powerful - they often reveal subdomains that have never been linked from any public page.
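As an illustration of CT-based enumeration, the sketch below parses the JSON that the public crt.sh search service returns for a query such as https://crt.sh/?q=%25.example.com&output=json. It assumes the response is a JSON array whose entries carry a name_value field with one or more newline-separated hostnames; the function name is made up for this example, and the parsing is done offline on a sample so nothing is sent to the network.

```python
from __future__ import annotations

import json


def subdomains_from_crtsh(raw_json: str, domain: str) -> list[str]:
    """Extract unique hostnames under `domain` from crt.sh-style JSON."""
    names = set()
    for entry in json.loads(raw_json):
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # wildcard certificate: keep the parent name
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)


# Hand-written sample in the assumed response shape
sample = json.dumps([
    {"name_value": "www.example.com\nexample.com"},
    {"name_value": "*.staging.example.com"},
    {"name_value": "unrelated.example.org"},
])
print(subdomains_from_crtsh(sample, "example.com"))
# ['example.com', 'staging.example.com', 'www.example.com']
```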
Active: DNS Brute-Force
Active brute-force builds candidate hostnames from a wordlist, asks DNS to resolve each one, and keeps the names that resolve.
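The idea can be sketched in a few lines of standard-library Python. This is not how dedicated tools are implemented (they resolve in parallel and query specific resolvers); it only shows the loop. The function names and the injectable resolver argument are assumptions for illustration. A random label is resolved first so that wildcard DNS, where every name resolves, does not produce false positives.

```python
from __future__ import annotations

import socket
import uuid
from typing import Callable, Iterable, Optional


def resolve(hostname: str) -> Optional[str]:
    """Resolve a hostname to one IPv4 address, or None if it does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except (OSError, UnicodeError):  # socket.gaierror is an OSError
        return None


def brute_force_subdomains(
    domain: str,
    words: Iterable[str],
    resolver: Callable[[str], Optional[str]] = resolve,
) -> dict[str, str]:
    """Return {hostname: address} for wordlist candidates that resolve."""
    # If a random label resolves, the zone has a wildcard record
    wildcard_ip = resolver(f"{uuid.uuid4().hex}.{domain}")
    found = {}
    for word in words:
        word = word.strip().lower()
        if not word or word.startswith("#"):
            continue  # skip blank lines and wordlist comments
        candidate = f"{word}.{domain}"
        address = resolver(candidate)
        if address is not None and address != wildcard_ip:
            found[candidate] = address
    return found
```

Passing a custom resolver makes it possible to dry-run the logic against a lookup table before sending any real DNS queries to an in-scope domain.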
Other Passive Sources
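One passive source is the Internet Archive's Wayback Machine, which records URLs it has crawled in the past. The sketch below assumes the Wayback CDX API at https://web.archive.org/cdx/search/cdx with the url, output, fl, and collapse query parameters, and assumes that with output=json the first row of the response is a header row. The function names are hypothetical, and the parsing runs on a local sample.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode, urlsplit

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain: str) -> str:
    """Build a CDX query for every archived URL under *.domain."""
    params = {
        "url": f"*.{domain}/*",
        "output": "json",
        "fl": "original",       # only return the original URL column
        "collapse": "urlkey",   # one row per distinct URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def hosts_from_cdx(raw_json: str) -> list[str]:
    """Return the unique hostnames seen in a CDX JSON response."""
    rows = json.loads(raw_json)
    hosts = set()
    for row in rows[1:]:  # rows[0] is the header row, e.g. ["original"]
        host = urlsplit(row[0]).hostname
        if host:
            hosts.add(host.lower())
    return sorted(hosts)
```

The query goes to the archive, not to the target, so this stays on the passive side of the line drawn earlier.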
What to Do with Subdomains You Find
Once you have a list of live subdomains, check each one:
- What technology is it running? (whatweb, Wappalyzer, response headers)
- Is it a login panel? An API endpoint? A staging environment with debug features enabled?
- Does it have a DNS record (typically a CNAME) pointing to an unclaimed third-party service? That is subdomain takeover territory.
Subdomain Takeover
A subdomain takeover occurs when a DNS CNAME record points to a third-party resource (a Heroku app, an S3 bucket, a GitHub Pages site) that has been deleted. The attacker can claim that resource and serve content at the subdomain, which the parent domain's users trust and which may receive cookies scoped to the parent domain. Nuclei has templates specifically for detecting takeover candidates.
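The detection logic behind such checks can be sketched as: the CNAME target belongs to a known hosting service, and the HTTP response shows that service's "nothing here" page. The fingerprint table below is illustrative only; the strings are commonly cited error messages, may change over time, and cover far fewer services than a maintained template set. The function takes an already-resolved CNAME and an already-fetched response body, so it performs no network access itself.

```python
from __future__ import annotations

# Illustrative fingerprints (assumptions, not an authoritative list):
# CNAME suffix -> text shown when the underlying resource is unclaimed
FINGERPRINTS = {
    "github.io": "There isn't a GitHub Pages site here.",
    "herokuapp.com": "No such app",
    "s3.amazonaws.com": "NoSuchBucket",
}


def takeover_candidate(cname: str, body: str) -> str | None:
    """Return the matching service suffix if this looks like a dangling CNAME."""
    cname = cname.lower().rstrip(".")
    for suffix, marker in FINGERPRINTS.items():
        on_service = cname == suffix or cname.endswith("." + suffix)
        if on_service and marker in body:
            return suffix
    return None
```

A match is only a candidate: confirming it means checking whether the resource name can actually be registered, and only within the program's rules.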
Step 2: Content Discovery
Once you have a list of live hosts, find the paths and files on each one that are not linked from the visible UI. These are directories, backup files, admin panels, API versioning paths, and configuration files.
ffuf: The Preferred Tool
ffuf (Fuzz Faster U Fool) is fast, flexible, and well-maintained. The FUZZ keyword marks where each wordlist value is inserted into the request, for example in the URL path.
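To make the keyword substitution concrete, here is a minimal Python sketch of the same idea. It is not ffuf and does not reproduce its features; the function names, the default set of status codes, and the injectable fetch_status callable are assumptions for illustration.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator


def expand_fuzz(
    template: str, words: Iterable[str], extensions: Iterable[str] = ("",)
) -> Iterator[str]:
    """Yield one URL per (word, extension) pair, replacing the FUZZ keyword."""
    extensions = tuple(extensions)  # allow reuse across words
    for word in words:
        word = word.strip()
        if not word:
            continue
        for ext in extensions:
            yield template.replace("FUZZ", word + ext)


def discover(
    template: str,
    words: Iterable[str],
    fetch_status: Callable[[str], int],
    match_codes: Iterable[int] = (200, 204, 301, 302, 307, 401, 403),
    extensions: Iterable[str] = ("",),
) -> list[tuple[str, int]]:
    """Request every candidate and keep those whose status code matches."""
    wanted = set(match_codes)
    hits = []
    for url in expand_fuzz(template, words, extensions):
        status = fetch_status(url)
        if status in wanted:
            hits.append((url, status))
    return hits
```

With ffuf itself, the template is what you pass to -u and the wordlist is what you pass to -w; the sketch only mirrors the substitute, request, and filter loop.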
gobuster: Reliable Alternative
dirsearch: Python-Based, Good for PHP Apps
Wordlists That Matter
The quality of your wordlist determines what you find. The SecLists repository is the standard source.
Technology-Specific Wordlists
SecLists includes technology-specific lists under Discovery/Web-Content/CMS/ for WordPress, Drupal, Joomla, etc. If you identify the CMS, use the matching wordlist - it will find many more paths than a generic list.
Step 3: Checking robots.txt and sitemap.xml
These files are meant to guide search engine crawlers. Developers sometimes use robots.txt to tell search engines what not to index - which is often exactly what an attacker wants to look at.
sitemap.xml gives you a list of URLs the site owner wants crawled - often including API endpoints, download paths, and language variants that your fuzzer might not have found.
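Both files are easy to process once fetched. The sketch below, using only the standard library, pulls Disallow paths out of robots.txt text and loc entries out of sitemap XML. The function names are made up for this example, and the parsing is deliberately simple: it ignores which User-agent group a rule belongs to and does not follow sitemap index files.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return unique Disallow paths in the order they appear."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = line.partition(":")
        value = value.strip()
        if field.strip().lower() == "disallow" and value and value not in paths:
            paths.append(value)
    return paths


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return the text of every <loc> element, ignoring XML namespaces."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # strip "{namespace}"
        if local_name == "loc" and element.text:
            urls.append(element.text.strip())
    return urls
```

The Disallow paths make a good seed list for content discovery on the same host, since they are paths the owner has already confirmed exist or once existed.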
Step 4: JavaScript Endpoint Mining
Modern single-page applications (SPAs) built on React, Vue, or Angular load almost all of their application logic as JavaScript bundles. Those bundles contain every API endpoint the frontend ever calls, sometimes including undocumented or internal-only routes.
Manual Approach with Browser DevTools
Open DevTools (F12), go to the Sources/Debugger tab, and look for JavaScript bundle files (usually named like main.abc123.js or chunk.vendors.xyz.js). Search within those files for patterns like /api/, fetch(, axios., endpoint, baseURL.
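The same search can be scripted once a bundle has been saved locally. The regular expression below is a rough illustration written for this lesson, not the pattern used by any particular tool: it looks for quoted strings that are either absolute http(s) URLs or paths starting with a slash.

```python
from __future__ import annotations

import re

# Quoted string that is an absolute URL or a /path (illustrative pattern)
ENDPOINT_RE = re.compile(
    r"""["'`]((?:https?://[^"'`\s]+)|(?:/[A-Za-z0-9_\-./{}:]+))["'`]"""
)


def extract_endpoints(js_source: str) -> list[str]:
    """Return unique URL-like strings from JavaScript source, in order seen."""
    seen = []
    for match in ENDPOINT_RE.finditer(js_source):
        candidate = match.group(1)
        if candidate not in seen:
            seen.append(candidate)
    return seen
```

Expect noise: a pattern this loose will also pick up asset paths and route templates, so the output is a list to review by hand rather than a final endpoint inventory.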
Automated: LinkFinder
Automated: getJS + relative-url-extractor
What You Are Looking For in JavaScript
- API base URLs: https://api.example.com, https://internal.example.com/v2
- Hardcoded API keys or tokens (see A02 - Cryptographic Failures)
- Undocumented endpoint paths: /api/admin/users, /v2/internal/debug
- GraphQL endpoints: typically at /graphql or /api/graphql
- Feature flags or disabled features that are still accessible
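For the hardcoded-credentials item in the list above, a first pass can be automated with a couple of patterns. This is a sketch under stated assumptions: the AKIA prefix followed by 16 uppercase alphanumerics is the widely documented shape of an AWS access key ID, while the generic assignment pattern and the function name are invented here and will produce both false positives and misses.

```python
from __future__ import annotations

import re

SECRET_PATTERNS = {
    # Widely documented format of AWS access key IDs
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Hypothetical catch-all: apiKey / api_key / secret / token = "value"
    "generic_assignment": re.compile(
        r"""(?i)\b(api[_-]?key|secret|token)\b["']?\s*[:=]\s*["']([^"']{8,})["']"""
    ),
}


def find_secrets(js_source: str) -> list[tuple[str, str]]:
    """Return (pattern_label, matched_text) pairs found in the source."""
    findings = []
    for label, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(js_source):
            findings.append((label, match.group(0)))
    return findings
```

Any hit still needs manual confirmation that the value is live and sensitive; many keys embedded in frontends are intentionally public identifiers.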
Step 5: Parameter Discovery
Knowing a URL exists is not enough - you also need to know what parameters it accepts. Parameters that developers never documented can still exist and be vulnerable.
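One simple way to detect an undocumented parameter is to compare responses: request the URL without parameters, then add one candidate at a time with a unique marker value and see whether the response changes or reflects the marker. The sketch below is a naive illustration of that comparison and not the algorithm of any specific tool; the function name, the marker value, and the injectable fetch_body callable are assumptions. Real pages with dynamic content need a more tolerant comparison than exact length.

```python
from __future__ import annotations

from typing import Callable, Iterable, Mapping


def discover_params(
    url: str,
    candidates: Iterable[str],
    fetch_body: Callable[[str, Mapping[str, str]], str],
    probe: str = "zq7x1",
) -> list[str]:
    """Return candidate names whose presence changes the response."""
    baseline = fetch_body(url, {})
    found = []
    for name in candidates:
        body = fetch_body(url, {name: probe})
        reflected = probe in body          # value echoed back somewhere
        changed = len(body) != len(baseline)  # response differs from baseline
        if reflected or changed:
            found.append(name)
    return found
```

Sending one request per candidate is slow on large wordlists, which is why dedicated tools batch many parameters per request and then narrow down.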
Arjun: HTTP Parameter Discovery
Manual Parameter Discovery with ffuf
Step 6: Pulling It Together - A Recon Pipeline
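A structured recon pipeline for a bug bounty target or pentest engagement combines the steps above: passive subdomain enumeration, active DNS brute-force within scope, content discovery on each live host, a check of robots.txt and sitemap.xml, JavaScript endpoint mining, and parameter discovery on the endpoints you found.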
Key Takeaways
- Recon determines the quality of your testing. Finding the hidden subdomain with debug mode enabled is often worth more than testing the obvious login form.
- Passive recon first (CT logs, Wayback, theHarvester) gives you breadth without alerting the target.
- ffuf with SecLists is the standard content discovery workflow. Always run with multiple extensions and filter by status code.
- robots.txt and sitemap.xml are frequently overlooked but contain direct pointers to sensitive paths.
- JavaScript bundles expose most of the API surface of modern SPAs - mine them with LinkFinder before any active fuzzing.
- Subdomains found via CT logs often include forgotten staging environments with weaker security than production.