Web Application Security

Recon & Content Discovery

Map the attack surface: subdomains, directories, parameters, and hidden endpoints with ffuf, gobuster, and more.

Medium | 22 min | recon, ffuf, enumeration

An attacker who jumps straight to exploitation without reconnaissance is like a burglar who tries to open random doors on a random street. Effective web application security testing starts with one question: what is actually here? The attack surface of a modern web application is far larger than what you see in a browser - it spans subdomains you have never visited, directories with no links pointing to them, parameters that only appear in JavaScript bundles, and API endpoints documented nowhere but the source code. This lesson teaches you how to find all of it.

Authorized Targets Only - Every Lesson, Every Time

Reconnaissance and enumeration techniques - including subdomain enumeration, content discovery, and parameter mining - generate significant traffic against targets. Running them against a host that you neither own nor have explicit written permission to test is illegal in many jurisdictions and will likely be logged as an attack. Stick to bug bounty programs, CTF infrastructure, and deliberate lab environments.

Thinking in Attack Surface

The attack surface of a web application is every point where attacker-controlled input enters the system. Before you can test it, you have to map it. A systematic recon phase serves two purposes:

  1. Breadth - finding targets and entry points you would otherwise miss
  2. Context - understanding the technology stack, patterns, and likely vulnerability classes before you start probing

A well-executed recon phase often surfaces more valid bugs than running automated scanners alone. Automated tools only test what they are pointed at; recon decides what that is.

Passive vs Active Recon

Passive recon collects information without touching the target. You query third-party sources, search engines, certificate transparency logs, and archives. The target never sees your IP.

Active recon makes requests directly to the target. Subdomain brute-force, directory enumeration, port scanning - all of these generate traffic the target can log.

Know Your Scope Before Going Active

Bug bounty programs specify what is in scope. *.example.com usually means any subdomain. app.example.com means only that host. Going active against out-of-scope hosts can get you banned and, in some programs, reported to authorities. Read the scope before running anything.
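
A scope check is easy to automate before anything active runs. The sketch below assumes the scope is written as shell-style patterns such as *.example.com; the function name in_scope and the SCOPE list are hypothetical, and a real program page may define scope differently (for example, by also including the bare apex domain).

```python
from fnmatch import fnmatch

def in_scope(host: str, scope_patterns: list[str]) -> bool:
    """Return True if host matches at least one scope pattern."""
    host = host.lower().rstrip(".")
    return any(fnmatch(host, pattern.lower()) for pattern in scope_patterns)

# Hypothetical scope, as it might be copied from a program page.
SCOPE = ["*.example.com", "app.example.org"]

candidates = ["api.example.com", "app.example.org", "dev.example.org", "example.com"]
allowed = [h for h in candidates if in_scope(h, SCOPE)]
print(allowed)
```

Note that *.example.com does not match the bare example.com here; whether the apex is in scope is something to confirm with the program, not to assume.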

Step 1: Subdomain Enumeration

Modern companies run dozens or hundreds of subdomains - staging environments, internal tools, old forgotten APIs, partner portals. Each is a potential entry point.

Passive: Certificate Transparency Logs

Nearly every TLS certificate issued by a publicly trusted CA is recorded in Certificate Transparency (CT) logs. Tools query these logs to enumerate subdomains without touching the target.

CT logs are extremely powerful - they often reveal subdomains that have never been linked from any public page.
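
As a minimal sketch of what such tools do with CT data, the function below extracts hostnames from entries shaped like the JSON that crt.sh returns for a query such as https://crt.sh/?q=%25.example.com&output=json. It assumes each entry carries a name_value field that may hold several newline-separated names; the sample data is made up, and no request is sent.

```python
def subdomains_from_ct(entries: list[dict], domain: str) -> set[str]:
    """Collect unique hostnames under `domain` from crt.sh-style entries."""
    domain = domain.lower()
    suffix = "." + domain
    found = set()
    for entry in entries:
        # name_value may hold several names separated by newlines.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # wildcard certificate: keep the base name
            if name == domain or name.endswith(suffix):
                found.add(name)
    return found

# Made-up sample in the assumed shape of a crt.sh JSON response.
sample = [
    {"name_value": "www.example.com\nstaging.example.com"},
    {"name_value": "*.dev.example.com"},
    {"name_value": "other.test"},
]
print(sorted(subdomains_from_ct(sample, "example.com")))
```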

Active: DNS Brute-Force

Active brute-force builds candidate names from a wordlist (dev, staging, api, and so on), resolves each one against the target's DNS, and keeps the names that resolve.
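
The idea can be sketched in a few lines. This is an illustration under simple assumptions, not a replacement for a dedicated tool: it resolves one name at a time, only looks at IPv4 addresses, and uses a single made-up label to guess whether the zone has a wildcard record. The resolver is a parameter so the logic can be exercised offline.

```python
import socket
from typing import Callable, Optional

def resolve(host: str) -> Optional[str]:
    """Resolve a hostname to an IPv4 address, or None if it does not resolve."""
    try:
        return socket.gethostbyname(host)
    except OSError:
        return None

def brute_force(domain: str, words: list[str],
                resolver: Callable[[str], Optional[str]] = resolve) -> dict[str, str]:
    """Try word.domain for each word and keep the names that resolve."""
    # A label that should not exist. If it resolves, the zone likely has a
    # wildcard record, and hits pointing at that address are treated as noise.
    wildcard_ip = resolver(f"zz-no-such-host-7f3a.{domain}")
    hits = {}
    for word in words:
        host = f"{word}.{domain}"
        ip = resolver(host)
        if ip is not None and ip != wildcard_ip:
            hits[host] = ip
    return hits

# Offline demo: a dict stands in for DNS, so nothing is sent anywhere.
fake_zone = {"dev.example.com": "192.0.2.10", "api.example.com": "192.0.2.11"}
print(brute_force("example.com", ["dev", "api", "old"], fake_zone.get))
```

Calling brute_force without the third argument performs real DNS lookups, so only do that for domains that are in scope.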

Other Passive Sources
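
One passive source this lesson names in its takeaways is the Wayback Machine. The sketch below assumes the Internet Archive's CDX search endpoint and its url, output, fl, and collapse parameters, and assumes that JSON output starts with a header row; the helper names are hypothetical. It only builds the query URL and parses results, so it runs without network access.

```python
from urllib.parse import urlencode, urlsplit

def cdx_query_url(domain: str) -> str:
    """Build a CDX query for archived URLs under any subdomain of `domain`."""
    params = {
        "url": f"*.{domain}/*",
        "output": "json",
        "fl": "original",       # only return the originally captured URL
        "collapse": "urlkey",   # one row per distinct URL
    }
    return "http://web.archive.org/cdx/search/cdx?" + urlencode(params)

def urls_from_cdx(rows: list[list[str]]) -> list[str]:
    """Drop the assumed header row and return the URL column."""
    return [row[0] for row in rows[1:] if row]

def hosts_from_urls(urls: list[str]) -> set[str]:
    """Reduce a list of URLs to the set of hostnames they mention."""
    hosts = set()
    for url in urls:
        hostname = urlsplit(url).hostname
        if hostname:
            hosts.add(hostname)
    return hosts

# Made-up rows in the assumed response shape.
rows = [["original"], ["https://old.example.com/login"], ["http://api.example.com:8080/v1?x=1"]]
print(cdx_query_url("example.com"))
print(sorted(hosts_from_urls(urls_from_cdx(rows))))
```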

What to Do with Subdomains You Find

Once you have a list of live subdomains, check each one:

  • What technology is it running? (whatweb, Wappalyzer, response headers)
  • Is it a login panel? An API endpoint? A staging environment with debug features enabled?
  • Does it have a DNS record (typically a CNAME) pointing to an unclaimed third-party service? That is subdomain takeover territory.

Subdomain Takeover

A subdomain takeover occurs when a DNS CNAME record still points to a resource on a third-party service (Heroku, S3, GitHub Pages) that has been deleted. An attacker can claim that resource and serve content at the subdomain, which the parent domain's users trust and which can often read cookies scoped to the parent domain. Nuclei has templates specifically for detecting takeover candidates.
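
A first-pass filter for takeover candidates can be sketched as a suffix match on CNAME targets. The suffix list below is illustrative and covers only the three services named above; the record data is made up, and the function name is hypothetical. A match means "worth checking", not "vulnerable": you still have to confirm that the resource behind the CNAME is actually unclaimed.

```python
# Illustrative suffixes only; real fingerprint lists are much longer.
THIRD_PARTY_SUFFIXES = (".herokuapp.com", ".s3.amazonaws.com", ".github.io")

def takeover_candidates(cname_records: dict[str, str]) -> list[str]:
    """Return subdomains whose CNAME target sits on a listed third-party service."""
    flagged = []
    for subdomain, target in cname_records.items():
        target = target.lower().rstrip(".")  # tolerate a trailing root dot
        if target.endswith(THIRD_PARTY_SUFFIXES):
            flagged.append(subdomain)
    return sorted(flagged)

# Made-up CNAME data, e.g. collected during subdomain enumeration.
records = {
    "shop.example.com": "example-shop.herokuapp.com.",
    "www.example.com": "lb.example.com",
    "docs.example.com": "example.github.io",
}
print(takeover_candidates(records))
```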

Step 2: Content Discovery

Once you have a list of live hosts, find the paths and files on each one that are not linked from the visible UI. These are directories, backup files, admin panels, API versioning paths, and configuration files.

ffuf: The Preferred Tool

ffuf (Fuzz Faster U Fool) is fast, flexible, and well-maintained. The FUZZ keyword marks where the wordlist values are inserted.
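
To make the keyword substitution concrete, here is a small illustration of the idea in Python. It is not ffuf and sends no requests: expand_fuzz shows how each wordlist entry (optionally with an extension appended) replaces FUZZ, and filter_hits shows status-code filtering on results you would get back. Both function names and the set of interesting codes are assumptions for this sketch.

```python
def expand_fuzz(template: str, words: list[str],
                extensions: tuple[str, ...] = ("",)) -> list[str]:
    """Replace the FUZZ keyword with every word, once per extension."""
    if "FUZZ" not in template:
        raise ValueError("template must contain the FUZZ keyword")
    return [template.replace("FUZZ", word + ext)
            for word in words for ext in extensions]

def filter_hits(results: list[tuple[str, int]],
                match_codes: frozenset[int] = frozenset({200, 301, 302, 401, 403})
                ) -> list[str]:
    """Keep only URLs whose status code is in match_codes."""
    return [url for url, status in results if status in match_codes]

urls = expand_fuzz("https://target.example/FUZZ", ["admin", "backup"], ("", ".php"))
print(urls)

# Made-up (url, status) pairs standing in for real responses.
print(filter_hits([(urls[0], 403), (urls[1], 404), (urls[2], 200)]))
```

A 403 is kept on purpose: a path that exists but is forbidden is still a finding worth noting.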

gobuster: Reliable Alternative
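
When gobuster is driven from a script, it helps to build the argument list explicitly rather than concatenating a shell string. The sketch below assumes gobuster's dir mode with its -u (URL), -w (wordlist), -t (threads), and -x (extensions) flags; the helper name, the target URL, and the wordlist path are hypothetical. It only prints the command and does not run it.

```python
import shlex

def gobuster_dir_argv(url: str, wordlist: str,
                      extensions: tuple[str, ...] = (), threads: int = 10) -> list[str]:
    """Build an argument list for a gobuster directory scan."""
    argv = ["gobuster", "dir", "-u", url, "-w", wordlist, "-t", str(threads)]
    if extensions:
        argv += ["-x", ",".join(extensions)]
    return argv

argv = gobuster_dir_argv(
    "https://target.example",
    "/usr/share/seclists/Discovery/Web-Content/common.txt",  # assumed install path
    extensions=("php", "html"),
)
print(shlex.join(argv))
```

The list can be passed to subprocess.run as-is once you have confirmed the target is in scope.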

dirsearch: Python-Based, Good for PHP Apps
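
One reason dirsearch suits PHP targets is its extension handling. The sketch below illustrates the idea, assuming dirsearch's wordlist convention in which a %EXT% placeholder in an entry is replaced by each extension you supply. It is an illustration of the substitution only, not dirsearch's code, and the function name is hypothetical.

```python
def expand_ext(entries: list[str], extensions: list[str]) -> list[str]:
    """Expand %EXT% placeholders; entries without one are kept unchanged."""
    expanded = []
    for entry in entries:
        if "%EXT%" in entry:
            expanded.extend(entry.replace("%EXT%", ext) for ext in extensions)
        else:
            expanded.append(entry)
    return expanded

print(expand_ext(["index.%EXT%", "config.%EXT%", "admin/"], ["php", "phps"]))
```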

Wordlists That Matter

The quality of your wordlist determines what you find. The SecLists repository is the standard source.
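
Whatever list you pick, it pays to clean it before use. The sketch below drops blank lines, comment lines, and duplicates while preserving order. The function names are hypothetical, and the path in the comment is an assumption about where the seclists package is installed on Kali.

```python
from pathlib import Path
from typing import Iterable

def clean_words(lines: Iterable[str]) -> list[str]:
    """Strip whitespace, skip blanks and # comments, and drop duplicates."""
    words, seen = [], set()
    for line in lines:
        word = line.strip()
        if not word or word.startswith("#") or word in seen:
            continue
        seen.add(word)
        words.append(word)
    return words

def load_wordlist(path: str) -> list[str]:
    """Read a wordlist file, tolerating odd bytes in community-maintained lists."""
    text = Path(path).read_text(encoding="utf-8", errors="ignore")
    return clean_words(text.splitlines())

# Assumed location on Kali; adjust to your install:
# words = load_wordlist("/usr/share/seclists/Discovery/Web-Content/common.txt")
print(clean_words(["# comment", "admin", "", "backup ", "admin"]))
```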

Technology-Specific Wordlists

SecLists includes technology-specific lists under Discovery/Web-Content/CMS/ for WordPress, Drupal, Joomla, etc. If you identify the CMS, use the matching wordlist - it will usually find many more paths than a generic list.

Step 3: Checking robots.txt and sitemap.xml

These files are meant to guide search engine crawlers. Developers sometimes use robots.txt to tell search engines what not to index, which is often exactly what an attacker wants to look at.
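
Pulling the interesting lines out of robots.txt takes only a few lines of parsing. The sketch below works on text you have already fetched and returns the Disallow paths in order; the sample file is made up and the function name is hypothetical.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the unique Disallow paths from a robots.txt body, in order."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = line.partition(":")
        path = value.strip()
        # An empty Disallow means "nothing is disallowed", so skip it.
        if field.strip().lower() == "disallow" and path and path not in paths:
            paths.append(path)
    return paths

sample_robots = """User-agent: *
Disallow: /admin/
Disallow: /backup/   # old exports
Allow: /public/
Disallow:
"""
print(disallowed_paths(sample_robots))
```

Each returned path is a candidate to request directly, or a seed to add to your content discovery wordlist.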

sitemap.xml lists the URLs the site owner wants crawled - often including API endpoints, download paths, and language variants that your fuzzer would not have found.
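
Extracting those URLs is a matter of collecting every loc element. The sketch below uses the standard library parser on a made-up sample and matches loc in any namespace; the function name is hypothetical. Because a sitemap is attacker-influenced input from your point of view too, a hardened XML parser is a reasonable choice for anything beyond a quick look.

```python
import xml.etree.ElementTree as ET

def sitemap_urls(xml_text: str) -> list[str]:
    """Return the text of every <loc> element, regardless of namespace."""
    root = ET.fromstring(xml_text)
    urls = []
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # strip "{namespace}" prefix
        if local_name == "loc" and element.text and element.text.strip():
            urls.append(element.text.strip())
    return urls

sample_sitemap = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/api/v1/status</loc></url>
  <url><loc> https://example.com/de/downloads/ </loc></url>
</urlset>
"""
print(sitemap_urls(sample_sitemap))
```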

Step 4: JavaScript Endpoint Mining

Modern single-page applications (SPAs) built on React, Vue, or Angular load almost all of their application logic as JavaScript bundles. Those bundles reference nearly every API endpoint the frontend calls, sometimes including undocumented or internal-only routes.

Manual Approach with Browser DevTools

Open DevTools (F12), go to the Sources/Debugger tab, and look for JavaScript bundle files (usually named like main.abc123.js or chunk.vendors.xyz.js). Search within those files for patterns like /api/, fetch(, axios., endpoint, baseURL.

Automated: LinkFinder
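
LinkFinder automates the manual search described above by running a regular expression over JavaScript source. The sketch below shows the general idea with a much simpler pattern of my own; it is not LinkFinder's regex, and it will miss endpoints that are assembled from string fragments at runtime.

```python
import re

# Quoted strings that look like absolute URLs or rooted paths.
ENDPOINT_RE = re.compile(r"""["'`](https?://[^"'`\s]+|/[A-Za-z0-9_\-./{}:]+)["'`]""")

def extract_endpoints(js_source: str) -> list[str]:
    """Return the unique URL-like string literals found in JavaScript source."""
    return sorted(set(ENDPOINT_RE.findall(js_source)))

# Made-up bundle fragment.
js = (
    'fetch("/api/v1/users"); axios.get(`/api/admin/users`); '
    'const base = "https://api.example.com/v2"; var greeting = "hello";'
)
print(extract_endpoints(js))
```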

Automated: getJS + relative-url-extractor
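
The first half of this workflow is collecting the JavaScript file URLs a page references, so they can be downloaded and searched for paths. A minimal sketch of that collection step, using only the standard library and a made-up page, is shown below; it illustrates the idea and is not how getJS itself is implemented. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ScriptCollector(HTMLParser):
    """Collect absolute URLs of external scripts referenced by a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.scripts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:  # inline scripts have no src attribute
                self.scripts.append(urljoin(self.base_url, src))

def script_urls(html: str, base_url: str) -> list[str]:
    collector = ScriptCollector(base_url)
    collector.feed(html)
    return collector.scripts

page = (
    '<script src="/static/main.abc123.js"></script>'
    "<script>inline()</script>"
    '<script src="https://cdn.example.net/lib.js"></script>'
)
print(script_urls(page, "https://app.example.com/login"))
```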

What You Are Looking For in JavaScript

  • API base URLs: https://api.example.com, https://internal.example.com/v2
  • Hardcoded API keys or tokens (OWASP Top 10 A02 - Cryptographic Failures)
  • Undocumented endpoint paths: /api/admin/users, /v2/internal/debug
  • GraphQL endpoints: typically at /graphql or /api/graphql
  • Feature flags or disabled features that are still accessible
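
A few of the items in this list can be flagged with simple patterns. The sketch below checks for strings shaped like AWS access key IDs, GraphQL endpoint paths, and paths containing admin, internal, or debug. The patterns are illustrative assumptions, far from complete, and every hit needs manual review; the key in the sample is a well-known documentation placeholder, not a real credential.

```python
import re

PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "graphql_endpoint": re.compile(r"""["'`](/[\w/.-]*graphql)["'`]"""),
    "internal_path": re.compile(
        r"""["'`](/[\w/.-]*(?:admin|internal|debug)[\w/.-]*)["'`]"""),
}

def scan_js(js_source: str) -> dict[str, list[str]]:
    """Return the sorted unique matches for each pattern that hits."""
    findings = {}
    for label, pattern in PATTERNS.items():
        hits = sorted(set(pattern.findall(js_source)))
        if hits:
            findings[label] = hits
    return findings

# Made-up bundle fragment.
bundle = (
    'const k = "AKIAIOSFODNN7EXAMPLE"; fetch("/api/graphql"); '
    'fetch("/api/admin/users"); fetch("/v2/internal/debug");'
)
print(scan_js(bundle))
```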

Step 5: Parameter Discovery

Knowing a URL exists is not enough - you also need to know what parameters it accepts. Parameters that developers never documented can still exist and be vulnerable.

Arjun: HTTP Parameter Discovery
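
The underlying technique is differential: request the page once as a baseline, then request it again with a candidate parameter set to a unique marker, and see whether the response changes. The sketch below is a simplified illustration of that idea, not Arjun's actual algorithm. The fetch argument is a hypothetical callable that takes a dict of query parameters and returns the response body, which keeps the logic testable offline; pages with dynamic content would need a more robust comparison than body length.

```python
from typing import Callable

def discover_params(fetch: Callable[[dict], str], candidates: list[str],
                    canary: str = "zq7x1") -> list[str]:
    """Return candidate names whose presence changes the response."""
    baseline = fetch({})
    found = []
    for name in candidates:
        body = fetch({name: canary})
        # Either the marker is reflected, or the body length shifted.
        if canary in body or len(body) != len(baseline):
            found.append(name)
    return found

def fake_fetch(params: dict) -> str:
    """Stand-in for an HTTP request against a made-up endpoint."""
    if "debug" in params:
        return "<html>home debug=" + params["debug"] + "</html>"
    if "page" in params:
        return "<html>page two, with a longer body</html>"
    return "<html>home</html>"

print(discover_params(fake_fetch, ["id", "debug", "page"]))
```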

Manual Parameter Discovery with ffuf
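
With ffuf, the same idea means placing the FUZZ keyword in the parameter-name position and filtering out responses that are the same size as the baseline. The sketch below builds such a command as an argument list, assuming ffuf's -u (URL), -w (wordlist), and -fs (filter by response size) flags; the helper name, URL, wordlist name, and baseline size are hypothetical. It prints the command without running it.

```python
import shlex

def ffuf_param_argv(url: str, wordlist: str, baseline_size: int,
                    value: str = "1") -> list[str]:
    """Build an ffuf command that fuzzes query parameter names."""
    separator = "&" if "?" in url else "?"
    target = f"{url}{separator}FUZZ={value}"
    # -fs hides responses whose size equals the unparameterized baseline.
    return ["ffuf", "-u", target, "-w", wordlist, "-fs", str(baseline_size)]

argv = ffuf_param_argv("https://target.example/search", "params.txt", 4242)
print(shlex.join(argv))
```

Measure the baseline size first by requesting the page with no extra parameters, then pass that number in.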

Step 6: Pulling It Together - A Recon Pipeline

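A structured recon pipeline for a bug bounty target or pentest engagement chains the steps above in order: subdomain enumeration, content discovery, robots.txt and sitemap.xml checks, JavaScript endpoint mining, and parameter discovery.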
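
One way to wire the stages together is to treat each as a pluggable function, so tools can be swapped without touching the flow. The sketch below is an assumed structure rather than a prescribed one: every stage name is hypothetical, the stubs return made-up data, and the scope filter runs before anything that would touch a host.

```python
from typing import Callable

def run_pipeline(domain: str,
                 enumerate_subdomains: Callable[[str], list[str]],
                 in_scope: Callable[[str], bool],
                 probe_live: Callable[[str], bool],
                 discover_content: Callable[[str], list[str]]) -> dict[str, list[str]]:
    """Enumerate hosts, drop out-of-scope ones, then map content per live host."""
    hosts = sorted({h for h in enumerate_subdomains(domain) if in_scope(h)})
    live_hosts = [h for h in hosts if probe_live(h)]
    return {h: discover_content(h) for h in live_hosts}

# Stub stages with made-up results; replace each with a real tool wrapper.
report = run_pipeline(
    "example.com",
    enumerate_subdomains=lambda d: ["api." + d, "dev." + d, "partner.other.test"],
    in_scope=lambda h: h.endswith(".example.com"),
    probe_live=lambda h: not h.startswith("dev."),
    discover_content=lambda h: ["/admin", "/api/v1"],
)
print(report)
```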

Key Takeaways

  • Recon determines the quality of your testing. Finding the hidden subdomain with debug mode enabled is often worth more than testing the obvious login form.
  • Passive recon first (CT logs, Wayback, theHarvester) gives you breadth without alerting the target.
  • ffuf with SecLists is the standard content discovery workflow. Always run with multiple extensions and filter by status code.
  • robots.txt and sitemap.xml are frequently overlooked but contain direct pointers to sensitive paths.
  • JavaScript bundles expose much of the API surface of modern SPAs - mine them with LinkFinder before any active fuzzing.
  • Subdomains found via CT logs often include forgotten staging environments with weaker security than production.