Reconnaissance & OSINT

Passive and active recon - whois, DNS, certificate transparency, Google dorking, and footprinting a target.

Medium | 24 min | recon, osint, footprinting

Before you send a single packet to a target, you can learn an enormous amount about them - and none of it requires touching their systems. This phase is called reconnaissance, or just recon. Done well, recon turns a black box into a detailed map of the target's attack surface before the client ever sees a scan in their logs.

Active Recon Requires Authorization

Passive recon using public data sources is generally legal. The moment you send traffic directly to the target's infrastructure - DNS queries to their nameservers, HTTP requests to their web servers, any direct probe - you are performing active recon. This requires authorization. In this lesson, passive techniques are generally safe to study; active techniques must only be applied within an authorized engagement.

Passive vs Active Reconnaissance

The distinction is simple but critical:

Passive reconnaissance uses information that is already publicly available. You query third-party services (registrars, public DNS, search engines, certificate logs) that have already indexed the target. Your traffic goes to those third parties, not to the target. The target cannot detect passive recon through their own logs.

Active reconnaissance involves sending traffic directly to the target's systems. Any DNS query to their authoritative nameservers, any HTTP request to their web servers, any TCP packet to their hosts generates log entries on their side. Active recon is permissible only under an authorized engagement.

In practice, a real pentest uses both in sequence: passive recon first to build a target list, then controlled active recon once the engagement is live.

WHOIS: Who Owns This?

WHOIS is a query protocol for domain and IP registration data. It tells you who registered a domain, when, with what registrar, and often contact information for the owner.

What this gives you:

Registrant details - organization name, sometimes address and phone. Often redacted via privacy services (WHOIS privacy). Even redacted registrations tell you which privacy service is in use.
Creation date - older domains are more established; brand-new domains with names similar to the target's may be phishing infrastructure.
Registrar - useful context for social engineering research.
Name servers - these tell you where DNS is hosted. A nameserver under ns.cloudflare.com means the target's DNS is on Cloudflare - which has security implications (DDoS protection, and proxied IPs that hide origin servers when proxying is enabled).
Expiry date - a domain that has lapsed, or is about to, can be re-registered by someone else, which makes it an opportunity for domain takeover.
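
The fields in the list above can be pulled out of raw WHOIS text with a small filter. This is a minimal sketch, assuming the "Key: value" layout that gTLD registries such as .com and .net commonly return; other registries label these fields differently, and the sample record is invented for illustration.

```shell
# Summarize the registration fields discussed above from WHOIS text on stdin.
# Assumes gTLD-style labels; adjust the pattern for other registries.
whois_summary() {
    grep -iE '^[[:space:]]*(Registrar|Creation Date|Registry Expiry Date|Name Server):' |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Live usage (the query goes to the registry's WHOIS server, not the target):
#   whois example-target.com | whois_summary

# Offline demonstration with invented sample data.
sample='   Domain Name: EXAMPLE-TARGET.COM
   Registrar: Example Registrar, Inc.
   Creation Date: 2012-03-14T09:00:00Z
   Registry Expiry Date: 2026-03-14T09:00:00Z
   Name Server: NS1.EXAMPLE-DNS.NET
   Name Server: NS2.EXAMPLE-DNS.NET'

printf '%s\n' "$sample" | whois_summary
```

Keeping the filter separate from the lookup means the same function works on saved WHOIS output collected earlier in an engagement.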

For IP blocks, WHOIS gives you the ASN (Autonomous System Number) and CIDR block ownership, which tells you which organization owns a block of IPs. This is useful for understanding the infrastructure landscape around a target.

DNS Reconnaissance

DNS is a goldmine of target information. The DNS namespace for a domain reveals subdomains, mail servers, name servers, and sometimes internal infrastructure exposed to the internet.

Basic DNS Lookups

TXT records are particularly revealing:

SPF records list authorized mail senders - often revealing cloud services in use (Google Workspace, Mailgun, SendGrid, Office 365)
Domain verification tokens reveal which platforms the organization uses (Google Search Console, Microsoft services, various SaaS)
DKIM selectors can confirm email service providers
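
To illustrate the SPF point, the sketch below lists the third-party senders a record delegates to through its include: mechanisms. The function name and the sample record are hypothetical; the record is written the way `dig +short TXT` typically prints it, with surrounding quotes.

```shell
# Print the domains named in an SPF record's include: mechanisms (record on stdin).
spf_includes() {
    tr -d '"' | tr ' \t' '\n\n' | sed -n 's/^[-+~?]\{0,1\}include://p'
}

# Live usage (resolved through your configured resolver):
#   dig +short TXT example-target.com | spf_includes

# Offline demonstration with an invented record.
spf_sample='"v=spf1 include:_spf.google.com include:sendgrid.net ip4:198.51.100.0/27 ~all"'
printf '%s\n' "$spf_sample" | spf_includes
```

Here the two includes would suggest Google Workspace and SendGrid as mail senders. A real record may also chain further includes or use redirect=, which this sketch does not follow.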

Zone Transfer Attempts

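A DNS zone transfer (AXFR) is a mechanism for DNS servers to replicate their zone data to secondary servers. If a nameserver is misconfigured to allow transfers to any client, it dumps the entire DNS zone to anyone who asks - every subdomain, every internal record. Because the request goes to the target's own nameservers, attempting one is active recon.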

Most production DNS servers reject zone transfers from unauthorized clients. When one succeeds, it's a critical finding on its own - and the data it reveals can be enormous.
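
A zone transfer attempt is one `dig axfr` request per authoritative nameserver. The sketch below only prints the commands by default, since running them sends traffic to the target; the `RUN` switch, the function name, and the nameserver hostnames are assumptions made for illustration.

```shell
# Print (or, with RUN=1, execute) one AXFR attempt per nameserver.
# Only set RUN=1 inside an authorized engagement.
axfr_attempts() {
    domain=$1
    shift
    for ns in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            dig axfr "$domain" "@$ns"
        else
            echo "dig axfr $domain @$ns"
        fi
    done
}

# Hypothetical nameservers; in practice take them from:
#   dig +short NS example-target.com
axfr_attempts example-target.com ns1.example-target.com ns2.example-target.com
```

Trying every nameserver matters because a secondary is sometimes configured more loosely than the primary.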

Subdomain Enumeration

Even without a zone transfer, you can enumerate subdomains by combining passive sources with bulk resolution and wordlist brute-forcing:

# subfinder - passive subdomain discovery aggregating many sources
$ subfinder -d example-target.com -o subdomains.txt

# dnsx for fast resolution of the discovered names
$ cat subdomains.txt | dnsx -silent

# dnsrecon for brute-force subdomain enumeration against a wordlist
$ dnsrecon -d example-target.com -D /usr/share/wordlists/subdomains-top1million.txt -t brt

Subdomain enumeration discovers developer portals, staging environments, old APIs, admin panels, and internal tools accidentally exposed to the internet. These are prime targets for the scanning phase.
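
Different tools return overlapping and inconsistently formatted names, so it helps to normalize them before resolving anything. The following is a rough sketch rather than any tool's built-in behavior: `merge_subdomains` is a hypothetical helper that lowercases names, strips a leading wildcard label and trailing dot, drops anything outside the target domain, and de-duplicates.

```shell
# Normalize hostnames on stdin and keep only those under the domain in $1.
merge_subdomains() {
    # Escape the dots so the domain is matched literally, anchored at the end.
    pattern="(^|\\.)$(printf '%s' "$1" | sed 's/\./\\./g')\$"
    tr '[:upper:]' '[:lower:]' |
        sed -e 's/^\*\.//' -e 's/\.$//' |
        grep -E "$pattern" |
        sort -u
}

# Offline demonstration with invented tool output.
printf '%s\n' \
    'WWW.example-target.com' \
    'api.example-target.com' \
    '*.internal.example-target.com' \
    'api.example-target.com.' \
    'cdn.other-site.net' \
    'notexample-target.com' |
    merge_subdomains example-target.com
```

In practice the input would be something like `cat subdomains.txt other-source.txt | merge_subdomains example-target.com`, where the second file name is a placeholder.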

Certificate Transparency: crt.sh

Publicly trusted Certificate Authorities log the TLS certificates they issue to publicly accessible Certificate Transparency (CT) logs, because major browsers reject certificates that are not logged. This is a security mechanism designed to detect misissued certificates - but it also creates a public record of nearly every hostname a target has ever obtained a publicly trusted certificate for.

crt.sh is a web interface and API for querying CT logs.

Results sometimes include names such as *.internal.example-target.com - a wildcard certificate for an internal subdomain that ended up in a public CT log. This reveals the existence of internal infrastructure even if it's not publicly routable.

CT logs never forget. They are append-only, so certificate data for decommissioned hosts stays in the logs. Old subdomains whose DNS records still point to decommissioned infrastructure can be candidates for subdomain takeover.
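
crt.sh can return its results as JSON, with the certificate's hostnames in a `name_value` field (several names in one entry are separated by an escaped newline). The sketch below turns that JSON into a clean hostname list using only grep, sed, and awk; `ct_names` is a hypothetical helper, the sample JSON is invented and trimmed to two fields, and jq is the sturdier choice where it is installed.

```shell
# Extract a sorted, de-duplicated hostname list from crt.sh JSON on stdin.
ct_names() {
    grep -o '"name_value" *: *"[^"]*"' |
        sed -e 's/^"name_value" *: *"//' -e 's/"$//' |
        awk '{ n = split($0, names, /\\n/); for (i = 1; i <= n; i++) print tolower(names[i]) }' |
        sort -u
}

# Live usage (the query goes to crt.sh, not to the target):
#   curl -s 'https://crt.sh/?q=%25.example-target.com&output=json' | ct_names

# Offline demonstration with invented, trimmed sample data.
ct_sample='[{"common_name":"example-target.com","name_value":"example-target.com\nwww.example-target.com"},{"common_name":"*.internal.example-target.com","name_value":"*.internal.example-target.com"},{"common_name":"www.example-target.com","name_value":"WWW.example-target.com"}]'
printf '%s\n' "$ct_sample" | ct_names
```

The demonstration yields three unique names, including the wildcard entry, which is kept as-is because it is itself a finding.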

Google Dorking

Google's advanced search operators can surface information about a target that the target never intended to be indexed. This is sometimes called "Google Dorking" or "Google Hacking."

Core Operators

Operator	Purpose	Example
`site:`	Restrict to a domain	`site:example-target.com`
`filetype:`	Find specific file types	`filetype:pdf site:example-target.com`
`intitle:`	Match in page title	`intitle:"index of" site:example-target.com`
`inurl:`	Match in URL	`inurl:admin site:example-target.com`
`intext:`	Match in page body	`intext:"confidential" site:example-target.com`
`-`	Exclude results	`site:example-target.com -www`

Practical Dorks

# Find login panels
site:example-target.com inurl:login OR inurl:signin OR inurl:admin
 
# Find exposed configuration files
site:example-target.com filetype:xml OR filetype:conf OR filetype:env
 
# Find directory listings
intitle:"index of" site:example-target.com
 
# Find exposed documents
site:example-target.com filetype:pdf OR filetype:xlsx OR filetype:docx
 
# Find subdomains other than www
site:*.example-target.com -www
 
# Look for error messages with version info
site:example-target.com intext:"SQL syntax" OR intext:"ORA-" OR intext:"stack trace"
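
One way to push the subdomain dork further is to exclude each host you already know about, so that the results surface only names you have not seen yet. The helper below is a hypothetical sketch that builds such a query string; it does pure string templating and sends nothing anywhere.

```shell
# Build a dork that restricts results to a domain while excluding known hosts.
# $1 is the domain; remaining arguments are already-discovered host labels.
dork_unknown_hosts() {
    domain=$1
    shift
    query="site:$domain"
    for host in "$@"; do
        query="$query -site:$host.$domain"
    done
    printf '%s\n' "$query"
}

dork_unknown_hosts example-target.com www api
```

Re-running the search with each newly found host added to the exclusion list gradually narrows the results toward less visible subdomains.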

The Google Hacking Database (GHDB) at exploit-db.com is a catalogue of effective dorks for finding specific vulnerability classes - exposed cameras, login portals, database dumps, sensitive files. It's a reference library for this technique.

Shodan, Censys, and FOFA

Search engines like Shodan, Censys, and FOFA continuously scan the entire public internet and index banners, certificates, and service information. You can query them for an organization's IP ranges and immediately see exposed services, versions, and misconfigurations - all without sending a single packet to the target. These are among the most powerful passive recon tools available.

theHarvester: Aggregated OSINT

theHarvester is a tool that aggregates results from multiple OSINT sources into a unified list of emails, hosts, IPs, and URLs for a target domain.

The emails are valuable for two reasons: they reveal real employee accounts (useful for phishing if in scope, and for understanding naming conventions), and they confirm the email format the organization uses (first.last@ vs flast@) - useful for guessing other valid accounts.
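
Once the address format is known, candidate addresses for other employees follow mechanically from their names. This is a minimal sketch with a hypothetical helper: it emits the first.last@ and flast@ formats mentioned above, plus f.last@ as one more common variant, and the name used is only an example.

```shell
# Print candidate addresses for one person: first.last@, flast@, f.last@.
email_candidates() {
    first=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    last=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    domain=$3
    initial=$(printf '%s' "$first" | cut -c1)
    printf '%s.%s@%s\n' "$first" "$last" "$domain"
    printf '%s%s@%s\n' "$initial" "$last" "$domain"
    printf '%s.%s@%s\n' "$initial" "$last" "$domain"
}

email_candidates Alice Jones example-target.com
```

Comparing the candidates against addresses theHarvester actually found shows which format the organization uses, so only that one needs to be generated for the remaining names.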

Footprinting an Organization

Putting it together: footprinting is the process of aggregating all OSINT sources into a coherent picture of the organization's attack surface. A thorough footprint captures:

Technical:

IP ranges owned (WHOIS, ARIN/RIPE/APNIC registries)
Domains and subdomains (WHOIS, CT logs, DNS enum)
Technologies in use (HTTP headers, job postings, Wappalyzer)
Cloud providers (SPF records, IP geolocation, DNS CNAMEs to cloud services)
Email providers (MX records, SPF)
Exposed services (Shodan/Censys)
SSL/TLS details (certificate expiry and SANs, supported cipher suites)

Organizational:

Employee names and emails (LinkedIn, theHarvester, company website)
Org chart / reporting structure (LinkedIn)
Technologies mentioned in job postings (reveals internal stack)
Recent acquisitions (expanded attack surface not yet integrated)
Office locations (for physical engagement scoping)

Job Postings Are Intel

A job posting that says "We are looking for a Senior DevOps Engineer experienced with AWS, Kubernetes, Terraform, and Vault (HashiCorp)" has just told you the target's infrastructure stack. This is passive recon gold. Archived job postings on the Wayback Machine can reveal technologies the company used to run and may still have legacy instances of.

Building a Target List for Scanning

Recon ends with a structured list that feeds directly into scanning:

# Example target list output from recon phase
IP Ranges:
  93.184.216.0/24  (primary internet presence)
  198.51.100.0/27  (secondary hosting)
 
Live Hostnames:
  www.example-target.com        → 93.184.216.34
  api.example-target.com        → 93.184.216.50
  vpn.example-target.com        → 93.184.216.70
  staging.example-target.com    → 93.184.216.80
  legacy.example-target.com     → 93.184.216.99
 
Mail:
  MX → mail.example-target.com (Google Workspace / O365?)
  SPF confirms Google Workspace
 
Technologies (from HTTP headers, crt.sh, Shodan):
  Nginx/1.18 on api host
  Node.js (X-Powered-By header on www)
  Cloudflare WAF on www
 
Employees/Emails:
  j.smith@example-target.com (LinkedIn: John Smith, SysAdmin)
  a.jones@example-target.com (LinkedIn: Alice Jones, Lead Developer)

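This document is the bridge between recon and scanning. Most of it can be gathered without sending a packet to the target; details such as HTTP headers come either from third-party indexes like Shodan or from authorized active recon.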
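
Before the list is handed to scanning, it is worth confirming that every resolved address falls inside a range you are authorized to test. The sketch below checks IPv4 addresses against CIDR blocks with plain shell arithmetic; the helper names are hypothetical, the two ranges are the ones from the example list, and 198.51.100.40 is an invented address chosen to fall outside the /27.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer (runs in a subshell).
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed if IPv4 address $1 lies inside CIDR block $2 (e.g. 198.51.100.0/27).
in_cidr() {
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "${2%/*}") & $mask )) ]
}

for ip in 93.184.216.34 198.51.100.40; do
    if in_cidr "$ip" 93.184.216.0/24 || in_cidr "$ip" 198.51.100.0/27; then
        echo "$ip in scope"
    else
        echo "$ip OUT OF SCOPE"
    fi
done
```

A /27 covers only 32 addresses (.0 through .31), which is why the second address is flagged. Hostnames that resolve outside the listed ranges often belong to a third-party host or CDN and need separate authorization.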

Key Takeaways

Passive recon uses public data sources and is generally invisible to the target. Active recon sends traffic to the target directly and requires authorization.
WHOIS reveals domain ownership, registrar, nameservers, and IP block ownership.
DNS records (A, MX, NS, TXT) and zone transfers (AXFR) reveal infrastructure, services, and cloud providers.
Certificate Transparency logs (crt.sh) expose nearly every hostname ever issued a publicly trusted TLS certificate - including decommissioned and internal ones.
Google dorking surfaces exposed files, admin panels, and error messages indexed by search engines.
theHarvester aggregates multiple OSINT sources into unified host and email lists.
Recon ends with a structured target list that feeds directly into the scanning phase.
