Easy · Text Processing · ~15 min

Grep the Logs

Mine a messy auth log with grep, sed, and awk to extract every failed SSH login and its source IP.

↳ Based on the lesson: Text Processing: grep, sed & awk

Legal Use Only

Log analysis skills are used by both attackers (finding credentials and patterns to exploit) and defenders (detecting intrusions). Always perform log analysis only on systems you own or have explicit authorization to access. Never exfiltrate log files from systems without written permission.

Scenario

You're a junior SOC analyst. A production SSH server has been behaving strangely: intermittent login failures and a possible successful brute-force attack. Your manager has handed you the server's 50,000-line /var/log/auth.log and asked for a report: how many failed SSH attempts, from which IPs, and did any succeed?

There's no SIEM, no Splunk, no Elastic. Just a terminal and the text-processing trio: grep, sed, awk.

Set up the lab by downloading or generating a sample auth.log file:

# Create a practice log file in your home directory
$ mkdir -p ~/lab-logs && cd ~/lab-logs
 
# Generate a sample auth.log (paste this into your terminal):
$ cat > auth.log << 'LOGEOF'
May 10 03:12:01 server sshd[1234]: Failed password for root from 192.168.100.55 port 42100 ssh2
May 10 03:12:03 server sshd[1234]: Failed password for root from 192.168.100.55 port 42101 ssh2
May 10 03:12:05 server sshd[1234]: Failed password for admin from 203.0.113.77 port 51200 ssh2
May 10 03:12:07 server sshd[1235]: Accepted password for deploy from 10.0.0.5 port 22340 ssh2
May 10 03:15:10 server sshd[1236]: Failed password for root from 192.168.100.55 port 42200 ssh2
May 10 03:15:12 server sshd[1237]: Failed password for ubuntu from 198.51.100.9 port 33100 ssh2
May 10 03:15:14 server sshd[1237]: Failed password for pi from 198.51.100.9 port 33101 ssh2
May 10 03:16:00 server sshd[1238]: Accepted publickey for sysadmin from 10.0.0.1 port 55123 ssh2
May 10 03:18:00 server sshd[1239]: Failed password for root from 192.168.100.55 port 42300 ssh2
May 10 03:18:02 server sshd[1239]: Accepted password for root from 192.168.100.55 port 42300 ssh2
LOGEOF

Your Objective

Using only grep, awk, sort, uniq, and comm, answer these four questions:

  1. How many total failed SSH login attempts are in the log?
  2. Which IP addresses attempted failed logins, and how many times did each IP fail?
  3. Which usernames were targeted in failed attempts?
  4. Did any brute-force attempts succeed? (An IP that appears in both failed and accepted entries.)

Hints

grep for specific lines

grep "Failed password" auth.log isolates only the failure lines. Add -c to count them. Add -i for case-insensitive matching.
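A quick sketch of these flags, run against a three-line sample rather than the full lab file. The temporary file and the variable names are made up for illustration.

```shell
# Hypothetical three-line sample; the real lab file is auth.log.
log=$(mktemp)
cat > "$log" << 'EOF'
May 10 03:12:01 server sshd[1234]: Failed password for root from 192.168.100.55 port 42100 ssh2
May 10 03:12:07 server sshd[1235]: Accepted password for deploy from 10.0.0.5 port 22340 ssh2
May 10 03:15:12 server sshd[1237]: Failed password for ubuntu from 198.51.100.9 port 33100 ssh2
EOF

# -c prints the number of matching lines instead of the lines themselves.
failed_count=$(grep -c "Failed password" "$log")
echo "$failed_count"                  # 2

# grep is case-sensitive by default, so the lowercase pattern matches nothing.
# grep exits non-zero on zero matches; "|| true" keeps scripts alive under set -e.
strict_count=$(grep -c "failed password" "$log" || true)

# -i ignores case, so both failure lines match again.
loose_count=$(grep -ci "failed password" "$log")
echo "$strict_count $loose_count"     # 0 2

rm -f "$log"
```

Note that -c counts matching lines, not individual matches, which is what you want for one-event-per-line logs.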

awk field extraction

awk '{print $N}' extracts the Nth whitespace-separated field (replace N with a number). Count fields from the left: field 1 is the month, field 2 is the day, and so on. In the sample log's failure lines, the username is field 9 and the source IP is field 11.
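If counting fields by eye feels error-prone, one option is to let awk number them for you. The sketch below uses one failure line copied from the sample log; the variable names are arbitrary.

```shell
# One failure line copied from the lab's sample log.
line='May 10 03:12:01 server sshd[1234]: Failed password for root from 192.168.100.55 port 42100 ssh2'

# Print every field with its index (NF is awk's built-in field count).
numbered=$(printf '%s\n' "$line" | awk '{ for (i = 1; i <= NF; i++) print i ": " $i }')
echo "$numbered"

# Pull out the two fields the lab cares about.
user=$(printf '%s\n' "$line" | awk '{ print $9 }')
ip=$(printf '%s\n' "$line" | awk '{ print $11 }')
echo "user=$user ip=$ip"    # user=root ip=192.168.100.55
```

Running the numbering loop on a single representative line before building a pipeline is a cheap way to confirm the indexes.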

sort | uniq -c | sort -rn

This pipeline is the hacker's frequency counter: pipe output into sort, then uniq -c (count consecutive duplicates), then sort -rn (sort by count descending). It gives you a ranked frequency table from any stream of text values.
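To see why the first sort matters, compare the pipeline with and without it. The input values here are invented, and the trailing awk only strips the padding that uniq -c puts in front of each count.

```shell
# Made-up stream of values where duplicates are not adjacent.
values=$(printf '%s\n' root admin root pi root admin)

# Without sort, uniq -c only merges neighbouring duplicates, so nothing collapses.
unsorted=$(printf '%s\n' "$values" | uniq -c | awk '{ print $1, $2 }')
echo "$unsorted"

# With sort first, identical values sit next to each other and are counted together.
ranked=$(printf '%s\n' "$values" | sort | uniq -c | sort -rn | awk '{ print $1, $2 }')
echo "$ranked"
# 3 root
# 2 admin
# 1 pi
```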

Walkthrough

Question 1: Count total failed attempts
$ grep -c "Failed password" ~/lab-logs/auth.log
7

-c prints the count instead of the matching lines. There are 7 failed password attempts in the sample log.

Question 2: Source IPs and frequency
$ grep "Failed password" ~/lab-logs/auth.log | awk '{print $11}' | sort | uniq -c | sort -rn
      4 192.168.100.55
      2 198.51.100.9
      1 203.0.113.77

Pipeline breakdown:

  • grep "Failed password" - isolate failure lines
  • awk '{print $11}' - extract the 11th field (source IP)
  • sort - group identical IPs together so uniq can count consecutive duplicates
  • uniq -c - count occurrences
  • sort -rn - sort by count, highest first

192.168.100.55 is the most active attacker with 4 failures.
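The same tally can also be done in a single awk pass using an associative array, which avoids the first sort. This is an alternative sketch, not the lab's required approach; the array name hits and the temporary file are arbitrary choices.

```shell
# Recreate the lab's sample log in a temporary file.
log=$(mktemp)
cat > "$log" << 'EOF'
May 10 03:12:01 server sshd[1234]: Failed password for root from 192.168.100.55 port 42100 ssh2
May 10 03:12:03 server sshd[1234]: Failed password for root from 192.168.100.55 port 42101 ssh2
May 10 03:12:05 server sshd[1234]: Failed password for admin from 203.0.113.77 port 51200 ssh2
May 10 03:12:07 server sshd[1235]: Accepted password for deploy from 10.0.0.5 port 22340 ssh2
May 10 03:15:10 server sshd[1236]: Failed password for root from 192.168.100.55 port 42200 ssh2
May 10 03:15:12 server sshd[1237]: Failed password for ubuntu from 198.51.100.9 port 33100 ssh2
May 10 03:15:14 server sshd[1237]: Failed password for pi from 198.51.100.9 port 33101 ssh2
May 10 03:16:00 server sshd[1238]: Accepted publickey for sysadmin from 10.0.0.1 port 55123 ssh2
May 10 03:18:00 server sshd[1239]: Failed password for root from 192.168.100.55 port 42300 ssh2
May 10 03:18:02 server sshd[1239]: Accepted password for root from 192.168.100.55 port 42300 ssh2
EOF

# Increment a per-IP counter on every failure line, then print "count ip" at the end.
# awk's for-in order is unspecified, so sort -rn still does the ranking.
ranked=$(awk '/Failed password/ { hits[$11]++ }
              END { for (ip in hits) print hits[ip], ip }' "$log" | sort -rn)
echo "$ranked"
# 4 192.168.100.55
# 2 198.51.100.9
# 1 203.0.113.77

rm -f "$log"
```

On a 50,000-line file the difference is small, but the single-pass form extends naturally when you want several counters at once.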

Question 3: Targeted usernames
$ grep "Failed password" ~/lab-logs/auth.log | awk '{print $9}' | sort | uniq -c | sort -rn
      4 root
      1 ubuntu
      1 pi
      1 admin

Field 9 works here because every line in the sample shares one layout. Field numbers are brittle, though: they shift whenever a line gains or loses a word. An alternative is to anchor on the text around the username with grep:

$ grep "Failed password" ~/lab-logs/auth.log | grep -oP "for \K\S+" | sort | uniq -c | sort -rn
      4 root
      1 ubuntu
      1 pi
      1 admin

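-P enables Perl-compatible regex (a GNU grep feature); -o prints only the matched text; \K drops everything matched so far from the reported match, so "for " is required but not printed; \S+ grabs the run of non-whitespace characters that follows. This holds up better than field numbers when the number of fields before the message varies.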
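Real sshd logs also contain lines of the form "Failed password for invalid user NAME from ...", which the lab's sample does not include. On those lines field 9 is the word "invalid", and a pattern that grabs the first word after "for" returns the same thing. The sketch below shows one way to cope, using sed with an optional group; the sample lines and variable names are invented for illustration.

```shell
# Hypothetical sample: two lines use sshd's "invalid user" wording for unknown accounts.
log=$(mktemp)
cat > "$log" << 'EOF'
May 10 03:12:01 server sshd[1234]: Failed password for root from 192.168.100.55 port 42100 ssh2
May 10 03:12:05 server sshd[1240]: Failed password for invalid user admin from 203.0.113.77 port 51200 ssh2
May 10 03:12:09 server sshd[1241]: Failed password for invalid user admin from 203.0.113.77 port 51201 ssh2
EOF

# Field 9 is no longer reliably the username.
by_field=$(awk '/Failed password/ { print $9 }' "$log" | sort -u)
echo "$by_field"
# invalid
# root

# -E: extended regex; -n with the p flag: print only lines where the substitution matched.
# Group 1 optionally swallows "invalid user "; group 2 is the username.
users=$(sed -nE 's/.*Failed password for (invalid user )?([^ ]+) from .*/\2/p' "$log" \
          | sort | uniq -c | sort -rn | awk '{ print $1, $2 }')
echo "$users"
# 2 admin
# 1 root

rm -f "$log"
```

The sed form avoids grep -P, so it should also run where only BSD grep is available.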

Question 4: Brute-force success check

Find IPs in both failed AND accepted log lines:

# IPs with failed attempts:
$ grep "Failed password" ~/lab-logs/auth.log | awk '{print $11}' | sort -u
192.168.100.55
198.51.100.9
203.0.113.77
 
# IPs with accepted logins:
$ grep "Accepted" ~/lab-logs/auth.log | awk '{print $11}' | sort -u
10.0.0.1
10.0.0.5
192.168.100.55
 
# Intersection - IPs in both lists:
$ comm -12 \
  <(grep "Failed password" ~/lab-logs/auth.log | awk '{print $11}' | sort -u) \
  <(grep "Accepted" ~/lab-logs/auth.log | awk '{print $11}' | sort -u)
192.168.100.55

comm -12 prints only lines that appear in both inputs (both must be sorted, which sort -u takes care of here). 192.168.100.55 had 4 failed attempts and then a successful login as root - textbook brute-force. Escalate this to your incident response team immediately.
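The intersection shows that an IP appears in both lists, but not in which order. As a complementary sketch, a single awk pass can report an accepted login only when failures from the same IP came earlier. It assumes the log is in chronological order and that fields 9 and 11 hold the username and IP, as in the sample; the array name fails is arbitrary.

```shell
# Recreate the lab's sample log in a temporary file.
log=$(mktemp)
cat > "$log" << 'EOF'
May 10 03:12:01 server sshd[1234]: Failed password for root from 192.168.100.55 port 42100 ssh2
May 10 03:12:03 server sshd[1234]: Failed password for root from 192.168.100.55 port 42101 ssh2
May 10 03:12:05 server sshd[1234]: Failed password for admin from 203.0.113.77 port 51200 ssh2
May 10 03:12:07 server sshd[1235]: Accepted password for deploy from 10.0.0.5 port 22340 ssh2
May 10 03:15:10 server sshd[1236]: Failed password for root from 192.168.100.55 port 42200 ssh2
May 10 03:15:12 server sshd[1237]: Failed password for ubuntu from 198.51.100.9 port 33100 ssh2
May 10 03:15:14 server sshd[1237]: Failed password for pi from 198.51.100.9 port 33101 ssh2
May 10 03:16:00 server sshd[1238]: Accepted publickey for sysadmin from 10.0.0.1 port 55123 ssh2
May 10 03:18:00 server sshd[1239]: Failed password for root from 192.168.100.55 port 42300 ssh2
May 10 03:18:02 server sshd[1239]: Accepted password for root from 192.168.100.55 port 42300 ssh2
EOF

# Count failures per IP as lines are read. When an Accepted line arrives from an IP
# that has already failed, print: ip, failures so far, account logged into.
hits=$(awk '
  /Failed password/            { fails[$11]++ }
  /Accepted/ && ($11 in fails) { print $11, fails[$11], $9 }
' "$log")
echo "$hits"    # 192.168.100.55 4 root

rm -f "$log"
```

A legitimate user who mistypes a password once would also show up here, so the failure count is worth reading alongside the IP.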

Solution

# 1. Count total failures
grep -c "Failed password" auth.log
 
# 2. Top attacker IPs
grep "Failed password" auth.log | awk '{print $11}' | sort | uniq -c | sort -rn
 
# 3. Targeted usernames
grep "Failed password" auth.log | grep -oP "for \K\S+" | sort | uniq -c | sort -rn
 
# 4. Successful brute-force IPs
comm -12 \
  <(grep "Failed password" auth.log | awk '{print $11}' | sort -u) \
  <(grep "Accepted" auth.log | awk '{print $11}' | sort -u)

Answer: IP 192.168.100.55 brute-forced the root account, failing 4 times and succeeding on the fifth attempt. Immediate action: block the IP at the firewall, rotate the root password (or disable root SSH login), review login sessions with last -i (it lists sessions and source IPs, not the commands run), and check for persistence mechanisms.