CIS120 Linux Fundamentals by Scott Shaper

Regular Expressions

Think of regular expressions like a powerful search language that lets you describe patterns instead of exact matches. It's similar to how you might describe a person to someone: "Look for someone tall wearing a red hat and blue shoes" rather than giving their exact name. With regular expressions (regex), you can tell the computer to find all text that matches a pattern like "any email address" or "phone numbers in this format." This pattern-matching superpower makes regex an essential tool for searching, validating, and manipulating text in Linux.

Quick Reference

Command                   What It Does                          Common Use
grep 'pattern' file       Searches for text matching a pattern  Finding specific lines in log files or code
grep -E 'pattern' file    Uses extended regular expressions     More complex pattern matching with fewer escape characters
grep -i 'pattern' file    Case-insensitive search               Finding text regardless of capitalization
find | grep -E 'pattern'  Filters find results using regex      Finding files that match specific naming patterns

When to Use Regular Expressions

Understanding grep

The grep command (short for "global regular expression print") is like your pattern-matching detective. It searches through text looking for lines that match your specified pattern and shows you the results. It's one of the most commonly used tools for applying regular expressions in Linux.

Option  What It Does                                  When to Use
-i      Makes the search case-insensitive             When you don't care about exact capitalization
-v      Inverts the match (shows non-matching lines)  When you want to exclude certain patterns
-c      Shows only the count of matching lines        When you just need to know how many matches exist
-n      Shows line numbers with matches               When you need to know where matches occur
-E      Uses extended regular expressions             When you need more powerful pattern matching
-o      Shows only the matching part of the line      When you only want to see the pattern that matched
-r      Searches recursively through directories      When searching through multiple files and folders
-h      Suppresses file names in output               When you only want to see matching lines without file names

Basic grep Usage

# Find all lines containing "error" in log file
grep 'error' application.log
# Shows every line that contains the text "error", including inside longer words such as "errors"

# Search several files without showing filenames
grep -h 'error' *.log
# Shows matching lines without the "filename:" prefix that grep adds when it searches more than one file

# Case-insensitive search for warnings
grep -i 'warning' application.log
# Finds "Warning", "WARNING", "warning", etc.

# Count how many errors occurred
grep -c 'error' application.log
# Displays just the number of matching lines

# Find lines that don't contain "success"
grep -v 'success' application.log
# Shows all lines except those containing "success"
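
The options above are easier to compare when they run against the same input. The following is a minimal sketch that builds a throwaway log file (its name and contents are made up for the demonstration) and stores what each option returns:

```shell
# Build a small temporary log; the lines below are invented sample data.
tmpdir=$(mktemp -d)
log="$tmpdir/application.log"
printf '%s\n' \
  'INFO  service started' \
  'error: disk full' \
  'Warning: low memory' \
  'ERROR: connection lost' \
  'success: backup complete' > "$log"

# Plain search is case-sensitive: only the lowercase "error" line matches.
plain=$(grep 'error' "$log")

# -i ignores case and -c counts lines, so "error" and "ERROR" are both counted.
count_ci=$(grep -ic 'error' "$log")

# -n prefixes each matching line with its line number.
numbered=$(grep -n 'Warning' "$log")

# -v inverts the match; with -c it counts the lines that lack "success".
count_not_success=$(grep -vc 'success' "$log")

printf '%s\n' "$plain" "$count_ci" "$numbered" "$count_not_success"
rm -r "$tmpdir"
```

Options can be combined behind a single dash, as in -ic and -vc here.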

Basic Regular Expressions (BRE)

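Think of Basic Regular Expressions as the foundation vocabulary of the pattern-matching language. These are the simpler patterns that most tools support by default. In BRE, the characters +, ?, {, }, (, ), and | are ordinary text unless you put a backslash (\) in front of them to switch on their special meaning. The escaped forms \+ and \? are GNU extensions; strictly POSIX tools may only understand the \{n,m\} forms.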

Pattern  What It Matches                     When to Use                                                  Example
^        Beginning of a line                 When you need to find patterns at the start of lines         ^ERROR matches lines starting with "ERROR"
$        End of a line                       When you need to find patterns at the end of lines           failed$ matches lines ending with "failed"
.        Any single character                When you need to match any character in a specific position  b.t matches "bat", "bit", "bot", etc.
*        Zero or more of previous character  When something might appear multiple times or not at all     lo*l matches "ll", "lol", "lool", etc.
[...]    Any character in the brackets       When you need to match one character from a specific set     [aeiou] matches any vowel
[^...]   Any character NOT in the brackets   When you need to exclude specific characters                 [^0-9] matches any non-digit
\{n\}    Exactly n occurrences               When you need an exact number of repetitions                 a\{3\} matches exactly "aaa"
\{n,m\}  Between n and m occurrences         When you need a range of repetitions                         a\{2,4\} matches "aa", "aaa", or "aaaa"
\+       One or more of previous character   When you need at least one occurrence                        a\+ matches "a", "aa", "aaa", etc., but not ""
\?       Zero or one of previous character   When something is optional                                   colou\?r matches "color" or "colour"

BRE Examples

# Find lines starting with "From:"
grep '^From:' email.txt
# Only matches lines that begin with "From:"

# Find lines ending with a period
grep '\.$' document.txt
# Only matches lines that end with a period

# Find all 3-letter words
grep '\<[a-zA-Z]\{3\}\>' document.txt
# Matches "cat", "dog", "The", etc.

# Find phone numbers in format 555-123-4567
grep '[0-9]\{3\}-[0-9]\{3\}-[0-9]\{4\}' contacts.txt
# Matches phone numbers with that specific pattern

# Find words starting with 'a' and ending with 'e'
grep '\<a[a-z]*e\>' document.txt
# Matches "apple", "awesome", "altitude", etc.
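
To see exactly which text each BRE pattern picks out, it helps to add -o so grep prints only the matched part. The sketch below runs the patterns from this section against one invented sentence; it assumes GNU grep, since \< and \> are GNU extensions:

```shell
# One made-up line of text; -o prints each match on its own line.
text='The cat sat on a large apple crate near 555-123-4567'

# Three-letter words: \< and \> anchor to word edges, \{3\} repeats exactly 3 times.
three_letter=$(printf '%s\n' "$text" | grep -o '\<[a-zA-Z]\{3\}\>' | paste -sd ' ' -)

# Phone number in the form 555-123-4567.
phone=$(printf '%s\n' "$text" | grep -o '[0-9]\{3\}-[0-9]\{3\}-[0-9]\{4\}')

# Words that start with "a" and end with "e".
a_to_e=$(printf '%s\n' "$text" | grep -o '\<a[a-z]*e\>')

# In BRE an unescaped + is an ordinary character, so 'o+' looks for a literal
# "o+" and counts zero matching lines here (|| true absorbs grep's exit status 1).
literal_plus=$(printf '%s\n' "$text" | grep -c 'o+' || true)

printf '%s\n' "$three_letter" "$phone" "$a_to_e" "$literal_plus"
```

The last command is the case that most often surprises newcomers: without the backslash (or -E), the + is not a repetition operator.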

Extended Regular Expressions (ERE)

Think of Extended Regular Expressions as the advanced vocabulary that gives you more expressive power with less typing. ERE is like BRE's more modern cousin that doesn't require you to escape certain special characters. You access ERE using grep -E (the older egrep command does the same thing but is deprecated in current GNU grep).

Pattern  What It Matches                     When to Use                                     Example
+        One or more of previous character   When you need at least one occurrence           a+ matches "a", "aa", "aaa", etc.
?        Zero or one of previous character   When something is optional                      colou?r matches "color" or "colour"
{n}      Exactly n occurrences               When you need an exact number of repetitions    a{3} matches exactly "aaa"
{n,m}    Between n and m occurrences         When you need a range of repetitions            a{2,4} matches "aa", "aaa", or "aaaa"
|        Alternation (OR)                    When matching any of several patterns           cat|dog matches "cat" or "dog"
(...)    Groups patterns together            When applying operators to multiple characters  (ab)+ matches "ab", "abab", "ababab", etc.

Note: the non-capturing group (?:...) is Perl-compatible (PCRE) syntax, not part of ERE. With GNU grep it requires -P instead of -E; for example, grep -P '(?:ab)+c' matches "abc", "ababc", etc.

ERE Examples

# Find either "error" or "warning"
grep -E 'error|warning' application.log
# Matches lines containing either word

# Find words that start with 'p' and end with 'ing'
grep -E '\bp\w+ing\b' document.txt
# Matches "playing", "programming", "presenting", etc.

# Find strings shaped like IPv4 addresses
grep -E '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' network.log
# Matches patterns like 192.168.1.1, but also out-of-range values such as 999.999.999.999

# Find HTML tags
grep -E '<[^>]+>' webpage.html
# Matches tags such as <p>, <div>, </div>, etc.

# Find email addresses
grep -E '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b' contacts.txt
# Matches most standard email formats
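
The practical difference between the two dialects is mostly where the backslashes go. As a rough illustration (the word list is invented, and \? in BRE assumes GNU grep), the sketch below runs equivalent BRE and ERE patterns side by side, then shows alternation and grouping:

```shell
# A small made-up word list, one word per line.
words=$(printf '%s\n' color colour colouur cat dog abab bird)

# Optional character: escaped in BRE, bare in ERE. Both give the same result.
bre_optional=$(printf '%s\n' "$words" | grep 'colou\?r' | paste -sd ' ' -)
ere_optional=$(printf '%s\n' "$words" | grep -E 'colou?r' | paste -sd ' ' -)

# Interval: \{2\} in BRE, {2} in ERE. Each counts the one line containing "uu".
bre_interval=$(printf '%s\n' "$words" | grep -c 'u\{2\}')
ere_interval=$(printf '%s\n' "$words" | grep -Ec 'u{2}')

# Alternation inside a group, anchored so the whole line must be one of the words.
either=$(printf '%s\n' "$words" | grep -E '^(cat|dog)$' | paste -sd ' ' -)

# A group followed by + repeats the whole group, not just the last character.
repeated=$(printf '%s\n' "$words" | grep -E '^(ab)+$')

printf '%s\n' "$bre_optional" "$ere_optional" "$bre_interval" "$ere_interval" "$either" "$repeated"
```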

Character Classes

Character classes are like shortcuts for common groups of characters. They make your patterns more readable and save you from typing long lists of characters. A POSIX character class is written as [:name:] and must itself appear inside a bracket expression, which is why the examples use double brackets such as [[:digit:]].

Character Class  What It Matches         When to Use                                     Equivalent To
[[:alpha:]]      Any letter              When matching alphabetic characters             [A-Za-z]
[[:digit:]]      Any digit               When matching numbers                           [0-9]
[[:alnum:]]      Any letter or digit     When matching alphanumeric characters           [A-Za-z0-9]
[[:space:]]      Any whitespace          When matching spaces, tabs, newlines            [ \t\r\n\v\f]
[[:blank:]]      Spaces and tabs only    When matching horizontal whitespace             [ \t]
[[:upper:]]      Uppercase letters       When matching capital letters                   [A-Z]
[[:lower:]]      Lowercase letters       When matching small letters                     [a-z]
[[:punct:]]      Punctuation characters  When matching symbols and punctuation           [!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~]
[[:print:]]      Printable characters    When matching visible characters                Letters, digits, spaces, punctuation
[[:cntrl:]]      Control characters      When matching non-printable control characters  ASCII 0-31 and 127

Character Class Examples

# Find lines that start with a digit
grep '^[[:digit:]]' data.txt
# Matches lines starting with 0-9

# Find lines containing at least one word made up only of letters
grep -E '\b[[:alpha:]]+\b' document.txt
# Matches words with no digits or symbols (add -o to print just the words instead of whole lines)

# Find lines with punctuation
grep '[[:punct:]]' document.txt
# Matches lines containing any punctuation mark

# Find words starting with uppercase
grep -E '\b[[:upper:]][[:alpha:]]*\b' document.txt
# Matches words starting with capital letters

# Find lines with whitespace at the end
grep '[[:space:]]$' code.txt
# Helps find trailing whitespace in code
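
The character-class examples can be tried without any input file. This is a small sketch over four invented lines (the last one ends in a space on purpose); \b in the third pattern assumes GNU grep:

```shell
# Four made-up lines; the final one has a trailing space.
data=$(printf '%s\n' '42 apples' 'Hello, world' 'no symbols here' 'trailing space ')

# Lines that begin with a digit.
starts_with_digit=$(printf '%s\n' "$data" | grep '^[[:digit:]]')

# Lines that contain any punctuation mark (here, the comma).
with_punct=$(printf '%s\n' "$data" | grep '[[:punct:]]')

# Words that begin with an uppercase letter; -o prints only the word itself.
capitalized=$(printf '%s\n' "$data" | grep -Eo '\b[[:upper:]][[:alpha:]]*\b')

# Line number of any line ending in whitespace (-n adds "N:", cut keeps N).
trailing_ws_line=$(printf '%s\n' "$data" | grep -n '[[:space:]]$' | cut -d: -f1)

printf '%s\n' "$starts_with_digit" "$with_punct" "$capitalized" "$trailing_ws_line"
```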

Using Regular Expressions with find

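The find command can use regular expressions to search for files whose paths match specific patterns. Its -regex test is matched against the entire path of each file, not just the file name, which is why the patterns below begin with .* to cover the leading directories. This is especially useful when looking for files with complex naming conventions.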

Using find with BRE

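# Find all .txt files
find /path/to/search -regextype posix-basic -regex '.*\.txt$'
# Matches file.txt, notes.txt, etc. (-regextype is a GNU find option; without it, GNU find uses Emacs-style regex)

# Find files with names containing numbers
find /path/to/search -regextype posix-basic -regex '.*/[^/]*[0-9][^/]*'
# Matches file1.txt, report2.pdf, etc.; the [^/]* parts keep digits in directory names from counting

# Find files with exactly 3-letter extensions
find /path/to/search -regextype posix-basic -regex '.*\.[a-zA-Z]\{3\}$'
# Matches file.txt, image.jpg, script.php, etc.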
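
Because -regex is tested against the whole path, patterns that look right for a file name can match too little or too much. The sketch below demonstrates this in a temporary directory with invented file names; it assumes GNU find, where -regextype is available:

```shell
# Create a scratch tree; all names here are made up for the demonstration.
root=$(mktemp -d)
mkdir "$root/v2"
touch "$root/notes.txt" "$root/report2.pdf" "$root/image.jpeg" "$root/v2/readme.md"

# A bare file name never matches, because the path is "./notes.txt".
bare=$(cd "$root" && find . -type f -regex 'notes\.txt' | wc -l | tr -d ' ')

# A leading .* lets the pattern cover the directory part of the path.
txt=$(cd "$root" && find . -type f -regex '.*\.txt')

# A digit anywhere in the path matches, including the directory name "v2".
loose=$(cd "$root" && find . -type f -regex '.*[0-9].*' | LC_ALL=C sort)

# Restricting the digit to the last path component matches file names only.
strict=$(cd "$root" && find . -type f -regex '.*/[^/]*[0-9][^/]*')

# -regextype posix-basic selects BRE, so \{3\} means exactly three letters.
three=$(cd "$root" && find . -type f -regextype posix-basic \
          -regex '.*\.[a-zA-Z]\{3\}' | LC_ALL=C sort)

printf '%s\n' "$bare" "$txt" "$loose" "$strict" "$three"
rm -r "$root"
```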

Combining find with grep

# Find .txt or .log files using ERE
find /path/to/search -type f | grep -E '\.(txt|log)$'
# Lists files ending in .txt or .log

# Find files containing "backup" followed by a date (YYYYMMDD)
find /path/to/search -type f | grep -E 'backup_[0-9]{8}'
# Matches backup_20220315, backup_20231127, etc.

# Find files not in common image formats
find /path/to/search -type f | grep -vE '\.(jpg|png|gif|bmp)$'
# Lists files that don't end with common image extensions
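
When find's output is piped into grep, each path is treated as an ordinary line of text, so grep only needs to match part of it and the full ERE syntax is available. A minimal sketch, using invented file names and a hypothetical helper function list_files:

```shell
# Scratch directory with made-up files.
root=$(mktemp -d)
touch "$root/a.txt" "$root/b.log" "$root/c.jpg" \
      "$root/backup_20220315.tar" "$root/backup_old.tar"

# list_files is a hypothetical helper: it prints every file under $root
# as a relative path, one per line, in a stable order.
list_files() { (cd "$root" && find . -type f | LC_ALL=C sort); }

# Paths ending in .txt or .log.
text_or_log=$(list_files | grep -E '\.(txt|log)$')

# Paths containing "backup_" followed by eight digits.
dated=$(list_files | grep -E 'backup_[0-9]{8}')

# Number of paths that do not end in a common image extension.
not_images=$(list_files | grep -vEc '\.(jpg|png|gif|bmp)$')

printf '%s\n' "$text_or_log" "$dated" "$not_images"
rm -r "$root"
```

This line-based approach assumes file names do not contain newline characters.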

Real-World Use Cases

Log Analysis

# Find all error messages with timestamps
grep -E '^([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}).*ERROR' application.log
# Matches log lines with timestamps followed by ERROR

# Count errors by type
grep -Eo 'ERROR: [A-Za-z]+' application.log | sort | uniq -c
# Groups and counts different types of errors

# Extract all IP addresses from a log file
grep -Eo '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' access.log | sort | uniq
# Finds all unique IP addresses
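
The three log-analysis commands can be checked against a known input. The sketch below uses a five-line log whose format and contents are invented for illustration; \b assumes GNU grep:

```shell
# Write a small sample log; the last line deliberately has no timestamp.
tmpdir=$(mktemp -d)
log="$tmpdir/application.log"
printf '%s\n' \
  '2024-03-15 10:00:01 INFO request from 192.168.1.10' \
  '2024-03-15 10:00:02 ERROR: Timeout contacting 10.0.0.5' \
  '2024-03-15 10:00:03 ERROR: Timeout contacting 10.0.0.5' \
  '2024-03-15 10:00:04 ERROR: Disk full on 192.168.1.10' \
  'no timestamp ERROR: Timeout' > "$log"

# Count ERROR lines that start with a "YYYY-MM-DD HH:MM:SS" timestamp.
timestamped=$(grep -Ec '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}.*ERROR' "$log")

# Count errors by type; awk only normalizes the padding that uniq -c adds.
by_type=$(grep -Eo 'ERROR: [A-Za-z]+' "$log" | LC_ALL=C sort | uniq -c | awk '{print $1, $2, $3}')

# Extract the unique IPv4-shaped strings.
ips=$(grep -Eo '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' "$log" | LC_ALL=C sort -u)

printf '%s\n' "$timestamped" "$by_type" "$ips"
rm -r "$tmpdir"
```

The by-type count includes the line without a timestamp, because that pipeline does not anchor on the date.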

Code Search

# Find all function definitions in Python files
grep -r -E '^\s*def [a-zA-Z_][a-zA-Z0-9_]*\(' --include="*.py" ./src
# Locates Python function definitions, including indented methods (\s* allows leading whitespace)

# Find TODO comments in code
grep -r -E '//\s*TODO:' --include="*.js" ./src
# Finds JavaScript TODO comments
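
For code search, the anchor at the start of the pattern decides whether indented definitions are found. Here is a hedged sketch using a scratch directory with two invented source files; it assumes a grep that supports -r and --include, as GNU grep does:

```shell
# Scratch source tree; file names and contents are made up.
src=$(mktemp -d)
printf '%s\n' 'def main():' '    pass' 'class A:' '    def method(self):' \
  '        pass' '# def old(): removed' > "$src/app.py"
printf '%s\n' '// TODO: handle errors' '//TODO: add tests' '// todo later' \
  'const x = 1;' > "$src/app.js"

# ^def only finds definitions at the start of a line (top-level functions).
top_level=$(grep -rhE '^def [a-zA-Z_][a-zA-Z0-9_]*\(' --include='*.py' "$src")

# Allowing leading whitespace also finds the indented method.
all_defs=$(grep -rhE '^[[:space:]]*def [a-zA-Z_][a-zA-Z0-9_]*\(' --include='*.py' "$src" \
             | wc -l | tr -d ' ')

# TODO comments with or without a space after //; the search is case-sensitive.
todos=$(grep -rhE '//[[:space:]]*TODO:' --include='*.js' "$src" | wc -l | tr -d ' ')

printf '%s\n' "$top_level" "$all_defs" "$todos"
rm -r "$src"
```

[[:space:]] is used here in place of \s because it is the portable POSIX spelling; with GNU grep the two behave the same way.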

Tips for Success

Common Mistakes to Avoid

Best Practices