Regular Expressions
Think of regular expressions like a powerful search language that lets you describe patterns instead of exact matches. It's similar to how you might describe a person to someone: "Look for someone tall wearing a red hat and blue shoes" rather than giving their exact name. With regular expressions (regex), you can tell the computer to find all text that matches a pattern like "any email address" or "phone numbers in this format." This pattern-matching superpower makes regex an essential tool for searching, validating, and manipulating text in Linux.
Quick Reference
| Command | What It Does | Common Use |
|---|---|---|
| `grep 'pattern' file` | Searches for text matching a pattern | Finding specific lines in log files or code |
| `grep -E 'pattern' file` | Uses extended regular expressions | More complex pattern matching with fewer escape characters |
| `grep -i 'pattern' file` | Case-insensitive search | Finding text regardless of capitalization |
| `find \| grep -E 'pattern'` | Filters find results using regex | Finding files that match specific naming patterns |
When to Use Regular Expressions
- When you need to search for patterns rather than exact text
- When validating input formats (like emails, phone numbers, dates)
- When extracting specific information from large text files
- When filtering command output for specific patterns
- When searching for files with complex naming patterns
- When you need to perform search and replace operations with patterns
Understanding grep
The grep command (short for "global regular expression print") is like your pattern-matching detective. It searches through text looking for lines that match your specified pattern and shows you the results. It's one of the most commonly used tools for applying regular expressions in Linux.
| Option | What It Does | When to Use |
|---|---|---|
| `-i` | Makes the search case-insensitive | When you don't care about exact capitalization |
| `-v` | Inverts the match (shows non-matching lines) | When you want to exclude certain patterns |
| `-c` | Shows only the count of matching lines | When you just need to know how many matches exist |
| `-n` | Shows line numbers with matches | When you need to know where matches occur |
| `-E` | Uses extended regular expressions | When you need more powerful pattern matching |
| `-o` | Shows only the matching part of the line | When you only want to see the pattern that matched |
| `-r` | Searches recursively through directories | When searching through multiple files and folders |
| `-h` | Suppresses file names in output | When you only want to see matching lines without file names |
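As a rough illustration of a few of these options, the sketch below builds a small throwaway log and runs `-n`, `-o`, and `-v` combined with `-c` against it. The file name `sample.log` and its contents are made up for this example.

```shell
# Create a small sample file in a temporary directory (hypothetical data)
tmpdir=$(mktemp -d)
printf '%s\n' 'INFO: started' 'ERROR: disk full' 'info: retrying' 'ERROR: timeout' > "$tmpdir/sample.log"

# -n prefixes each matching line with its line number
numbered=$(grep -n 'ERROR' "$tmpdir/sample.log")
echo "$numbered"

# -o prints only the text that matched, one match per output line
only_match=$(grep -o 'ERROR: [a-z]*' "$tmpdir/sample.log")
echo "$only_match"

# -v and -c combine: count the lines that do NOT contain "ERROR"
non_errors=$(grep -vc 'ERROR' "$tmpdir/sample.log")
echo "$non_errors"

rm -r "$tmpdir"
```

Short options can be bundled, so `-vc` here is the same as `-v -c`.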
Basic grep Usage
The following examples assume a log file named `application.log` with the following content:
INFO: User login successful.
WARN: Disk space running low.
ERROR: Database connection error.
INFO: Data processed.
DEBUG: Cache cleared.
ERROR: API rate limit exceeded.
INFO: Report generated.
WARN: Unused variable detected.
ERROR: File not found.
SUCCESS: Operation completed.
# Find all lines containing "ERROR" in the log file
grep 'ERROR' application.log
# Shows every line that contains "ERROR" (the match is case-sensitive)
# Case-insensitive search for warnings
grep -i 'warn' application.log
# Finds "WARN", "Warn", "warning", etc.
# Count how many errors occurred
grep -c 'ERROR' application.log
# Displays just the number of matching lines (3 for the sample file)
# Find lines that don't contain "SUCCESS"
grep -v 'SUCCESS' application.log
# Shows all lines except those containing "SUCCESS"
Basic Regular Expressions (BRE)
Think of Basic Regular Expressions as the foundation vocabulary of the pattern-matching language. These are the simpler patterns that most tools support by default. In BRE, some special characters need to be escaped with a backslash (\) to use their special meaning.
| Pattern | What It Matches | When to Use | Example |
|---|---|---|---|
| `^` | Beginning of a line | When you need to find patterns at the start of lines | `^ERROR` matches lines starting with "ERROR" |
| `$` | End of a line | When you need to find patterns at the end of lines | `failed$` matches lines ending with "failed" |
| `.` | Any single character | When you need to match any character in a specific position | `b.t` matches "bat", "bit", "bot", etc. |
| `*` | Zero or more of previous character | When something might appear multiple times or not at all | `lo*l` matches "ll", "lol", "lool", etc. |
| `[...]` | Any character in the brackets | When you need to match one character from a specific set | `[aeiou]` matches any vowel |
| `[^...]` | Any character NOT in the brackets | When you need to exclude specific characters | `[^0-9]` matches any non-digit |
| `\{n\}` | Exactly n occurrences | When you need an exact number of repetitions | `a\{3\}` matches exactly "aaa" |
| `\{n,m\}` | Between n and m occurrences | When you need a range of repetitions | `a\{2,4\}` matches "aa", "aaa", or "aaaa" |
| `\+` | One or more of previous character (GNU extension) | When you need at least one occurrence | `a\+` matches "a", "aa", "aaa", etc., but not "" |
| `\?` | Zero or one of previous character (GNU extension) | When something is optional | `colou\?r` matches "color" or "colour" |
| `\<` | Beginning of a word (GNU extension) | When you need to match a pattern at the start of a word | `\<cat` matches "cat" in "category" but not in "concatenate" |
| `\>` | End of a word (GNU extension) | When you need to match a pattern at the end of a word | `cat\>` matches "cat" in "The cat sat." but not in "duplicate" |
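The escaping rule is easiest to see side by side. The following sketch assumes GNU grep (where `\+` is available in basic mode) and uses made-up input lines piped in with `printf`.

```shell
# In BRE, an unescaped + is just a literal plus sign
literal_plus=$(printf '%s\n' 'aaa' 'a+b' | grep 'a+')
echo "$literal_plus"

# Escaped, \+ means "one or more" (a GNU grep extension to BRE)
one_or_more=$(printf '%s\n' 'aaa' 'a+b' 'xyz' | grep 'a\+')
echo "$one_or_more"

# The interval \{3\} is standard BRE: lines made of exactly three digits
exact_three=$(printf '%s\n' '12' '123' '1234' | grep '^[0-9]\{3\}$')
echo "$exact_three"
```

The first command prints only `a+b`, the second prints both `aaa` and `a+b`, and the third prints `123`.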
BRE Examples
The following examples assume a file named `document1.txt` with the following content:
From: John Doe
This is a test document.
Cat, dog, bat, run, sun.
My phone number is 555-123-4567.
Apple and awesome altitude.
Another line here.
End of the document.
# Find lines starting with "From:"
grep '^From:' document1.txt
# Only matches lines that begin with "From:"
# Find lines ending with a period
grep '\.$' document1.txt
# Only matches lines that end with a period
# Find lines containing a 3-letter word
grep '\<[a-zA-Z]\{3\}\>' document1.txt
# Matches lines with words such as "Cat", "dog", "Doe", "and", "End", "the"
# Find phone numbers in format 555-123-4567
grep '[0-9]\{3\}-[0-9]\{3\}-[0-9]\{4\}' document1.txt
# Matches phone numbers with that specific pattern
# Find words starting with 'a' and ending with 'e'
grep '\<a[a-zA-Z]*e\>' document1.txt
# Matches "awesome" and "altitude"; add -i to also match "Apple"
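Because grep prints whole lines by default, it can be hard to tell which words actually matched. One possible way to check, sketched here with two lines copied from `document1.txt` and piped in through `printf`, is to add `-o` so that only the matched words are printed.

```shell
# List the individual 3-letter words rather than the whole line
three_letter=$(printf '%s\n' 'Cat, dog, bat, run, sun.' | grep -o '\<[a-zA-Z]\{3\}\>')
echo "$three_letter"

# -i makes the leading 'a' match "Apple" as well
a_to_e=$(printf '%s\n' 'Apple and awesome altitude.' | grep -io '\<a[a-zA-Z]*e\>')
echo "$a_to_e"
```

The first command prints the five words on separate lines; the second prints `Apple`, `awesome`, and `altitude`, skipping `and` because it does not end in "e".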
Extended Regular Expressions (ERE)
Think of Extended Regular Expressions as the advanced vocabulary that gives you more expressive power with less typing. ERE is like BRE's more modern cousin that doesn't require you to escape certain special characters. You access ERE using grep -E (the older egrep command does the same thing but is deprecated).
| Pattern | What It Matches | When to Use | Example |
|---|---|---|---|
| `+` | One or more of previous character | When you need at least one occurrence | `a+` matches "a", "aa", "aaa", etc. |
| `?` | Zero or one of previous character | When something is optional | `colou?r` matches "color" or "colour" |
| `{n}` | Exactly n occurrences | When you need an exact number of repetitions | `a{3}` matches exactly "aaa" |
| `{n,m}` | Between n and m occurrences | When you need a range of repetitions | `a{2,4}` matches "aa", "aaa", or "aaaa" |
| `\|` | Alternation (OR) | When matching any of several patterns | `cat\|dog` matches "cat" or "dog" |
| `(...)` | Groups patterns together | When applying operators to multiple characters | `(ab)+` matches "ab", "abab", "ababab", etc. |
| `(?:...)` | Non-capturing group (Perl-compatible syntax, not part of ERE) | When you need grouping without capturing; requires `grep -P` instead of `grep -E` | With `grep -P`, `(?:ab)+c` matches "abc", "ababc", etc. |
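Grouping and alternation are often used together. The sketch below, using made-up input lines, shows a group with alternation, a repeated group, and the same repeated group written in BRE for comparison.

```shell
# Alternation inside a group: "gray" or "grey"
gray_grey=$(printf '%s\n' 'gray' 'grey' 'griy' | grep -E 'gr(a|e)y')
echo "$gray_grey"

# A repeated group: whole lines made of "ab" one or more times
ere_repeat=$(printf '%s\n' 'ab' 'abab' 'aba' | grep -E '^(ab)+$')
echo "$ere_repeat"

# The same pattern in BRE needs backslashes around the group and interval
bre_repeat=$(printf '%s\n' 'ab' 'abab' 'aba' | grep '^\(ab\)\{1,\}$')
echo "$bre_repeat"
```

Both repeated-group commands print `ab` and `abab`; the ERE version is simply shorter to type.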
ERE Examples
The following examples assume a file named `document2.txt` with the following content:
This line has an error and a warning.
Playing and programming are fun activities.
The IP address is 192.168.1.1 and another is 10.0.0.255.
<div>This is a div</div> <p class="text">And a paragraph.</p>
Contact us at test@example.com or support@sub.domain.org.
# Find either "error" or "warning"
grep -E 'error|warning' document2.txt
# Matches lines containing either word
# Find words that start with 'p' and end with 'ing'
grep -E '\bp\w+ing\b' document2.txt
# Matches "programming"; add -i to also match "Playing"
# Find IPv4-style addresses
grep -E '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' document2.txt
# Matches patterns like 192.168.1.1 (does not check that each number is 0-255)
# Find HTML tags
grep -E '<[^>]+>' document2.txt
# Matches <div>, <p class="text">, etc.
# Find email addresses
grep -E '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b' document2.txt
# Matches most standard email formats
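The IP pattern above accepts any group of one to three digits, so it also matches strings like 999.999.999.999. If stricter checking is needed, one possible approach is to spell out the 0-255 range for each octet. The variable names and candidate addresses below are illustrative only.

```shell
# One octet: 250-255, 200-249, 100-199, or 0-99
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'
strict="^($octet\.){3}$octet\$"
loose='^([0-9]{1,3}\.){3}[0-9]{1,3}$'

candidates=$(printf '%s\n' '192.168.1.1' '10.0.0.255' '999.999.999.999' '256.1.1.1')

# The loose pattern accepts all four candidates
loose_hits=$(printf '%s\n' "$candidates" | grep -Ec "$loose")
echo "$loose_hits"

# The strict pattern keeps only addresses whose octets are all 0-255
strict_hits=$(printf '%s\n' "$candidates" | grep -E "$strict")
echo "$strict_hits"
```

The stricter pattern is much harder to read, which is a fair reason to keep the loose version for quick searches and reserve the strict one for validation.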
Character Classes
Character classes are like shortcuts for common groups of characters. They make your patterns more readable and save you from typing long lists of characters. In Linux, POSIX character classes such as `[:digit:]` are only valid inside a bracket expression, which is why they are written with double brackets: `[[:digit:]]`.
| Character Class | What It Matches | When to Use | Equivalent To |
|---|---|---|---|
| `[[:alpha:]]` | Any letter | When matching alphabetic characters | `[A-Za-z]` |
| `[[:digit:]]` | Any digit | When matching numbers | `[0-9]` |
| `[[:alnum:]]` | Any letter or digit | When matching alphanumeric characters | `[A-Za-z0-9]` |
| `[[:space:]]` | Any whitespace | When matching spaces, tabs, newlines | `[ \t\r\n\v\f]` |
| `[[:blank:]]` | Spaces and tabs only | When matching horizontal whitespace | `[ \t]` |
| `[[:upper:]]` | Uppercase letters | When matching capital letters | `[A-Z]` |
| `[[:lower:]]` | Lowercase letters | When matching small letters | `[a-z]` |
| `[[:punct:]]` | Punctuation characters | When matching symbols and punctuation | ``[!"#$%&'()*+,-./:;<=>?@[\]^_`{\|}~]`` |
| `[[:print:]]` | Printable characters | When matching visible characters | Letters, digits, spaces, punctuation |
| `[[:cntrl:]]` | Control characters | When matching non-printable control characters | ASCII 0-31 and 127 |
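Since a class like `[:alpha:]` lives inside a bracket expression, it can be combined with other characters in the same brackets. As a small sketch with made-up names, the pattern below accepts identifiers that start with a letter or underscore and continue with letters, digits, or underscores.

```shell
names=$(printf '%s\n' 'user_name' '_tmp' '2fast' 'bad-name' 'X1')

# [[:alpha:]_] means "any letter or an underscore"
valid=$(printf '%s\n' "$names" | grep -E '^[[:alpha:]_][[:alnum:]_]*$')
echo "$valid"

# [[:digit:]] and [0-9] select the same lines here
with_class=$(printf '%s\n' "$names" | grep -c '^[[:digit:]]')
with_range=$(printf '%s\n' "$names" | grep -c '^[0-9]')
echo "$with_class $with_range"
```

Only `user_name`, `_tmp`, and `X1` pass the identifier check, and both digit tests count the single line `2fast`.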
Character Class Examples
The following examples assume a file named `document3.txt` with the following content:
123 This line starts with a digit.
Hello World! This contains only letters.
This line has some punctuation, like commas, periods, and question marks?
A Word Starting with an Uppercase Letter.
Trailing whitespace here.
# Find lines that start with a digit
grep '^[[:digit:]]' document3.txt
# Matches lines starting with 0-9
# Find lines containing a word made up only of letters
grep -E '\b[[:alpha:]]+\b' document3.txt
# Matches every line with at least one all-letter word; add -o to print just the words
# Find lines with punctuation
grep '[[:punct:]]' document3.txt
# Matches lines containing any punctuation mark
# Find lines containing a word that starts with an uppercase letter
grep -E '\b[[:upper:]][[:alpha:]]*\b' document3.txt
# Matches lines with capitalized words; add -o to print just those words
# Find lines with whitespace at the end
grep '[[:space:]]$' document3.txt
# Helps find trailing whitespace in code
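Trailing whitespace is invisible in a listing, so it helps to build the test file explicitly. This sketch writes three hypothetical lines with `printf` (one clean, one ending in a space, one ending in a tab) and reports the line numbers that end in whitespace.

```shell
tmpfile=$(mktemp)
# Line 2 ends with a space, line 3 ends with a tab
printf 'clean line\ntrailing space \ntrailing tab\t\n' > "$tmpfile"

# -n adds line numbers; cut keeps only the number before the first colon
trailing=$(grep -n '[[:space:]]$' "$tmpfile" | cut -d: -f1)
echo "$trailing"

rm "$tmpfile"
```

This prints `2` and `3`, the two lines that need cleaning up.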
Using Regular Expressions with find
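The find command can use regular expressions to search for files whose paths match specific patterns. This is especially useful when looking for files with complex naming conventions. Unlike -name, the -regex test is matched against the entire path, which is why the patterns below begin with `.*`.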
Using find with BRE
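# GNU find's default regex dialect is Emacs-style; -regextype posix-basic selects BRE
# Find all .txt files
find /path/to/search -regextype posix-basic -regex '.*\.txt$'
# Matches file.txt, notes.txt, etc.
# Find files with names containing numbers
find /path/to/search -regextype posix-basic -regex '.*[0-9].*'
# Matches file1.txt, report2.pdf, etc. (a digit anywhere in the path counts, including directory names)
# Find files with exactly 3-character extensions
find /path/to/search -regextype posix-basic -regex '.*\.[a-zA-Z]\{3\}$'
# Matches file.txt, image.jpg, script.php, etc.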
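A common surprise is that `-regex` has to match the whole path, not just part of the file name. The sketch below assumes GNU find (for `-regextype`) and uses a temporary directory with made-up file names to show the difference.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/notes.txt" "$tmpdir/report2.pdf" "$tmpdir/image.jpeg"

# Without a leading .* the pattern cannot cover the directory part, so nothing matches
partial=$(find "$tmpdir" -type f -regex '\.txt$')
echo "partial: [$partial]"

# With .* in front, the pattern spans the full path
full=$(find "$tmpdir" -type f -regex '.*\.txt$' -exec basename {} \;)
echo "$full"

# -regextype posix-extended (GNU find) allows unescaped {3}
three_char=$(find "$tmpdir" -type f -regextype posix-extended -regex '.*\.[a-z]{3}$' -exec basename {} \; | sort)
echo "$three_char"

rm -r "$tmpdir"
```

Here `basename` is only used to strip the random temporary directory from the output.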
Combining find with grep
# Find .txt or .log files using ERE
find /path/to/search -type f | grep -E '\.(txt|log)$'
# Lists files ending in .txt or .log
# Find files containing "backup" followed by a date (YYYYMMDD)
find /path/to/search -type f | grep -E 'backup_[0-9]{8}'
# Matches backup_20220315, backup_20231127, etc.
# Find files not in common image formats
find /path/to/search -type f | grep -vE '\.(jpg|png|gif|bmp)$'
# Lists files that don't end with common image extensions
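To try these pipelines without touching real data, a throwaway directory works well. The file names below are invented for the example; running `find` from inside the directory keeps the output paths short.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/notes.txt" "$tmpdir/app.log" "$tmpdir/backup_20220315.tar" "$tmpdir/photo.jpg"

# Files ending in .txt or .log
text_or_log=$(cd "$tmpdir" && find . -type f | grep -E '\.(txt|log)$' | sort)
echo "$text_or_log"

# Files named backup_ followed by eight digits
dated_backups=$(cd "$tmpdir" && find . -type f | grep -E 'backup_[0-9]{8}')
echo "$dated_backups"

# Everything except common image formats
non_images=$(cd "$tmpdir" && find . -type f | grep -vE '\.(jpg|png|gif|bmp)$' | sort)
echo "$non_images"

rm -r "$tmpdir"
```

Piping to grep gives you ERE syntax regardless of which regex dialect your `find` uses by default.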
Real-World Use Cases
Log Analysis
# Find all error messages with timestamps
grep -E '^([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}).*ERROR' application.log
# Matches log lines that begin with a YYYY-MM-DD HH:MM:SS timestamp and contain ERROR later on the line
# Count errors by type
grep 'ERROR' application.log | grep -Eo 'ERROR: [A-Za-z]+' | sort | uniq -c
# Groups and counts different types of errors
# Extract all IP addresses from a log file
grep -Eo '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' access.log | sort | uniq
# Finds all unique IP addresses
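The counting pipeline can be tried end to end on a small timestamped log. The log lines and the file name `app.log` below are invented for illustration; the final `awk` step is an optional extra that trims the output to a count and the error type.

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/app.log" <<'EOF'
2024-01-15 10:00:01 INFO: User login successful.
2024-01-15 10:00:05 ERROR: Database connection error.
2024-01-15 10:01:12 ERROR: API rate limit exceeded.
2024-01-15 10:02:30 ERROR: Database timeout.
EOF

# Count lines that start with a timestamp followed directly by ERROR
stamped_errors=$(grep -Ec '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} ERROR' "$tmpdir/app.log")
echo "$stamped_errors"

# Group errors by the first word after "ERROR: ", most frequent first
summary=$(grep -Eo 'ERROR: [A-Za-z]+' "$tmpdir/app.log" | sort | uniq -c | sort -rn | awk '{print $1, $3}')
echo "$summary"

rm -r "$tmpdir"
```

For this sample the summary is `2 Database` followed by `1 API`.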
Code Search
# Find all function definitions in Python files
grep -r -E '^def [a-zA-Z_][a-zA-Z0-9_]*\(' --include="*.py" ./src
# Locates top-level Python function definitions (indented methods are not matched because of ^def)
# Find TODO comments in code
grep -r -E '//\s*TODO:' --include="*.js" ./src
# Finds JavaScript TODO comments
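As a variation on the function search, the sketch below also picks up indented methods and prints only the function names. The directory layout and file contents are hypothetical, and `sed` is used just to strip the leading `def`.

```shell
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/src"
printf '%s\n' 'def load_data(path):' '    return open(path)' 'class Loader:' '    def run(self):' > "$tmpdir/src/loader.py"
# A non-Python file that --include should skip
printf '%s\n' 'def not_python():' > "$tmpdir/src/notes.txt"

# -r recurse, -h hide file names, -o print only the match, -E extended regex
funcs=$(grep -rhoE --include='*.py' '^[[:space:]]*def [a-zA-Z_][a-zA-Z0-9_]*' "$tmpdir/src" \
  | sed -E 's/^[[:space:]]*def //')
echo "$funcs"

rm -r "$tmpdir"
```

This prints `load_data` and `run`, and ignores the `def` line in `notes.txt`.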
Tips for Success
- Start simple and build up complex patterns incrementally
- Test your patterns on a small sample of text before using them on large files
- Use `grep -E` when possible to avoid having to escape special characters
- Remember that `*` matches zero or more, while `+` matches one or more
- Use character classes like `[[:digit:]]` for better readability
- Anchor patterns with `^` and `$` when you want to match entire lines
- Use `\b` to match word boundaries in extended regex
- Use tools like `grep -o` to see just the matching text, not the whole line
- Combine regex with other tools like `sort`, `uniq`, and `awk` for powerful text processing
Common Mistakes to Avoid
- Forgetting that `.` matches any character (use `\.` to match a literal period)
- Using `*` alone as if it were a shell wildcard (in regex it means "zero or more of the previous character", so use `.*` to match any text)
- Not escaping special characters in basic regex (`+`, `?`, `{}`, `()`)
- Forgetting to use `-E` with grep when using extended regex features
- Creating overly complex patterns that are hard to debug
- Not accounting for possible variations in input (spaces, capitalization, etc.)
- Using regex when a simpler tool would work (like plain string matching)
- Not considering the context around matches (like word boundaries)
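The first two mistakes are easy to reproduce. The sketch below uses made-up file names to compare an unescaped dot, an escaped dot, and a fixed-string search with `-F`, then contrasts a shell-style `*.txt` with the regex `.*\.txt$`.

```shell
names=$(printf '%s\n' 'file.txt' 'file_txt' 'filextxt')

# Unescaped . matches any character, so all three lines match
unescaped=$(printf '%s\n' "$names" | grep -c 'file.txt')
# Escaped \. matches only a literal period
escaped=$(printf '%s\n' "$names" | grep -c 'file\.txt')
# -F treats the whole pattern as plain text
fixed=$(printf '%s\n' "$names" | grep -cF 'file.txt')
echo "$unescaped $escaped $fixed"

# A leading * in BRE is a literal asterisk, so the shell-style pattern finds nothing
star=$(printf '%s\n' 'notes.txt' | grep -c '*.txt' || true)
# The regex equivalent of "anything ending in .txt"
dotstar=$(printf '%s\n' 'notes.txt' | grep -c '.*\.txt$')
echo "$star $dotstar"
```

The `|| true` is there only because grep exits with a non-zero status when nothing matches.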
Best Practices
- Comment complex regex patterns to explain what they do
- Break complex patterns into smaller, more manageable pieces
- Use character classes and quantifiers to make patterns more readable
- Test regex patterns on both matching and non-matching input
- Consider case sensitivity requirements (`-i` vs. explicit character ranges)
- Use non-capturing groups `(?:...)` when you don't need to reference the match (these need a Perl-compatible engine such as `grep -P`, not `grep -E`)
- Save useful regex patterns in your notes or as shell aliases
- When using regex in scripts, validate input to avoid regex injection attacks
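One simple defence, when the user's text is meant to be searched for literally, is to pass it with `-F` so that no character is treated as a regex operator, and with `-e` so that input starting with a dash is not read as an option. The variable `user_input` and the sample data below are hypothetical.

```shell
# Pretend this came from an untrusted source
user_input='a.c'
data=$(printf '%s\n' 'abc' 'a.c' 'axc')

# Used as a regex, the dot matches any character: all three lines match
as_regex=$(printf '%s\n' "$data" | grep -c -e "$user_input")
echo "$as_regex"

# With -F the input is a plain string: only the literal "a.c" matches
as_literal=$(printf '%s\n' "$data" | grep -cF -e "$user_input")
echo "$as_literal"
```

If the input really must be a pattern, this approach does not apply, and checking it against an allowed character set first is one alternative.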