
tr, sed, awk and aspell Commands
Imagine you're editing a document and need to make the same change hundreds of times. Maybe you need to replace every instance of your name with "The Author," convert all text to uppercase, or find and fix spelling mistakes. Just as a find-and-replace function transforms your document in a word processor, Linux provides powerful commands that act like text transformation wizards. In this chapter, we'll explore four essential tools that help you modify and improve text without tedious manual editing.
Quick Reference
Command | Description | Common Use |
---|---|---|
`tr` | Translate or delete characters | Character-by-character substitution, case conversion, removing specific characters |
`sed` | Stream editor for filtering and transforming text | Find and replace text, delete lines, apply more complex text transformations |
`awk` | Pattern scanning and processing language | Field-based text processing, data extraction, report generation, programmatic text operations |
`aspell` | Interactive spell checker | Finding and correcting spelling errors in documents |
The tr Command
The tr (translate) command is a powerful text processing utility that works on a character-by-character basis. It reads from standard input, performs substitution or deletion of specified characters, and writes to standard output. Unlike more complex text processors, tr operates on individual characters rather than patterns or words, making it ideal for quick character transformations like case conversion, whitespace cleanup, or basic character removal. Think of tr as a character-level search and replace tool for your text streams.
When to Use
- When you need to convert text between uppercase and lowercase
- When you want to replace or remove specific characters
- When cleaning data by removing unwanted characters (like extra spaces)
- When converting between different types of line endings (DOS to Unix; see the example after this list)
- When creating quick character substitution ciphers
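To cover the line-ending case from the list above, here is a minimal sketch; it assumes a DOS-formatted file named dos.txt, a placeholder name used only for illustration:
# Convert DOS (CRLF) line endings to Unix (LF) by deleting the carriage returns
tr -d '\r' < dos.txt > unix.txt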
Common Options
Option | What It Does | When to Use It |
---|---|---|
`-d` | Deletes characters instead of replacing them | When you need to remove specific characters (like removing all vowels) |
`-s` | Squeezes repeated characters into a single character | When cleaning text with multiple spaces or other repeated characters |
`-c` | Complements the set of characters to work on | When you want to operate on characters NOT in the specified set |
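The -c option never comes up in the examples below, so here is a quick sketch of it combined with -d; run against the message.txt file used in the next section, it would keep only letters, digits, spaces, and newlines and delete everything else (the punctuation, in this case):
# Delete every character NOT in the listed set
tr -cd 'a-zA-Z0-9 \n' < message.txt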
Practical Examples
Let's say you have a file called `message.txt` with the following content:
hello world!
this is SOME text with MiXeD case.
too    many    spaces    here.
To convert the text to all uppercase:
# Make everything UPPERCASE
cat message.txt | tr 'a-z' 'A-Z'
Output:
HELLO WORLD!
THIS IS SOME TEXT WITH MIXED CASE.
TOO    MANY    SPACES    HERE.
To remove all vowels from the text:
# Remove all vowels
cat message.txt | tr -d 'aeiouAEIOU'
Output:
hll wrld!
ths s SM txt wth MXD cs.
t    mny    spcs    hr.
To compress multiple spaces into single spaces:
# Clean up extra spaces
cat message.txt | tr -s ' '
Output:
hello world!
this is SOME text with MiXeD case.
too many spaces here.
To create a simple substitution cipher (replacing each letter with the next one in the alphabet):
# Create a simple cipher
echo "secret message" | tr 'a-zA-Z' 'b-zA-Za'
Output:
tfdsfu nfttbhf
The sed Command
The sed (stream editor) command is a sophisticated text transformation tool that processes text line by line. It allows you to perform search-and-replace operations, delete specific lines, and apply complex text transformations using regular expressions. While tr works only with individual characters, sed can work with patterns, words, and even multi-line content. It's particularly useful for batch editing files, extracting specific content from texts, and automating repetitive text editing tasks. Think of sed as having a text editor's power but with the ability to script your edits.
When to Use
- When you need to find and replace text patterns
- When you want to extract specific lines from a file
- When performing multiple text transformations at once
- When you need to modify text files without opening them in an editor
- When processing text as part of a script or pipeline
Common Options
Option | What It Does | When to Use It |
---|---|---|
`-e` | Adds multiple editing commands | When you need to apply several transformations in one command |
`-f` | Takes commands from a script file | For complex or reusable transformations stored in a separate file |
`-i` | Edits files in place (modifies the original file) | When you want to change the file directly, not just see the output |
`-n` | Suppresses automatic printing of input lines | When you want to control exactly what output is shown |
`/g` | Global flag - replaces all occurrences on each line | When you want to replace every instance, not just the first one |
`/d` | Delete command - removes lines matching the pattern | When you want to remove entire lines that contain specific text |
`/p` | Print command - prints lines matching the pattern | When you want to show only lines that contain specific text |
Practical Examples
Let's use a file called `email.txt` with the following content:
Dear Customer,
Your order #12345 has been shipped.
You should receive your package by Monday.
If you have any questions about your order #12345,
please contact customer support at support@example.com.
Thank you for shopping with us!
To replace the first occurrence of "order" with "purchase" on each line:
# Replace first 'order' with 'purchase' on each line
sed 's/order/purchase/' email.txt
Output:
Dear Customer,
Your purchase #12345 has been shipped.
You should receive your package by Monday.
If you have any questions about your purchase #12345,
please contact customer support at support@example.com.
Thank you for shopping with us!
To replace ALL occurrences of "order" with "purchase":
# Replace ALL occurrences of 'order' with 'purchase'
sed 's/order/purchase/g' email.txt
To delete any line containing an email address:
# Remove lines containing email addresses
sed '/[@]/d' email.txt
Output:
Dear Customer,
Your order #12345 has been shipped.
You should receive your package by Monday.
If you have any questions about your order #12345,
Thank you for shopping with us!
To replace the order number with a different one and save changes to the file:
# Change order number in the file directly
sed -i 's/12345/67890/g' email.txt
To display only lines containing the word "Customer":
# Show only lines with "Customer"
sed -n '/Customer/p' email.txt
Output:
Dear Customer,
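Two more sed features are worth a quick sketch: the -e option from the table above, which strings several editing commands together in one pass, and plain line addresses, which pull out a specific range of lines. Both are shown here against the same email.txt:
# Apply two substitutions in a single sed invocation
sed -e 's/order/purchase/g' -e 's/Monday/Friday/' email.txt

# Print only lines 2 through 4 of the file
sed -n '2,4p' email.txt
The same editing commands could also be saved in a file and loaded with -f for reuse.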
The awk Command
The awk command is a powerful text processing language that combines the pattern matching capabilities of sed with the field processing abilities of cut, plus adds full programming features. It reads input line by line, automatically splits each line into fields, and allows you to write programs that can perform complex text transformations, data analysis, and report generation. Unlike tr and sed which work on characters and patterns, awk operates on fields and records, making it ideal for processing structured data like CSV files, log files, and tabular data. Think of awk as a complete programming language specifically designed for text processing.
When to Use
- When you need to process data field by field (like CSV files)
- When you want to perform calculations on numeric data in text files
- When generating reports with formatted output
- When you need conditional processing based on field values
- When combining multiple text processing operations in one command
- When you need to aggregate or summarize data from text files
Common Options
Option | What It Does | When to Use It |
---|---|---|
`-F SEP` | Sets the field separator (default is whitespace) | When working with CSV files or other delimited data |
`-v VAR=VAL` | Sets a variable before processing begins | When you need to pass parameters to your awk script |
`-f FILE` | Reads awk commands from a file | For complex or reusable awk programs |
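The -v option is easy to demonstrate with the sales.csv file introduced in the examples below; this sketch passes a price threshold in from the shell instead of hard-coding it:
# Print products whose price exceeds a threshold supplied on the command line
awk -F ',' -v limit=100 'NR>1 && $2 > limit {print $1}' sales.csv
With the sample data, only Laptop and Monitor would be printed.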
Common Built-in Variables
Variable | What It Contains | When to Use It |
---|---|---|
`NR` | Current record (line) number | When you need to skip header rows or track line numbers |
`NF` | Number of fields in the current record | When you need to check if a line has enough fields |
`$0` | Entire current record (line) | When you want to work with the whole line |
`$1`, `$2`, `$3`... | Individual fields (columns) from the current record | When you need to access specific columns of data |
`FS` | Field separator (default is whitespace) | When you need to change the separator programmatically |
`OFS` | Output field separator (default is a space) | When you want to control how fields are separated in output |
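NF and OFS do not appear in the examples that follow, so here is a short sketch of each, again using the sales.csv data shown below:
# Print the last field (the Quantity column) of every line
awk -F ',' '{print $NF}' sales.csv

# Re-emit the file with tab-separated fields; reassigning $1 forces awk to rebuild the record using OFS
awk -F ',' 'BEGIN {OFS="\t"} {$1=$1; print}' sales.csv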
Practical Examples
Let's say you have a file called `sales.csv` with the following content:
Product,Price,Quantity
Laptop,999.99,5
Mouse,25.50,20
Keyboard,75.00,10
Monitor,299.99,3
To print only the product names (first field):
# Print just the product names
awk -F ',' '{print $1}' sales.csv
Output:
Product
Laptop
Mouse
Keyboard
Monitor
To calculate the total value for each product (price × quantity):
# Calculate total value for each product
awk -F ',' 'NR>1 {print $1 ": $" $2 * $3}' sales.csv
Output:
Laptop: $4999.95
Mouse: $510
Keyboard: $750
Monitor: $899.97
To find products with a price greater than $50 (skipping the header row so the Price label isn't compared):
# Show expensive products
awk -F ',' 'NR>1 && $2 > 50 {print $1 " costs $" $2}' sales.csv
Output:
Laptop costs $999.99
Keyboard costs $75.00
Monitor costs $299.99
To calculate the total revenue from all sales:
# Calculate total revenue
awk -F ',' 'NR>1 {total += $2 * $3} END {print "Total revenue: $" total}' sales.csv
Output:
Total revenue: $7159.92
To process a log file and extract specific information, suppose a file called `access.log` contains:
192.168.1.1 - - [10/Oct/2023:13:55:36] "GET /page1 HTTP/1.1" 200 1234
192.168.1.2 - - [10/Oct/2023:13:55:37] "GET /page2 HTTP/1.1" 404 567
192.168.1.1 - - [10/Oct/2023:13:55:38] "POST /login HTTP/1.1" 200 890
To count requests by IP address:
# Count requests per IP address
awk '{count[$1]++} END {for (ip in count) print ip ": " count[ip] " requests"}' access.log
Output:
192.168.1.1: 2 requests
192.168.1.2: 1 requests
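Building on the same access.log sample, here is a sketch that combines a field condition with aggregation; the status code is the second-to-last field and the response size is the last one:
# Count 404 responses and total the bytes served
awk '$(NF-1) == 404 {notfound++} {bytes += $NF} END {print notfound+0 " not-found responses, " bytes " bytes served"}' access.log
For the three sample lines above, this reports one not-found response and 2691 bytes.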
The aspell Command
The aspell command is an interactive spell checking utility designed to find and correct spelling errors in text documents. It offers more accurate results than older spell checkers and supports multiple languages through installable dictionaries. When aspell identifies a potentially misspelled word, it provides a list of suggested replacements ranked by likelihood. This makes it invaluable for proofreading documents, checking email drafts, or ensuring documentation is free from spelling errors before publication. Unlike tr and sed, which transform text, aspell focuses specifically on identifying and correcting spelling mistakes.
When to Use
- When you need to check document spelling
- When preparing documents for publication or submission
- When proofreading text files without a word processor
- When creating scripts that need to verify spelling
- When working with documents in multiple languages
Common Options
Option | What It Does | When to Use It |
---|---|---|
`-c` | Checks spelling in a file | For interactive spell checking with suggestions |
`-a` | Runs in 'pipe mode' for programmatic use | When using aspell in scripts or with other programs |
`-l` | Lists misspelled words | When you just want to identify errors without correcting them |
`-d [dict]` | Uses a specific dictionary | When checking documents in different languages |
Practical Examples
Let's say you have a file called `report.txt` with the following content:
This is a sampel report.
It containes some mispeled words.
The speling here is not corect.
To check the spelling of the file interactively:
# Interactive spell check
aspell check report.txt
This opens an interactive session where aspell offers suggestions for each misspelled word:
1) sample 6) sampled
2) samples 7) sampans
3) sampler 8) simpler
4) samplers 9) sampler's
5) Sampel 0) sampels
i) Ignore I) Ignore all
r) Replace R) Replace all
a) Add x) Exit
?
To just list all misspelled words without correcting them:
# Just list misspelled words
aspell list < report.txt
Output:
sampel
containes
mispeled
speling
corect
To check spelling using a different language dictionary:
# Check spelling with French dictionary
aspell -d fr check french_report.txt
To see which dictionaries are available on your system:
# List available dictionaries
aspell dicts
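The -a "pipe mode" listed in the options table is intended for scripts and editors rather than interactive use; a minimal sketch:
# Check words arriving on standard input in Ispell-compatible pipe mode
echo "This sentance needs checking" | aspell -a
After printing a version banner, aspell typically responds with one result per word: a `*` for a correctly spelled word and a `&` followed by ranked suggestions for a misspelled one.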
Tips for Success
- The `tr` command only works with single characters, not patterns or words
- Use `sed` when you need to work with patterns rather than individual characters
- Use `awk` when you need to process data field by field or perform calculations
- Make a backup before using `sed -i` to edit files in-place
- Character classes like `[[:upper:]]` in `tr` make it easier to work with character groups
- For complex text transformations, you can chain these commands together with pipes (see the example after this list)
- Start with simple `awk` commands and gradually build complexity
- Use `awk`'s built-in variables like `NR` (record number) and `NF` (number of fields)
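As promised in the list above, here is one way to chain the tools together; this sketch lowercases sales.csv with tr character classes, swaps the commas for a pipe delimiter with sed, and totals the Quantity column with awk:
# Lowercase, re-delimit, then sum the third column (skipping the header)
tr '[:upper:]' '[:lower:]' < sales.csv | sed 's/,/|/g' | awk -F '|' 'NR>1 {sum += $3} END {print "total units: " sum}'
For the sample data this prints total units: 38. Each stage does one small, testable transformation, which is what makes pipelines easy to build up and debug.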
Common Mistakes to Avoid
- Forgetting that `tr` can only translate one character to one character
- Not escaping special characters in `sed` patterns (like `.`, `*`, `$`)
- Using `sed -i` without a backup on important files
- Assuming `aspell` will catch all errors (it might miss contextual mistakes)
- Using the wrong dictionary with `aspell` for your document's language
- Forgetting to set the field separator with `-F` in `awk` when working with CSV files
- Not handling the header row properly in `awk` when processing CSV files
- Using `awk` for simple character operations that `tr` could handle more efficiently
Best Practices
- Test your `tr` and `sed` commands on a sample of the data first
- Use `sed -i.bak` to create automatic backups before editing (see the example after this list)
- Create a personal dictionary for `aspell` if you use specialized terminology
- Document complex transformations for future reference
- For advanced text processing, learn regular expressions to unlock the full power of `sed`
- Use `awk` for data analysis and report generation from structured text files
- Combine `awk` with other commands in pipelines for powerful data processing workflows
- Learn `awk`'s pattern-action model: patterns select lines, actions process them
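For the backup tip above, a quick sketch against the email.txt file from earlier (GNU sed syntax; on BSD/macOS the suffix is a separate argument, as in sed -i .bak ...):
# Edit in place, keeping the original as email.txt.bak
sed -i.bak 's/Monday/Tuesday/' email.txt
If the edit goes wrong, the untouched original is still available as email.txt.bak.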