CIS120 Linux Fundamentals by Scott Shaper

tr, sed, awk and aspell Commands

Imagine you're editing a document and need to make the same change hundreds of times. Maybe you need to replace every instance of your name with "The Author," convert all text to uppercase, or find and fix spelling mistakes. Just as a find-and-replace function transforms your document in a word processor, Linux provides powerful commands that act like text transformation wizards. In this chapter, we'll explore four essential tools that help you modify and improve text without tedious manual editing.

Quick Reference

Command | Description | Common Use
tr | Translate or delete characters | Character-by-character substitution, case conversion, removing specific characters
sed | Stream editor for filtering and transforming text | Find and replace text, deleting lines, more complex text transformations
awk | Pattern scanning and processing language | Field-based text processing, data extraction, report generation, programming text operations
aspell | Interactive spell checker | Finding and correcting spelling errors in documents

The tr Command

The tr (translate) command is a powerful text processing utility that works on a character-by-character basis. It reads from standard input, performs substitution or deletion of specified characters, and writes to standard output. Unlike more complex text processors, tr operates on individual characters rather than patterns or words, making it ideal for quick character transformations like case conversion, whitespace cleanup, or basic character removal. Think of tr as a character-level search and replace tool for your text streams.

When to Use

  • When you need to convert text between uppercase and lowercase
  • When you want to replace or remove specific characters
  • When cleaning data by removing unwanted characters (like extra spaces)
  • When converting between different types of line endings (DOS to Unix)
  • When creating quick character substitution ciphers

Common Options

Option | What It Does | When to Use It
-d | Deletes characters instead of replacing them | When you need to remove specific characters (like removing all vowels)
-s | Squeezes repeated characters into a single character | When cleaning text with multiple spaces or other repeated characters
-c | Complements the set of characters to work on | When you want to operate on characters NOT in the specified set
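The options can also be combined. As a minimal sketch (the function names strip_cr and keep_alnum are made up for illustration), -d on its own covers the DOS-to-Unix case by deleting carriage returns, and -c together with -d deletes everything outside a set:

```shell
# strip_cr: delete carriage returns, turning DOS (CRLF) line endings into Unix (LF)
strip_cr() {
  tr -d '\r'
}

# keep_alnum: -c complements the set and -d deletes, so every character
# that is NOT a letter, digit, or newline is removed
keep_alnum() {
  tr -cd '[:alnum:]\n'
}

printf 'line one\r\nline two\r\n' | strip_cr
printf 'Hello, World! 123\n' | keep_alnum
```

The second command prints HelloWorld123, because the comma, spaces, and exclamation mark all fall outside the kept set.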

Practical Examples

Let's say you have a file called message.txt with the following content:

hello   world!
this is SOME text with MiXeD case.
too    many     spaces   here.

To convert the text to all uppercase:

# Make everything UPPERCASE
cat message.txt | tr 'a-z' 'A-Z'

Output:

HELLO   WORLD!
THIS IS SOME TEXT WITH MIXED CASE.
TOO    MANY     SPACES   HERE.

To remove all vowels from the text:

# Remove all vowels
cat message.txt | tr -d 'aeiouAEIOU'

Output:

hll   wrld!
ths s SM txt wth MXD cs.
t    mny     spcs   hr.

To compress multiple spaces into single spaces:

# Clean up extra spaces
cat message.txt | tr -s ' '

Output:

hello world!
this is SOME text with MiXeD case.
too many spaces here.

To create a simple substitution cipher (replacing each letter with the next one in the alphabet, wrapping z back to a and Z back to A):

# Create a simple cipher
echo "secret message" | tr 'a-zA-Z' 'b-zaB-ZA'

Output:

tfdsfu nfttbhf
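Because tr maps each character in the first set to the character at the same position in the second set, swapping the two sets reverses a cipher. The sketch below uses a wrap-around variant of the shift (z becomes a, Z becomes A); the function names encode and decode are illustrative, not standard commands:

```shell
# encode: shift each letter forward by one, wrapping z->a and Z->A
encode() {
  tr 'a-zA-Z' 'b-zaB-ZA'
}

# decode: the same two sets in the opposite order shift each letter back by one
decode() {
  tr 'b-zaB-ZA' 'a-zA-Z'
}

echo "Lazy Zebra" | encode            # Mbaz Afcsb
echo "Lazy Zebra" | encode | decode   # Lazy Zebra
```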

The sed Command

The sed (stream editor) command is a sophisticated text transformation tool that processes text line by line. It allows you to perform search-and-replace operations, delete specific lines, and apply complex text transformations using regular expressions. While tr works only with individual characters, sed can work with patterns, words, and even multi-line content. It's particularly useful for batch editing files, extracting specific content from texts, and automating repetitive text editing tasks. Think of sed as having a text editor's power but with the ability to script your edits.

When to Use

  • When you need to find and replace text patterns
  • When you want to extract specific lines from a file
  • When performing multiple text transformations at once
  • When you need to modify text files without opening them in an editor
  • When processing text as part of a script or pipeline

Common Options

Option | What It Does | When to Use It
-e | Adds an editing command; repeat -e to apply several | When you need to apply several transformations in one command
-f | Takes commands from a script file | For complex or reusable transformations stored in a separate file
-i | Edits files in-place (modifies the original file) | When you want to change the file directly, not just see the output
-n | Suppresses the automatic printing of every line | When you want to control exactly what output is shown
s/old/new/g | Global flag - replaces all occurrences on each line | When you want to replace every instance, not just the first one
/pattern/d | Delete command - removes lines matching the pattern | When you want to remove entire lines that contain specific text
/pattern/p | Print command - prints lines matching the pattern | When you want to show only lines that contain specific text (usually with -n)

The last three entries are not command-line options: g is a flag on the s (substitute) command, and d and p are commands that go inside the quoted sed expression.

Practical Examples

Let's use a file called email.txt with the following content:

Dear Customer,
Your order #12345 has been shipped.
You should receive your package by Monday.
If you have any questions about your order #12345,
please contact customer support at support@example.com.
Thank you for shopping with us!

To replace the first occurrence of "order" with "purchase" on each line:

# Replace first 'order' with 'purchase' on each line
sed 's/order/purchase/' email.txt

Output:

Dear Customer,
Your purchase #12345 has been shipped.
You should receive your package by Monday.
If you have any questions about your purchase #12345,
please contact customer support at support@example.com.
Thank you for shopping with us!

To replace ALL occurrences of "order" with "purchase":

# Replace ALL occurrences of 'order' with 'purchase'
sed 's/order/purchase/g' email.txt

In this file no line contains "order" more than once, so the output is identical to the previous example. The g flag only makes a difference when the pattern occurs several times on the same line.

To delete any line containing an email address (here, any line with an @ sign):

# Remove lines containing an @ sign
sed '/@/d' email.txt

Output:

Dear Customer,
Your order #12345 has been shipped.
You should receive your package by Monday.
If you have any questions about your order #12345,
Thank you for shopping with us!

To replace the order number with a different one and save changes to the file:

# Change order number in the file directly
sed -i 's/12345/67890/g' email.txt
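
Since -i overwrites the file, a suffix after -i asks sed to keep a copy of the original first. A minimal sketch, run against a scratch copy so it is safe to try (the temporary directory and the one-line file are assumptions made for illustration):

```shell
# Work on a scratch copy so the example cannot damage a real file
workdir=$(mktemp -d)
printf 'Your order #12345 has been shipped.\n' > "$workdir/email.txt"

# -i.bak edits email.txt in place and keeps the original as email.txt.bak
sed -i.bak 's/12345/67890/g' "$workdir/email.txt"

cat "$workdir/email.txt"       # edited file
cat "$workdir/email.txt.bak"   # untouched original
```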

To display only lines containing the word "Customer":

# Show only lines with "Customer"
sed -n '/Customer/p' email.txt

Output:

Dear Customer,
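
The -e option and line addresses from the tables above can be sketched the same way. This example assumes a shortened three-line copy of email.txt written to a temporary directory; both are stand-ins for illustration:

```shell
# Recreate the first three lines of the sample file in a scratch directory
workdir=$(mktemp -d)
cat > "$workdir/email.txt" <<'EOF'
Dear Customer,
Your order #12345 has been shipped.
You should receive your package by Monday.
EOF

# Two substitutions in one pass, each introduced with -e
sed -e 's/order/purchase/' -e 's/Monday/Friday/' "$workdir/email.txt"

# -n with a line range and the p command prints only lines 2 through 3
sed -n '2,3p' "$workdir/email.txt"
```

Addresses such as 2,3 select lines by number, while /pattern/ selects them by content; either kind can be placed in front of d, p, or s.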

The awk Command

The awk command is a text processing language that combines the pattern matching of sed with the field handling of cut and adds general programming features such as variables, arithmetic, and arrays. It reads input line by line, automatically splits each line into fields, and lets you write programs for text transformation, data analysis, and report generation. Unlike tr and sed, which work on characters and patterns, awk operates on records and fields, making it well suited to structured data such as CSV files, log files, and other tabular text. Think of awk as a complete programming language designed specifically for text processing.

When to Use

  • When you need to process data field by field (like CSV files)
  • When you want to perform calculations on numeric data in text files
  • When generating reports with formatted output
  • When you need conditional processing based on field values
  • When combining multiple text processing operations in one command
  • When you need to aggregate or summarize data from text files

Common Options

Option | What It Does | When to Use It
-F SEP | Sets the field separator (default is whitespace) | When working with CSV files or other delimited data
-v VAR=VAL | Sets a variable before processing begins | When you need to pass parameters to your awk script
-f FILE | Reads awk commands from a file | For complex or reusable awk programs

Common Built-in Variables

Variable | What It Contains | When to Use It
NR | Current record (line) number | When you need to skip header rows or track line numbers
NF | Number of fields in current record | When you need to check if a line has enough fields
$0 | Entire current record (line) | When you want to work with the whole line
$1, $2, $3... | Individual fields (columns) from current record | When you need to access specific columns of data
FS | Field separator (default is whitespace) | When you need to change the separator programmatically
OFS | Output field separator (default is space) | When you want to control how fields are separated in output
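
To see several of these variables at once, here is a small sketch. The function name show_fields and the three-line input (whose last row is deliberately missing a field) are made up for illustration:

```shell
# show_fields: for every data row (NR > 1) print the line number, the
# number of fields, and the first field, joined by the output separator OFS
show_fields() {
  awk -F ',' -v OFS=' | ' 'NR > 1 { print NR, NF, $1 }'
}

# The last row is deliberately missing its Quantity field
printf 'Product,Price,Quantity\nLaptop,999.99,5\nMouse,25.50\n' | show_fields
```

This prints "2 | 3 | Laptop" and "3 | 2 | Mouse". The commas in the print statement are what insert OFS between the values, and the NF value of 2 on the last row is how a script could detect the incomplete line.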

Practical Examples

Let's say you have a file called sales.csv with the following content:

Product,Price,Quantity
Laptop,999.99,5
Mouse,25.50,20
Keyboard,75.00,10
Monitor,299.99,3

To print only the product names (first field):

# Print just the product names
awk -F ',' '{print $1}' sales.csv

Output:

Product
Laptop
Mouse
Keyboard
Monitor

To calculate the total value for each product (price × quantity):

# Calculate total value for each product
awk -F ',' 'NR>1 {print $1 ": $" $2 * $3}' sales.csv

Output:

Laptop: $4999.95
Mouse: $510
Keyboard: $750
Monitor: $899.97
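
The totals above have uneven decimal places because print uses awk's default number formatting. For report-style output, awk's printf gives explicit control. A minimal sketch, with line_totals as a hypothetical helper that reads the CSV from standard input:

```shell
# line_totals: print each product with price x quantity to two decimal places
line_totals() {
  awk -F ',' 'NR > 1 { printf "%s: $%.2f\n", $1, $2 * $3 }'
}

printf 'Product,Price,Quantity\nLaptop,999.99,5\nMouse,25.50,20\n' | line_totals
```

With this input the Mouse line reads "Mouse: $510.00" instead of "Mouse: $510". Unlike print, printf does not add a newline on its own, which is why the format string ends in \n.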

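To find products with a price greater than $50 (NR>1 skips the header row, whose "Price" text would otherwise be compared as a string and printed as a match):

# Show expensive products
awk -F ',' 'NR>1 && $2 > 50 {print $1 " costs $" $2}' sales.csv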

Output:

Laptop costs $999.99
Keyboard costs $75.00
Monitor costs $299.99
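
Rather than hard-coding the threshold, it can be passed in with -v. In this sketch the function name pricier_than and the variable name min are illustrative choices, and NR > 1 keeps the header row out of the comparison:

```shell
# pricier_than MIN: list products whose price exceeds MIN
pricier_than() {
  awk -F ',' -v min="$1" 'NR > 1 && $2 > min { print $1 " costs $" $2 }'
}

printf 'Product,Price,Quantity\nLaptop,999.99,5\nMouse,25.50,20\nMonitor,299.99,3\n' |
  pricier_than 100
```

With a threshold of 100 this prints the Laptop and Monitor lines only.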

To calculate the total revenue from all sales:

# Calculate total revenue
awk -F ',' 'NR>1 {total += $2 * $3} END {print "Total revenue: $" total}' sales.csv

Output:

Total revenue: $7159.92

To process a log file and extract specific information:

access.log:

192.168.1.1 - - [10/Oct/2023:13:55:36] "GET /page1 HTTP/1.1" 200 1234
192.168.1.2 - - [10/Oct/2023:13:55:37] "GET /page2 HTTP/1.1" 404 567
192.168.1.1 - - [10/Oct/2023:13:55:38] "POST /login HTTP/1.1" 200 890

To count requests by IP address:

# Count requests per IP address
awk '{count[$1]++} END {for (ip in count) print ip ": " count[ip] " requests"}' access.log

Output:

192.168.1.1: 2 requests
192.168.1.2: 1 requests
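
awk does not guarantee the order in which a for (key in array) loop visits its keys, so the lines above may appear in a different order on your system. Piping the result through sort makes the output predictable. The sketch below applies the same counting idea to the status code, which is field 8 when this log format is split on whitespace; the function name count_by_status is made up for illustration:

```shell
# count_by_status: count requests per HTTP status code (field 8), sorted by code
count_by_status() {
  awk '{ count[$8]++ } END { for (code in count) print code ": " count[code] }' | sort
}

printf '%s\n' \
  '192.168.1.1 - - [10/Oct/2023:13:55:36] "GET /page1 HTTP/1.1" 200 1234' \
  '192.168.1.2 - - [10/Oct/2023:13:55:37] "GET /page2 HTTP/1.1" 404 567' \
  '192.168.1.1 - - [10/Oct/2023:13:55:38] "POST /login HTTP/1.1" 200 890' |
  count_by_status
```

For the three sample lines this prints "200: 2" followed by "404: 1".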

The aspell Command

The aspell command is an interactive spell checking utility designed to find and correct spelling errors in text documents. It generally offers better suggestions than older spell checkers such as ispell and supports multiple languages through installable dictionaries. When aspell identifies a potentially misspelled word, it provides a list of suggested replacements ranked by likelihood. This makes it useful for proofreading documents, checking email drafts, or making sure documentation is free from spelling errors before publication. Unlike tr, sed, and awk, which transform text, aspell focuses specifically on identifying and correcting spelling mistakes.

When to Use

  • When you need to check document spelling
  • When preparing documents for publication or submission
  • When proofreading text files without a word processor
  • When creating scripts that need to verify spelling
  • When working with documents in multiple languages

Common Options

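Option | What It Does | When to Use It
-c (or check) | Checks spelling in a file interactively | For interactive spell checking with suggestions
-a (or pipe) | Runs in 'pipe mode' for programmatic use | When using aspell in scripts or with other programs
list | Lists misspelled words read from standard input | When you just want to identify errors without correcting them
-d [dict] | Uses a specific dictionary | When checking documents in different languages

Recent aspell versions use -l as shorthand for --lang, so the list command is the dependable way to list misspelled words.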

Practical Examples

Let's say you have a file called report.txt with the following content:

This is a sampel report.
It containes some mispeled words.
The speling here is not corect.

To check the spelling of the file interactively:

# Interactive spell check
aspell check report.txt

This opens an interactive session where aspell offers suggestions for each misspelled word:

1) sample      6) sampled
2) samples     7) sampans
3) sampler     8) simpler
4) samplers    9) sampler's
5) Sampel      0) sampels
i) Ignore      I) Ignore all
r) Replace     R) Replace all
a) Add         x) Exit
?

To just list all misspelled words without correcting them:

# Just list misspelled words
aspell list < report.txt

Output:

sampel
containes
mispeled
speling
corect
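
Because aspell list writes one word per line, it fits naturally into scripts. The following is a rough sketch only: it assumes aspell and an English ("en") dictionary are installed, and list_misspelled is a hypothetical helper name:

```shell
# list_misspelled: print each misspelled word from standard input once, sorted.
# Assumes aspell and an English ("en") dictionary are installed.
list_misspelled() {
  aspell --lang=en list | sort -u
}

# Only run the demo when aspell is actually available
if command -v aspell >/dev/null 2>&1; then
  printf 'This is a sampel report.\nThe speling here is not corect.\n' | list_misspelled
fi
```

A script could go one step further and treat any output from this function as a failure, for example to block publishing a document that still contains unknown words.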

To check spelling using a different language dictionary:

# Check spelling with French dictionary
aspell -d fr check french_report.txt

To see which dictionaries are available on your system:

# List available dictionaries
aspell dicts

Tips for Success

  • The tr command only works with single characters, not patterns or words
  • Use sed when you need to work with patterns rather than individual characters
  • Use awk when you need to process data field by field or perform calculations
  • Make a backup before using sed -i to edit files in-place
  • Character classes like [:upper:] and [:lower:] in tr make it easier to work with character groups
  • For complex text transformations, you can chain these commands together with pipes
  • Start with simple awk commands and gradually build complexity
  • Use awk's built-in variables like NR (record number) and NF (number of fields)
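
As one way to picture the chaining tip, the sketch below counts word frequencies by passing text through tr, awk, and sort. The function name word_freq and the sample sentence are made up for illustration:

```shell
# word_freq: count how often each word occurs, most frequent first
word_freq() {
  tr '[:upper:]' '[:lower:]' |     # normalize case
    tr -cs '[:alpha:]' '\n' |      # put each word on its own line
    awk 'NF { count[$0]++ } END { for (w in count) print count[w], w }' |
    sort -k1,1nr -k2               # highest count first, ties alphabetical
}

printf 'The cat and the dog.\nThe end.\n' | word_freq
```

The first line of output is "3 the". Each stage does one small job: tr handles the character-level cleanup, awk does the counting, and sort orders the result.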

Common Mistakes to Avoid

  • Forgetting that tr can only translate one character to one character
  • Not escaping special characters in sed patterns (like ., *, $)
  • Using sed -i without a backup on important files
  • Assuming aspell will catch all errors (it might miss contextual mistakes)
  • Using the wrong dictionary with aspell for your document's language
  • Forgetting to set the field separator with -F in awk when working with CSV files
  • Not handling the header row properly in awk when processing CSV files
  • Using awk for simple character operations that tr could handle more efficiently
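
The escaping mistake is easy to demonstrate. In this sketch the sample string and the variable names loose and strict are invented for illustration; the only difference between the two commands is the backslash in front of the dot:

```shell
# Unescaped: "." matches any single character, so "1x2" is replaced as well
loose=$(echo "v1.2 and v1x2" | sed 's/1.2/X/g')

# Escaped: "\." matches only a literal dot
strict=$(echo "v1.2 and v1x2" | sed 's/1\.2/X/g')

echo "$loose"    # vX and vX
echo "$strict"   # vX and v1x2
```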

Best Practices

  • Test your tr and sed commands on a sample of the data first
  • Use sed -i.bak to create automatic backups before editing
  • Create a personal dictionary for aspell if you use specialized terminology
  • Document complex transformations for future reference
  • For advanced text processing, learn regular expressions to unlock the full power of sed
  • Use awk for data analysis and report generation from structured text files
  • Combine awk with other commands in pipelines for powerful data processing workflows
  • Learn awk's pattern-action model: patterns select lines, actions process them