CIS120 Linux Fundamentals by Scott Shaper

tr, sed, awk and aspell Commands

Imagine you're editing a document and need to make the same change hundreds of times. Maybe you need to replace every instance of your name with "The Author," convert all text to uppercase, or find and fix spelling mistakes. Just as a find-and-replace function transforms your document in a word processor, Linux provides powerful commands that act like text transformation wizards. In this chapter, we'll explore four essential tools that help you modify and improve text without tedious manual editing.

Quick Reference

Command	Description	Common Use
`tr`	Translate or delete characters	Character-by-character substitution, case conversion, removing specific characters
`sed`	Stream editor for filtering and transforming text	Find and replace text, deleting lines, more complex text transformations
`awk`	Pattern scanning and processing language	Field-based text processing, data extraction, report generation, programming text operations
`aspell`	Interactive spell checker	Finding and correcting spelling errors in documents

The tr Command

The tr (translate) command is a powerful text processing utility that works on a character-by-character basis. It reads from standard input, performs substitution or deletion of specified characters, and writes to standard output. Unlike more complex text processors, tr operates on individual characters rather than patterns or words, making it ideal for quick character transformations like case conversion, whitespace cleanup, or basic character removal. Think of tr as a character-level search and replace tool for your text streams.

When to Use

When you need to convert text between uppercase and lowercase
When you want to replace or remove specific characters
When cleaning data by removing unwanted characters (like extra spaces)
When converting between different types of line endings (DOS to Unix)
When creating quick character substitution ciphers

Common Options

Option	What It Does	When to Use It
`-d`	Deletes characters instead of replacing them	When you need to remove specific characters (like removing all vowels)
`-s`	Squeezes repeated characters into a single character	When cleaning text with multiple spaces or other repeated characters
`-c`	Complements the set of characters to work on	When you want to operate on characters NOT in the specified set

Practical Examples

Let's say you have a file called message.txt with the following content:

hello   world!
this is SOME text with MiXeD case.
too    many     spaces   here.

To convert the text to all uppercase:

# Make everything UPPERCASE
cat message.txt | tr 'a-z' 'A-Z'

Output:

HELLO   WORLD!
THIS IS SOME TEXT WITH MIXED CASE.
TOO    MANY     SPACES   HERE.

To remove all vowels from the text:

# Remove all vowels
cat message.txt | tr -d 'aeiouAEIOU'

Output:

hll   wrld!
ths s SM txt wth MXD cs.
t    mny     spcs   hr.

To compress multiple spaces into single spaces:

# Clean up extra spaces
cat message.txt | tr -s ' '

Output:

hello world!
this is SOME text with MiXeD case.
too many spaces here.

To create a simple substitution cipher (replacing each letter with the next one in the alphabet, wrapping z back to a):

# Create a simple cipher: shift every letter forward by one, preserving case
echo "secret message" | tr 'a-zA-Z' 'b-zaB-ZA'

Output:

tfdsfu nfttbhf
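
The options table lists -c and the use cases mention DOS line endings, but neither appears in the examples above. The following is a minimal sketch of both, plus the POSIX character classes that tr accepts. The sample strings and variable names are made up for illustration.

```shell
# Strip DOS carriage returns (\r) so only Unix newlines (\n) remain.
unix_text=$(printf 'line one\r\nline two\r\n' | tr -d '\r')
printf '%s\n' "$unix_text"

# -c complements the set: with -d, delete every character that is NOT
# a digit or a newline.
digits_only=$(printf 'Order #12345\n' | tr -cd '0-9\n')
printf '%s\n' "$digits_only"

# Character classes are a locale-aware alternative to a-z and A-Z.
upper=$(printf 'MiXeD case\n' | tr '[:lower:]' '[:upper:]')
printf '%s\n' "$upper"
```

Note that the class is written [:lower:] inside the quotes; tr does not use the extra pair of brackets that grep and sed need.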

The sed Command

The sed (stream editor) command is a sophisticated text transformation tool that processes text line by line. It allows you to perform search-and-replace operations, delete specific lines, and apply complex text transformations using regular expressions. While tr works only with individual characters, sed can work with patterns, words, and even multi-line content. It's particularly useful for batch editing files, extracting specific content from texts, and automating repetitive text editing tasks. Think of sed as having a text editor's power but with the ability to script your edits.

When to Use

When you need to find and replace text patterns
When you want to extract specific lines from a file
When performing multiple text transformations at once
When you need to modify text files without opening them in an editor
When processing text as part of a script or pipeline

Common Options

Option	What It Does	When to Use It
`-e`	Adds an editing command; repeat it to supply several	When you need to apply several transformations in one command
`-f`	Takes commands from a script file	For complex or reusable transformations stored in a separate file
`-i`	Edits files in-place (modifies the original file)	When you want to change the file directly, not just see the output
`-n`	Suppresses automatic printing of each line (the pattern space)	When you want to control exactly what output is shown
`g`	Flag of the `s` command, not a command-line option (`s/old/new/g`): replaces all occurrences on each line	When you want to replace every instance, not just the first one
`d`	Delete command, written inside the script (`/pattern/d`): removes lines matching the pattern	When you want to remove entire lines that contain specific text
`p`	Print command, written inside the script (`/pattern/p`), usually combined with `-n`: prints lines matching the pattern	When you want to show only lines that contain specific text

Practical Examples

Let's use a file called email.txt with the following content:

Dear Customer,
Your order #12345 has been shipped.
You should receive your package by Monday.
If you have any questions about your order #12345,
please contact customer support at support@example.com.
Thank you for shopping with us!

To replace the first occurrence of "order" with "purchase" on each line:

# Replace first 'order' with 'purchase' on each line
sed 's/order/purchase/' email.txt

Output:

Dear Customer,
Your purchase #12345 has been shipped.
You should receive your package by Monday.
If you have any questions about your purchase #12345,
please contact customer support at support@example.com.
Thank you for shopping with us!

To replace ALL occurrences of "order" with "purchase":

# Replace ALL occurrences of 'order' with 'purchase'
sed 's/order/purchase/g' email.txt

To delete any line containing an @ sign (here, the line with the email address):

# Remove lines containing an @ sign
sed '/@/d' email.txt

Output:

Dear Customer,
Your order #12345 has been shipped.
You should receive your package by Monday.
If you have any questions about your order #12345,
Thank you for shopping with us!

To replace the order number with a different one and save changes to the file:

# Change order number in the file directly
sed -i 's/12345/67890/g' email.txt

To display only lines containing the word "Customer":

# Show only lines with "Customer"
sed -n '/Customer/p' email.txt

Output:

Dear Customer,
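
The -e option and the advice about escaping special characters are not demonstrated above. A minimal sketch of both follows, using one line from email.txt. The replacement address help@example.org is a hypothetical value chosen for this example.

```shell
# Sample line taken from email.txt.
line='please contact customer support at support@example.com.'

# Two -e expressions run in order on every line:
#   1) swap the address (the dot is escaped so it matches a literal ".")
#   2) capitalize the first word
result=$(printf '%s\n' "$line" |
  sed -e 's/support@example\.com/help@example.org/' -e 's/^please/Please/')
printf '%s\n' "$result"
```

Without the backslash, the dot would match any single character, so the pattern would also match text such as support@exampleXcom.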

The awk Command

The awk command is a powerful text processing language that combines the pattern matching capabilities of sed with the field processing abilities of cut, plus adds full programming features. It reads input line by line, automatically splits each line into fields, and allows you to write programs that can perform complex text transformations, data analysis, and report generation. Unlike tr and sed which work on characters and patterns, awk operates on fields and records, making it ideal for processing structured data like CSV files, log files, and tabular data. Think of awk as a complete programming language specifically designed for text processing.

When to Use

When you need to process data field by field (like CSV files)
When you want to perform calculations on numeric data in text files
When generating reports with formatted output
When you need conditional processing based on field values
When combining multiple text processing operations in one command
When you need to aggregate or summarize data from text files

Common Options

Option	What It Does	When to Use It
`-F SEP`	Sets the field separator (default is whitespace)	When working with CSV files or other delimited data
`-v VAR=VAL`	Sets a variable before processing begins	When you need to pass parameters to your awk script
`-f FILE`	Reads awk commands from a file	For complex or reusable awk programs

Common Built-in Variables

Variable	What It Contains	When to Use It
`NR`	Current record (line) number	When you need to skip header rows or track line numbers
`NF`	Number of fields in current record	When you need to check if a line has enough fields
`$0`	Entire current record (line)	When you want to work with the whole line
`$1, $2, $3...`	Individual fields (columns) from current record	When you need to access specific columns of data
`FS`	Field separator (default is whitespace)	When you need to change the separator programmatically
`OFS`	Output field separator (default is space)	When you want to control how fields are separated in output
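
To see how these options and variables fit together, here is a small sketch that combines -F, -v, NF, NR, and OFS. The sample data (including the incomplete "Broken" row) and the variable name min are assumptions made for this example.

```shell
# Hypothetical sample data: the last row is missing its quantity field.
data='Laptop,999.99,5
Mouse,25.50,20
Broken,12.00'

# -F sets the input separator and -v passes values into the program.
# NF == 3 skips rows that do not have exactly three fields.
# OFS is inserted between the comma-separated items of print.
report=$(printf '%s\n' "$data" |
  awk -F ',' -v min=100 -v OFS=' | ' \
    'NF == 3 && $2 * $3 >= min {print NR, $1, $2 * $3}')
printf '%s\n' "$report"
```

Each output line holds the record number, the product name, and price times quantity, joined by the OFS string.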

Practical Examples

Let's say you have a file called sales.csv with the following content:

Product,Price,Quantity
Laptop,999.99,5
Mouse,25.50,20
Keyboard,75.00,10
Monitor,299.99,3

To print only the product names (first field):

# Print just the product names
awk -F ',' '{print $1}' sales.csv

Output:

Product
Laptop
Mouse
Keyboard
Monitor

To calculate the total value for each product (price × quantity):

# Calculate total value for each product
awk -F ',' 'NR>1 {print $1 ": $" $2 * $3}' sales.csv

Output:

Laptop: $4999.95
Mouse: $510
Keyboard: $750
Monitor: $899.97

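To find products with a price greater than $50 (NR>1 skips the header row; without it, awk compares the text "Price" with 50 as a string and prints the header too):

# Show expensive products, skipping the header row
awk -F ',' 'NR>1 && $2 > 50 {print $1 " costs $" $2}' sales.csv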

Output:

Laptop costs $999.99
Keyboard costs $75.00
Monitor costs $299.99

To calculate the total revenue from all sales:

# Calculate total revenue
awk -F ',' 'NR>1 {total += $2 * $3} END {print "Total revenue: $" total}' sales.csv

Output:

Total revenue: $7159.92
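
The totals above print with however many decimal places awk happens to produce (510 next to 4999.95). For report-style output, awk's printf gives control over width and precision. This is a minimal sketch using the same sales.csv content; the column widths are arbitrary choices.

```shell
sales='Product,Price,Quantity
Laptop,999.99,5
Mouse,25.50,20
Keyboard,75.00,10
Monitor,299.99,3'

# %-10s left-aligns the name in 10 columns;
# %8.2f right-aligns the value in 8 columns with 2 decimal places.
formatted=$(printf '%s\n' "$sales" |
  awk -F ',' 'NR > 1 {printf "%-10s $%8.2f\n", $1, $2 * $3}')
printf '%s\n' "$formatted"
```

Unlike print, printf does not add a newline on its own, which is why the format string ends in \n.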

To process a log file and extract specific information:

access.log:

192.168.1.1 - - [10/Oct/2023:13:55:36] "GET /page1 HTTP/1.1" 200 1234
192.168.1.2 - - [10/Oct/2023:13:55:37] "GET /page2 HTTP/1.1" 404 567
192.168.1.1 - - [10/Oct/2023:13:55:38] "POST /login HTTP/1.1" 200 890

To count requests by IP address:

# Count requests per IP address
awk '{count[$1]++} END {for (ip in count) print ip ": " count[ip] " requests"}' access.log

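Output (line order may vary, because awk does not guarantee the order in which for-in visits array elements):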

192.168.1.1: 2 requests
192.168.1.2: 1 requests
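
Because awk visits array elements in no guaranteed order, a common approach is to pipe the counts through sort. The sketch below ranks the same three log lines by request count; printing the count first is a choice made here so that sort can use it as the primary key.

```shell
log='192.168.1.1 - - [10/Oct/2023:13:55:36] "GET /page1 HTTP/1.1" 200 1234
192.168.1.2 - - [10/Oct/2023:13:55:37] "GET /page2 HTTP/1.1" 404 567
192.168.1.1 - - [10/Oct/2023:13:55:38] "POST /login HTTP/1.1" 200 890'

# Print "count ip" pairs, then let sort impose the order:
# highest count first (-k1,1nr), ties broken by the IP text (-k2,2).
ranked=$(printf '%s\n' "$log" |
  awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' |
  sort -k1,1nr -k2,2)
printf '%s\n' "$ranked"
```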

To extract the month and day from ls -l output (NR>1 skips the leading "total" line):

# Get month and date from ls -l output
ls -l /etc | awk 'NR>1 {print $6 " " $7}'

Output:

Apr 3
Apr 5
Apr 19
Apr 19
Apr 27
Apr 29
Apr 30
Aug 6
Aug 10

To format dates with hyphens instead of spaces:

# Format dates with hyphens
ls -l /etc | awk 'NR>1 {print $6 "-" $7}'

Output:

Apr-3
Apr-5
Apr-19
Apr-19
Apr-27
Apr-29
Apr-30
Aug-6
Aug-10

To get a list of unique dates (sort orders them as text, so "Apr 19" comes before "Apr 3"):

# Get unique dates sorted
ls -l /etc | awk 'NR>1 {print $6 " " $7}' | sort | uniq

Output:

Apr 19
Apr 27
Apr 29
Apr 3
Apr 30
Apr 5
Aug 10
Aug 6
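
If calendar order matters more than text order, sort can be told how to read each field. The sketch below assumes a sort that supports the -M (month name) key type, which GNU and BSD sort provide but POSIX does not require. The input is the list shown above plus one duplicate.

```shell
dates='Apr 19
Apr 27
Apr 29
Apr 3
Apr 30
Apr 5
Aug 10
Aug 6
Apr 19'

# -k1,1M sorts field 1 as a month name, -k2,2n sorts field 2 as a number,
# and -u drops duplicates. LC_ALL=C keeps English month abbreviations.
chronological=$(printf '%s\n' "$dates" | LC_ALL=C sort -u -k1,1M -k2,2n)
printf '%s\n' "$chronological"
```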

To count how many files were modified on each date:

# Count files per modification date
ls -l /etc | awk 'NR>1 {count[$6 " " $7]++} END {for (date in count) print date ": " count[date] " files"}'

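Output (line order may vary, because awk does not guarantee the order in which for-in visits array elements):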

Apr 19: 2 files
Apr 27: 1 files
Apr 29: 1 files
Apr 3: 1 files
Apr 30: 1 files
Apr 5: 1 files
Aug 10: 1 files
Aug 6: 1 files

The aspell Command

The aspell command is an interactive spell checking utility designed to find and correct spelling errors in text documents. It offers more accurate results than older spell checkers and supports multiple languages through installable dictionaries. When aspell identifies a potentially misspelled word, it provides a list of suggested replacements ranked by likelihood. This makes it invaluable for proofreading documents, checking email drafts, or ensuring documentation is free from spelling errors before publication. Unlike tr and sed, which transform text, aspell focuses specifically on identifying and correcting spelling mistakes.

When to Use

When you need to check document spelling
When preparing documents for publication or submission
When proofreading text files without a word processor
When creating scripts that need to verify spelling
When working with documents in multiple languages

Common Options

Option	What It Does	When to Use It
`-c` or `check`	Checks spelling in a file	For interactive spell checking with suggestions
`-a` or `pipe`	Runs in 'pipe mode' for programmatic use	When using aspell in scripts or with other programs
`list`	Lists misspelled words read from standard input (in current aspell versions `-l` selects the language instead)	When you just want to identify errors without correcting them
`-d [dict]`	Uses a specific dictionary	When checking documents in different languages

Practical Examples

Let's say you have a file called report.txt with the following content:

This is a sampel report.
It containes some mispeled words.
The speling here is not corect.

To check the spelling of the file interactively:

# Interactive spell check
aspell check report.txt

This opens an interactive session where aspell offers suggestions for each misspelled word:

1) sample      6) sampled
2) samples     7) sampans
3) sampler     8) simpler
4) samplers    9) sampler's
5) Sampel      0) sampels
i) Ignore      I) Ignore all
r) Replace     R) Replace all
a) Add         x) Exit
?

To just list all misspelled words without correcting them:

# Just list misspelled words
aspell list < report.txt

Output:

sampel
containes
mispeled
speling
corect

To check spelling using a different language dictionary:

# Check spelling with French dictionary
aspell -d fr check french_report.txt

To see which dictionaries are available on your system:

# List available dictionaries
aspell dicts

Tips for Success

The tr command only works with single characters, not patterns or words
Use sed when you need to work with patterns rather than individual characters
Use awk when you need to process data field by field or perform calculations
Make a backup before using sed -i to edit files in-place
Character classes like [:upper:] in tr make it easier to work with character groups
For complex text transformations, you can chain these commands together with pipes
Start with simple awk commands and gradually build complexity
Use awk's built-in variables like NR (record number) and NF (number of fields)
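
As an illustration of chaining the three tools with pipes, here is a small sketch in which each command does the job it is best at. The messy semicolon-separated input is invented for this example.

```shell
# Hypothetical input: stray spaces and semicolons instead of commas.
raw='  Laptop ;  999.99 ; 5
  Mouse ; 25.50 ; 20  '

total=$(printf '%s\n' "$raw" |
  tr -d ' ' |          # tr: delete every space character
  sed 's/;/,/g' |      # sed: turn the semicolons into commas
  awk -F ',' '{sum += $2 * $3} END {print sum}')   # awk: sum price * quantity
printf 'Total: %s\n' "$total"
```

Each stage reads the previous stage's output, so the pipeline can be built and checked one command at a time.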

Common Mistakes to Avoid

Forgetting that tr can only translate one character to one character
Not escaping special characters in sed patterns (like ., *, $)
Using sed -i without a backup on important files
Assuming aspell will catch all errors (it might miss contextual mistakes)
Using the wrong dictionary with aspell for your document's language
Forgetting to set the field separator with -F in awk when working with CSV files
Not handling the header row properly in awk when processing CSV files
Using awk for simple character operations that tr could handle more efficiently

Best Practices

Test your tr and sed commands on a sample of the data first
Use sed -i.bak to create automatic backups before editing
Create a personal dictionary for aspell if you use specialized terminology
Document complex transformations for future reference
For advanced text processing, learn regular expressions to unlock the full power of sed
Use awk for data analysis and report generation from structured text files
Combine awk with other commands in pipelines for powerful data processing workflows
Learn awk's pattern-action model: patterns select lines, actions process them
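
The sed -i.bak practice can be tried safely in a scratch directory. This is a minimal sketch; the directory comes from mktemp, and the file content is one line borrowed from the email.txt example.

```shell
# Work in a throwaway directory so no real file is touched.
workdir=$(mktemp -d)
printf 'Your order #12345 has been shipped.\n' > "$workdir/email.txt"

# -i.bak edits email.txt in place and keeps the original as email.txt.bak.
sed -i.bak 's/12345/67890/g' "$workdir/email.txt"

edited=$(cat "$workdir/email.txt")
backup=$(cat "$workdir/email.txt.bak")
printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"

# Clean up the scratch directory.
rm -r "$workdir"
```

If the edit turns out to be wrong, copying the .bak file back over the original restores it.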