The sort and uniq Commands
Think of sort and uniq like organizing your music playlist. sort is like arranging your songs in order (by artist, title, or length), while uniq is like removing duplicate songs. Together, they help you organize and clean up your data, just like organizing your music collection!
Quick Reference
| Command | What It Does | Common Use |
|---|---|---|
| `sort` | Arranges lines in order | Organizing lists, sorting data |
| `uniq` | Removes duplicate lines | Finding unique items, counting duplicates |
When to Use These Commands
Use sort and uniq when you want to:
- Organize data alphabetically or numerically
- Remove duplicate entries from a list
- Count how many times items appear
- Find unique or duplicate items
- Clean up messy data
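As a quick taste of the workflow, here is a minimal pipeline that does several of these at once. The fruit names are just placeholder data fed in with printf:

```shell
# Sort the lines, then collapse the adjacent duplicates
printf 'banana\napple\nbanana\ncherry\n' | sort | uniq
# apple
# banana
# cherry
```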
sort Command
The sort command is like a librarian organizing books. It can arrange your data in different ways, just like books can be sorted by title, author, or subject.
Common Options
| Option | What It Does | When to Use It |
|---|---|---|
| `-b` | Ignore leading spaces | When data has extra spaces |
| `-n` | Sort numbers correctly | When sorting numerical data |
| `-r` | Reverse the order | When you want descending order |
| `-f` | Ignore case | When case doesn't matter |
| `-k` | Sort by column | When data has multiple columns |
| `-M` | Sort by month name | When sorting dates |
| `-u` | Remove duplicate lines | When you want unique lines only |
Practical Examples
First, let's create some practice files by running these commands.
Create a file with names (some duplicates for uniq examples):
cat > names.txt << 'EOF'
John
Jane
Jim
Jill
John
Jane
EOF
cat > numbers.txt << 'EOF'
10
20
30
40
EOF
cat > items.txt << 'EOF'
apple
banana
orange
EOF
Create a file with mixed case for case-insensitive sorting:
cat > mixed_case.txt << 'EOF'
Apple
banana
Cherry
date
EOF
Create a file with duplicate entries:
cat > duplicates.txt << 'EOF'
apple
banana
apple
orange
banana
cherry
EOF
Basic Sorting
# Sort names alphabetically
sort names.txt
# Sort numbers correctly
sort -n numbers.txt
# Sort in reverse order
sort -r items.txt
# Sort ignoring case
sort -f mixed_case.txt
# Sort and remove duplicates
sort -u duplicates.txt
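One subtlety worth seeing directly: without -n, sort compares lines character by character, so "10" lands before "9". This quick demonstration uses inline printf data rather than the files above:

```shell
# Lexicographic sort: "10" < "2" < "9", compared digit by digit
printf '9\n10\n2\n' | sort
# 10
# 2
# 9

# Numeric sort with -n compares the values instead
printf '9\n10\n2\n' | sort -n
# 2
# 9
# 10
```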
Now let's create files for advanced sorting examples:
Create a file with multiple columns for sorting by column:
cat > data.txt << 'EOF'
Alice 25 Engineer
Bob 30 Manager
Charlie 22 Student
Diana 28 Designer
EOF
Create a file with numbers that have leading spaces:
cat > numbers_spaces.txt << 'EOF'
5
15
2
10
EOF
Create a file with month names for month sorting:
cat > dates.txt << 'EOF'
January
March
December
February
June
EOF
Advanced Sorting
# Sort by second column
sort -k2 data.txt
# Sort numbers and ignore spaces
sort -bn numbers_spaces.txt
# Sort by month names
sort -M dates.txt
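A common follow-up is sorting one column numerically. The -k and -n options can be combined, and restricting the key to a single field with -k2,2 is a good habit. This sketch recreates data.txt so it runs on its own:

```shell
# data.txt as created earlier in this section
cat > data.txt << 'EOF'
Alice 25 Engineer
Bob 30 Manager
Charlie 22 Student
Diana 28 Designer
EOF

# Sort by the second field (age) as numbers;
# -k2,2 limits the sort key to field 2 only
sort -k2,2n data.txt
# Charlie 22 Student
# Alice 25 Engineer
# Diana 28 Designer
# Bob 30 Manager
```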
uniq Command
The uniq command is like a librarian checking a shelf for duplicate books. Just as the librarian can only spot duplicates when the books sit next to each other on the shelf, uniq can only detect duplicate lines when they are adjacent in the file. That's why we always sort the data first - sorting puts everything in order so the duplicates end up side by side and are easy to spot and remove!
Common Options
| Option | What It Does | When to Use It |
|---|---|---|
| `-c` | Count occurrences of each line | When you want to count duplicates |
| `-d` | Show only duplicated lines | When you want to find repeated items |
| `-u` | Show only lines that appear once | When you want to find unique items |
| `-i` | Ignore case | When case doesn't matter |
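The -i option is easy to overlook. Since duplicate detection is still adjacency-based, a case-insensitive sort (-f) pairs naturally with it. A small sketch using inline data, not one of the files below:

```shell
# "Apple" and "apple" only match with -i, and only after
# sort -f places the case variants next to each other
printf 'Apple\nbanana\napple\nBanana\n' | sort -f | uniq -i
```

Only one spelling of each fruit survives; which spelling is kept depends on how sort breaks ties between case variants on your system.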
Practical Examples
First, create a file with unsorted duplicate entries for uniq examples:
Create a file with duplicates that are not adjacent:
cat > unsorted.txt << 'EOF'
apple
banana
apple
orange
banana
apple
cherry
EOF
Basic Usage
Important: Understanding uniq without sorting first
Try using uniq directly on the unsorted file (this won't work correctly):
# This won't work correctly because duplicates aren't adjacent
uniq unsorted.txt
# Output shows all lines because duplicates aren't next to each other:
# apple
# banana
# apple
# orange
# banana
# apple
# cherry
Now try uniq with sorting first (this works correctly):
# First sort the file, then use uniq
sort unsorted.txt | uniq
# Output shows only unique lines:
# apple
# banana
# cherry
# orange
Using uniq with the names.txt file (which has some duplicates):
# Remove duplicates (must sort first)
sort names.txt | uniq
# Output shows unique names:
# Jane
# Jill
# Jim
# John
# Count how many times each name appears
sort names.txt | uniq -c
# Output:
# 2 Jane
# 1 Jill
# 1 Jim
# 2 John
# Show only duplicate names
sort names.txt | uniq -d
# Output:
# Jane
# John
# Show only unique names (names that appear only once)
sort names.txt | uniq -u
# Output:
# Jill
# Jim
Using uniq with the duplicates.txt file (which has duplicates):
# Remove duplicates from duplicates.txt
sort duplicates.txt | uniq
# Count how many times each item appears
sort duplicates.txt | uniq -c
# Output:
# 2 apple
# 2 banana
# 1 cherry
# 1 orange
# Show only duplicate items
sort duplicates.txt | uniq -d
# Output:
# apple
# banana
# Show only unique items (items that appear only once)
sort duplicates.txt | uniq -u
# Output:
# cherry
# orange
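A classic pipeline builds on uniq -c: count the occurrences, then sort the counts numerically in reverse to get a frequency ranking. Here the contents of duplicates.txt are piped in with printf so the example stands alone:

```shell
# Most frequent items first
printf 'apple\nbanana\napple\norange\nbanana\ncherry\n' | sort | uniq -c | sort -rn
# The two lines with a count of 2 (apple, banana) come first,
# followed by the lines with a count of 1 (cherry, orange);
# the order within equal counts may vary by platform
```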
Key Differences:
- Without sorting: `uniq` only removes duplicates that are adjacent (next to each other). If duplicates are scattered throughout the file, they won't be removed.
- With sorting: Sorting first groups all duplicates together, so `uniq` can properly identify and remove them.
- Alternative: You can use `sort -u` instead of `sort | uniq` for the same result.
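That equivalence is easy to verify with a quick comparison. The file name unsorted_demo.txt here is just for illustration:

```shell
# Deduplicate the same input two ways and compare the results
printf 'apple\nbanana\napple\norange\nbanana\napple\ncherry\n' > unsorted_demo.txt
sort unsorted_demo.txt | uniq > via_uniq.txt
sort -u unsorted_demo.txt > via_sort_u.txt

# diff prints nothing when the two outputs match
diff via_uniq.txt via_sort_u.txt && echo "identical"
```

Note that `sort -u` only covers plain deduplication; the counting and filtering options (`-c`, `-d`, `-u`) still belong to `uniq`.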
Tips for Success
- Always sort first: Remember that `uniq` only works reliably on sorted data
- Use -n for numbers: Always use `-n` when sorting numbers
- Check your data: Look at your data before sorting to choose the right options
- Combine commands: Use pipelines to combine `sort` and `uniq` with other commands
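As an example of that last tip, sort and uniq slot neatly into longer pipelines. Suppose a web server log has the client IP address in its first column; the file name and layout below are assumptions for illustration, with a tiny made-up log created inline:

```shell
# A small, made-up access.log with the client IP in column 1
printf '1.1.1.1 GET /index\n2.2.2.2 GET /index\n1.1.1.1 GET /about\n' > access.log

# Top client IPs by request count: extract column 1, count, rank
cut -d' ' -f1 access.log | sort | uniq -c | sort -rn | head -5
# 2 1.1.1.1
# 1 2.2.2.2
```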
Common Mistakes to Avoid
- Using `uniq` without sorting first
- Forgetting `-n` when sorting numbers
- Not using `-b` when data has extra spaces
- Using wrong column numbers with `-k`
Best Practices
- Always sort before using `uniq`
- Use descriptive filenames for sorted output
- Use `-n` for any numerical sorting
- Combine options when needed (e.g., `-bn`)
- Use pipelines to create powerful data processing chains