
# The sort and uniq Commands

Think of `sort` and `uniq` like organizing a music playlist: `sort` arranges your songs in order (by artist, title, or length), while `uniq` removes the duplicate songs. Together, they help you organize and clean up your data, just like tidying your music collection!
## Quick Reference

| Command | What It Does | Common Use |
|---|---|---|
| `sort` | Arranges lines in order | Organizing lists, sorting data |
| `uniq` | Removes duplicate adjacent lines | Finding unique items, counting duplicates |
## When to Use These Commands

Use `sort` and `uniq` when you want to:

- Organize data alphabetically or numerically
- Remove duplicate entries from a list
- Count how many times items appear
- Find unique or duplicate items
- Clean up messy data
## sort Command

The `sort` command is like a librarian organizing books: it can arrange your data in different ways, just as books can be shelved by title, author, or subject.
### Common Options

| Option | What It Does | When to Use It |
|---|---|---|
| `-b` | Ignore leading blanks | When data has extra spaces |
| `-n` | Sort numerically | When sorting numerical data |
| `-r` | Reverse the order | When you want descending order |
| `-f` | Ignore case | When case doesn't matter |
| `-k` | Sort by column (key) | When data has multiple columns |
| `-M` | Sort by month name | When sorting dates |
| `-u` | Output unique lines only | When you want duplicates removed |
### Practical Examples

#### Basic Sorting

```bash
# Sort names alphabetically
sort names.txt

# Sort numbers numerically
sort -n numbers.txt

# Sort in reverse order
sort -r items.txt

# Sort ignoring case
sort -f mixed_case.txt

# Sort and remove duplicates
sort -u duplicates.txt
```
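To see why `-n` matters, here's a quick sketch using a throwaway `numbers.txt` (the filename and data are just for illustration):

```shell
# Create a small sample file (illustrative data).
printf '10\n9\n100\n2\n' > numbers.txt

# Default sort compares text character by character:
sort numbers.txt
# 10
# 100
# 2
# 9

# Numeric sort compares the values themselves:
sort -n numbers.txt
# 2
# 9
# 10
# 100

rm numbers.txt
```

Lexicographically, `10` comes before `2` because `1` is a smaller character than `2`; `-n` fixes this by comparing whole numbers.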
#### Advanced Sorting

```bash
# Sort by the second column
sort -k2 data.txt

# Sort numbers, ignoring leading blanks
sort -bn numbers.txt

# Sort by month name
sort -M dates.txt
```
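A short sketch of column sorting, using a made-up two-column file of names and scores:

```shell
# Illustrative two-column data: name, score.
printf 'alice 30\nbob 5\ncarol 12\n' > data.txt

# -k2 compares the second field as text, so "12" < "30" < "5":
sort -k2 data.txt
# carol 12
# alice 30
# bob 5

# Adding n compares it numerically, so 5 < 12 < 30:
sort -k2n data.txt
# bob 5
# carol 12
# alice 30

rm data.txt
```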
## uniq Command

The `uniq` command is like a librarian checking a shelf for duplicate books. Just as the librarian can only spot duplicate copies when they sit next to each other on the shelf, `uniq` can only find duplicates when they are on adjacent lines. That's why you should always sort the data first: sorting groups identical lines together so `uniq` can easily spot and remove the duplicates!
### Common Options

| Option | What It Does | When to Use It |
|---|---|---|
| `-c` | Count occurrences | When you want to count duplicates |
| `-d` | Show only duplicated lines | When you want to find repeated items |
| `-u` | Show only lines that appear once | When you want items that are never repeated |
| `-i` | Ignore case | When case doesn't matter |
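For instance, pairing `sort -f` with `uniq -i` collapses lines that differ only in case. A minimal sketch with inline data:

```shell
# Three spellings of the same word, differing only in case:
printf 'Apple\napple\nAPPLE\n' | sort -f | uniq -i
# Only one line survives the case-insensitive comparison
# (which spelling is kept depends on the sort order).
```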
### Practical Examples

#### Basic Usage

```bash
# Remove duplicates (sort first so duplicates are adjacent)
sort names.txt | uniq

# Count how many times each name appears
sort names.txt | uniq -c

# Show only duplicated names
sort names.txt | uniq -d

# Show only names that appear once
sort names.txt | uniq -u
```
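A worked sketch with a small inline fruit list (illustrative data) shows how the three options differ:

```shell
# Sample data: apple x3, banana x2, cherry x1.
printf 'apple\nbanana\napple\ncherry\nbanana\napple\n' > fruits.txt

sort fruits.txt | uniq -c   # counts: 3 apple, 2 banana, 1 cherry
sort fruits.txt | uniq -d   # lines that repeat: apple, banana
sort fruits.txt | uniq -u   # lines that appear once: cherry

rm fruits.txt
```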
#### Why Sorting is Important

```bash
# Example file (unsorted.txt):
# apple
# banana
# apple
# orange
# banana
# apple

# This won't work correctly because the duplicates aren't adjacent:
uniq unsorted.txt
# Output:
# apple
# banana
# apple
# orange
# banana
# apple

# Sort the file first, then use uniq:
sort unsorted.txt | uniq
# Output:
# apple
# banana
# orange

# Or use sort -u for the same result:
sort -u unsorted.txt
# Output:
# apple
# banana
# orange
```
### Real-World Examples

```bash
# Find the most common words in a file
tr ' ' '\n' < document.txt | sort | uniq -c | sort -nr

# Find unique IP addresses in a log
cut -d' ' -f1 access.log | sort | uniq

# Count duplicate lines in a CSV
sort data.csv | uniq -c | sort -nr
```
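To see the word-count pipeline end to end, here's a sketch with a one-line sample document (made-up text):

```shell
# Fabricated sample text.
printf 'the cat sat on the mat the end\n' > document.txt

# Split into one word per line, count, and rank by frequency:
tr ' ' '\n' < document.txt | sort | uniq -c | sort -nr | head -3
# "the" (3 occurrences) comes first; the other words are ties with 1 each.

rm document.txt
```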
## Tips for Success

- **Always sort first:** `uniq` only works on sorted data
- **Use -n for numbers:** always use `-n` when sorting numbers
- **Check your data:** look at your data before sorting to choose the right options
- **Combine commands:** use pipelines to combine `sort` and `uniq` with other commands
## Common Mistakes to Avoid

- Using `uniq` without sorting first
- Forgetting `-n` when sorting numbers
- Not using `-b` when data has extra spaces
- Using the wrong column numbers with `-k`
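The `-k` mistake is worth a closer look: `-k2` means "from field 2 to the end of the line", not "field 2 only". A sketch with made-up data:

```shell
# Illustrative three-column data.
printf 'a 1 z\nb 1 a\n' > cols.txt

# -k2 compares "1 z" vs "1 a" (field 2 through end of line):
sort -k2 cols.txt
# b 1 a
# a 1 z

# -k2,2 restricts the key to field 2 alone; the tie is broken by field 1:
sort -k2,2 -k1,1 cols.txt
# a 1 z
# b 1 a

rm cols.txt
```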
## Best Practices

- Always sort before using `uniq`
- Use descriptive filenames for sorted output
- Use `-n` for any numerical sorting
- Combine options when needed (e.g., `-bn`)
- Use pipelines to create powerful data processing chains
## Advanced Techniques

### Complex Sorting

```bash
# Sort by the second column, then by the first
sort -k2,2 -k1,1 data.txt

# Case-insensitive sort (note: -f has no effect when combined with -n)
sort -f mixed_data.txt

# Sort by month (column 3), then numerically by day (column 2)
sort -k3,3M -k2,2n dates.txt
```
### Data Analysis

```bash
# Find the top 10 most common words
tr ' ' '\n' < text.txt | sort | uniq -c | sort -nr | head -10

# Analyze log file entries (IP address and timestamp fields)
cut -d' ' -f1,4 access.log | sort | uniq -c | sort -nr

# Count values in the second CSV column
cut -d',' -f2 data.csv | sort | uniq -c | sort -nr
```
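As a final sketch, here's the IP-counting pattern run against a tiny fabricated log (this assumes the IP address is the first space-separated field, as in common access-log formats):

```shell
# Fabricated three-line log, IP address in the first field:
printf '1.2.3.4 - GET /\n5.6.7.8 - GET /\n1.2.3.4 - GET /about\n' > access.log

# Extract IPs, group them with sort, count with uniq, rank by count:
cut -d' ' -f1 access.log | sort | uniq -c | sort -nr
# 1.2.3.4 (2 requests) is listed before 5.6.7.8 (1 request).

rm access.log
```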