CIS120 Linux Fundamentals by Scott Shaper

The sort and uniq Commands

Think of sort and uniq like organizing your music playlist. sort is like arranging your songs in order (by artist, title, or length), while uniq is like removing duplicate songs. Together, they help you organize and clean up your data, just like organizing your music collection!

Quick Reference

Command	What It Does	Common Use
`sort`	Arranges lines in order	Organizing lists, sorting data
`uniq`	Removes adjacent duplicate lines	Finding unique items, counting duplicates

When to Use These Commands

Use sort and uniq when you want to:

Organize data alphabetically or numerically
Remove duplicate entries from a list
Count how many times items appear
Find unique or duplicate items
Clean up messy data

sort Command

The sort command is like a librarian organizing books. It can arrange your data in different ways, just like books can be sorted by title, author, or subject.

Common Options

Option	What It Does	When to Use It
`-b`	Ignore spaces at start	When data has extra spaces
`-n`	Sort numbers correctly	When sorting numerical data
`-r`	Reverse the order	When you want descending order
`-f`	Ignore case	When case doesn't matter
`-k`	Sort by a column (field)	When data has multiple columns
`-M`	Sort by month name (Jan, Feb, ...)	When a column holds month abbreviations
`-u`	Remove duplicate lines	When you want unique lines only

Practical Examples

Basic Sorting

# Sort names alphabetically
sort names.txt

# Sort numbers correctly
sort -n numbers.txt

# Sort in reverse order
sort -r items.txt

# Sort ignoring case
sort -f mixed_case.txt

# Sort and remove duplicates
sort -u duplicates.txt
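
To see why -n matters, here is a minimal sketch using a made-up list of numbers. The temporary directory and the file contents are assumptions for illustration only, and LC_ALL=C is set just to keep the ordering predictable from one system to another.

```shell
# Hypothetical sample data: one number per line.
tmpdir=$(mktemp -d)
printf '%s\n' 10 9 100 2 > "$tmpdir/numbers.txt"

# Default sort compares character by character, so "10" and "100" land before "2".
text_order=$(LC_ALL=C sort "$tmpdir/numbers.txt")

# -n compares by numeric value instead.
numeric_order=$(LC_ALL=C sort -n "$tmpdir/numbers.txt")

printf 'text order:\n%s\n\nnumeric order:\n%s\n' "$text_order" "$numeric_order"
rm -r "$tmpdir"
```

The text order comes out as 10, 100, 2, 9, while the numeric order is 2, 9, 10, 100.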

Advanced Sorting

# Sort by second column only (-k2 alone would compare from column 2 to the end of the line)
sort -k2,2 data.txt

# Sort numbers and ignore spaces
sort -bn numbers.txt

# Sort by month names
sort -M dates.txt
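
As a small sketch of the column and month options, the example below builds two made-up files and sorts them. The file names scores.txt and months.txt and their contents are hypothetical, and LC_ALL=C is an assumption used so that English month abbreviations are recognized.

```shell
# Hypothetical sample data: "name score" pairs and a few month abbreviations.
tmpdir=$(mktemp -d)
printf '%s\n' 'alice 90' 'bob 85' 'carol 100' > "$tmpdir/scores.txt"
printf '%s\n' Mar Jan Feb > "$tmpdir/months.txt"

# -k2,2n: use only column 2 as the key and compare it as a number.
by_score=$(LC_ALL=C sort -k2,2n "$tmpdir/scores.txt")

# -M: order by calendar month rather than alphabetically.
by_month=$(LC_ALL=C sort -M "$tmpdir/months.txt")

printf 'by score:\n%s\n\nby month:\n%s\n' "$by_score" "$by_month"
rm -r "$tmpdir"
```

A plain sort would put Feb before Jan because F comes before J; -M puts the months in calendar order.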

uniq Command

The uniq command is like a librarian checking for duplicate books on a shelf. Just like a librarian can only spot duplicate books when they're next to each other on the shelf, uniq can only find duplicates when they're next to each other in the file. That's why we always sort the data first - it's like organizing all the books in order so the librarian can easily spot and remove the duplicates!

Common Options

Option	What It Does	When to Use It
`-c`	Count occurrences	When you want to count duplicates
`-d`	Show one copy of each repeated line	When you want to find repeated items
`-u`	Show only lines that appear exactly once	When you want to find items with no duplicates
`-i`	Ignore case	When case doesn't matter

Practical Examples

Basic Usage

# Remove duplicates (must sort first)
sort names.txt | uniq

# Count how many times each name appears
sort names.txt | uniq -c

# Show only duplicate names
sort names.txt | uniq -d

# Show only names that appear exactly once
sort names.txt | uniq -u
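
The difference between plain uniq, -c, -d, and -u is easiest to see on a tiny file. The sketch below uses a made-up names.txt; the names and the temporary directory are assumptions for illustration. The awk step is only there to trim the padding that uniq -c prints in front of each count.

```shell
# Hypothetical sample data: names with repeats, deliberately unsorted.
tmpdir=$(mktemp -d)
printf '%s\n' alice bob alice carol bob alice > "$tmpdir/names.txt"

# One copy of every name.
all_names=$(LC_ALL=C sort "$tmpdir/names.txt" | uniq)

# Count per name, with the leading padding from uniq -c removed.
counts=$(LC_ALL=C sort "$tmpdir/names.txt" | uniq -c | awk '{print $1, $2}')

# Names that occur more than once (one copy of each).
repeated=$(LC_ALL=C sort "$tmpdir/names.txt" | uniq -d)

# Names that occur exactly once.
singles=$(LC_ALL=C sort "$tmpdir/names.txt" | uniq -u)

printf 'uniq:\n%s\n\nuniq -c:\n%s\n\nuniq -d:\n%s\n\nuniq -u:\n%s\n' \
  "$all_names" "$counts" "$repeated" "$singles"
rm -r "$tmpdir"
```

Note that plain uniq keeps carol, alice, and bob, while uniq -u keeps only carol, the one name with no duplicate.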

Why Sorting is Important

# Example file (unsorted.txt):
# apple
# banana
# apple
# orange
# banana
# apple

# This won't work correctly because duplicates aren't adjacent:
uniq unsorted.txt
# Output:
# apple
# banana
# apple
# orange
# banana
# apple

# First sort the file, then use uniq:
sort unsorted.txt | uniq
# Output:
# apple
# banana
# orange

# Or use sort -u for the same result:
sort -u unsorted.txt
# Output:
# apple
# banana
# orange

Real-World Examples

# Find most common words in a file (-s squeezes repeated spaces so empty lines are not counted)
tr -s ' ' '\n' < document.txt | sort | uniq -c | sort -nr

# Find unique IP addresses in a log
cut -d' ' -f1 access.log | sort | uniq

# Count duplicate lines in a CSV
sort data.csv | uniq -c | sort -nr
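
Here is a minimal sketch of the word-count pipeline run on a made-up document.txt, so each stage can be checked by hand. The sentence used as input is an assumption for illustration, and it contains a double space on purpose to show what tr -s is for.

```shell
# Hypothetical sample text; the second line has two spaces between "and" and "the".
tmpdir=$(mktemp -d)
printf '%s\n' 'the cat and the dog' 'and  the bird' > "$tmpdir/document.txt"

# 1. tr -s puts each word on its own line and squeezes repeated separators.
# 2. sort brings identical words together so uniq -c can count them.
# 3. sort -nr puts the highest counts first; head keeps the top two.
# 4. awk trims the padding that uniq -c prints before each count.
top_words=$(tr -s ' ' '\n' < "$tmpdir/document.txt" |
  LC_ALL=C sort | uniq -c | LC_ALL=C sort -nr | head -n 2 | awk '{print $1, $2}')

printf '%s\n' "$top_words"
rm -r "$tmpdir"
```

In this sample "the" appears three times and "and" twice, so those are the two lines printed.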

Tips for Success

Always sort first: uniq only compares adjacent lines, so sort the data to bring duplicates together
Use -n for numbers: Always use -n when sorting numbers
Check your data: Look at your data before sorting to choose the right options
Combine commands: Use pipelines to combine sort and uniq with other commands

Common Mistakes to Avoid

Using uniq without sorting first
Forgetting -n when sorting numbers
Not using -b when data has extra spaces
Using wrong column numbers with -k
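
The -k mistake is worth a closer look, because -k2 does not mean "column 2 only". The sketch below, built on a made-up three-column file (the contents are an assumption for illustration), compares an open-ended key with a bounded one.

```shell
# Hypothetical sample data with three space-separated columns.
tmpdir=$(mktemp -d)
printf '%s\n' 'x b 2' 'y a 9' 'z a 1' > "$tmpdir/data.txt"

# -k2 starts at column 2 and runs to the end of the line,
# so column 3 also influences the order.
open_key=$(LC_ALL=C sort -k2 "$tmpdir/data.txt")

# -k2,2 stops at column 2; lines that tie there fall back
# to a comparison of the whole line.
bounded_key=$(LC_ALL=C sort -k2,2 "$tmpdir/data.txt")

printf 'sort -k2:\n%s\n\nsort -k2,2:\n%s\n' "$open_key" "$bounded_key"
rm -r "$tmpdir"
```

With -k2 the line "z a 1" comes first because "a 1" sorts before "a 9". With -k2,2 the two "a" lines tie on the key, so "y a 9" comes first based on the whole line.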

Best Practices

Always sort before using uniq
Use descriptive filenames for sorted output
Use -n for any numerical sorting
Combine options when needed (e.g., -bn)
Use pipelines to create powerful data processing chains

Advanced Techniques

Complex Sorting

# Sort by multiple columns
sort -k2,2 -k1,1 data.txt

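# Sort numerically by column 1, then case-insensitively by column 2
# (-f has no effect on a numeric comparison, so it is applied to a text column)
sort -k1,1n -k2,2f mixed_data.txt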

# Sort by month (column 3) and then by day (column 2)
sort -k3,3M -k2,2n dates.txt
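
As a rough sketch of a month-then-day sort, the example below assumes a made-up dates.txt whose columns are a label, a day number, and a month abbreviation. That layout is an assumption for illustration; adjust the column numbers to match your own data.

```shell
# Hypothetical sample data: label, day, month.
tmpdir=$(mktemp -d)
printf '%s\n' 'backup 5 Mar' 'report 20 Jan' 'audit 3 Feb' 'invoice 7 Jan' \
  > "$tmpdir/dates.txt"

# First key: column 3 as a month name. Second key: column 2 as a number.
# The ",3" and ",2" keep each key limited to a single column.
by_date=$(LC_ALL=C sort -k3,3M -k2,2n "$tmpdir/dates.txt")

printf '%s\n' "$by_date"
rm -r "$tmpdir"
```

The two January lines come first, ordered by day (7 before 20), followed by February and March.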

Data Analysis

# Find top 10 most common words (-s squeezes repeated spaces so empty lines are not counted)
tr -s ' ' '\n' < text.txt | sort | uniq -c | sort -nr | head -10

# Count how often each combination of fields 1 and 4 appears in a log
cut -d' ' -f1,4 access.log | sort | uniq -c | sort -nr

# Process CSV data
cut -d',' -f2 data.csv | sort | uniq -c | sort -nr
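
Real CSV files usually start with a header row, which would otherwise be counted as data. The sketch below assumes a made-up data.csv with an id column and a department column, and skips the header with tail before counting. It also assumes a simple CSV with no quoted commas, since cut splits on every comma it finds.

```shell
# Hypothetical sample data: a header row followed by id,department pairs.
tmpdir=$(mktemp -d)
printf '%s\n' 'id,department' '1,sales' '2,it' '3,sales' '4,hr' '5,it' '6,sales' \
  > "$tmpdir/data.csv"

# tail -n +2 starts output at line 2, dropping the header.
# cut keeps column 2; sort | uniq -c counts each value; sort -nr ranks them.
# awk trims the padding that uniq -c prints before each count.
dept_counts=$(tail -n +2 "$tmpdir/data.csv" | cut -d',' -f2 |
  LC_ALL=C sort | uniq -c | LC_ALL=C sort -nr | awk '{print $1, $2}')

printf '%s\n' "$dept_counts"
rm -r "$tmpdir"
```

For this sample the result lists sales with 3 rows, it with 2, and hr with 1.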