CIS120 Linux Fundamentals by Scott Shaper

The sort and uniq Commands

Think of sort and uniq like organizing your music playlist. sort is like arranging your songs in order (by artist, title, or length), while uniq is like removing duplicate songs. Together, they help you organize and clean up your data, just like organizing your music collection!

The examples below use the course setup. Work in ~/playground/chapter3, where the setup has created names.txt, numbers.txt, items.txt, mixed_case.txt, data.txt, numbers_spaces.txt, dates.txt, complex_dates.txt and duplicates.txt. Use cd ~/playground/chapter3 before trying the commands.

Quick Reference

Command | What It Does | Common Use
sort | Arranges lines in order | Organizing lists, sorting data
uniq | Removes adjacent duplicate lines | Finding unique items, counting duplicates

When to Use These Commands

Use sort and uniq when you want to organize lists, sort data, find unique items, or count duplicates.

sort Command

The sort command is like a librarian organizing books. It can arrange your data in different ways, just like books can be sorted by title, author, or subject.

Common Options

Option | What It Does | When to Use It
-b | Ignore leading blanks | When data has extra spaces at the start of a line or field
-n | Sort by numeric value | When sorting numerical data
-r | Reverse the order | When you want descending order
-f | Ignore case | When case doesn't matter
-k | Sort by a field (column) | When data has multiple columns
-M | Sort by month name (Jan, Feb, ...) | When sorting lines that start with month names
-u | Remove duplicate lines | When you want unique lines only
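
To see why -n matters, here is a small sketch using made-up numbers (not the contents of the course's numbers.txt). It assumes a POSIX shell with mktemp and paste available, and it sets LC_ALL=C so the ordering is the same on any system.

```shell
# Hypothetical sample data; the course's numbers.txt may contain different values.
tmp=$(mktemp)
printf '10\n9\n100\n2\n' > "$tmp"

# Without -n, lines are compared character by character, so "100" sorts before "2".
text_order=$(LC_ALL=C sort "$tmp" | paste -sd' ' -)

# With -n, lines are compared by numeric value.
numeric_order=$(LC_ALL=C sort -n "$tmp" | paste -sd' ' -)

# With -n and -r together, the numeric order is reversed.
reverse_order=$(LC_ALL=C sort -nr "$tmp" | paste -sd' ' -)

echo "default: $text_order"
echo "numeric: $numeric_order"
echo "reverse: $reverse_order"

rm -f "$tmp"
```

On this sample the default order is 10 100 2 9, while -n gives 2 9 10 100.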

Practical Examples

The setup has created the practice files in ~/playground/chapter3. Run the following commands there.

Basic Sorting
cd ~/playground/chapter3

# Sort names alphabetically
sort names.txt

# Sort numbers correctly
sort -n numbers.txt

# Sort in reverse order
sort -r items.txt

# Sort ignoring case
sort -f mixed_case.txt

# Sort and remove duplicates
sort -u duplicates.txt
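
The effect of -f and -u is easier to see on a tiny file. The sketch below uses made-up words rather than the course's mixed_case.txt or duplicates.txt, and it sets LC_ALL=C because plain sort treats uppercase and lowercase differently depending on the locale.

```shell
# Hypothetical sample data; the course's mixed_case.txt may differ.
tmp=$(mktemp)
printf 'banana\nApple\ncherry\nDate\n' > "$tmp"

# In the C locale, uppercase letters sort before lowercase ones.
plain_order=$(LC_ALL=C sort "$tmp" | paste -sd' ' -)

# -f folds lowercase to uppercase for comparison, so case no longer affects the order.
folded_order=$(LC_ALL=C sort -f "$tmp" | paste -sd' ' -)

# -u keeps only one copy of each repeated line.
printf 'apple\nbanana\napple\n' > "$tmp"
unique_lines=$(LC_ALL=C sort -u "$tmp" | paste -sd' ' -)

echo "plain:  $plain_order"
echo "folded: $folded_order"
echo "unique: $unique_lines"

rm -f "$tmp"
```

Here plain sort prints Apple Date banana cherry, while sort -f prints Apple banana cherry Date.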

Advanced sorting uses data.txt, numbers_spaces.txt, and dates.txt from the setup.

Advanced Sorting

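In this example the -k2 means the sort key starts at field 2 and runs to the end of the line. The fields in data.txt are separated by tabs and are treated as columns; by default, sort splits fields on any run of blanks (spaces or tabs).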

# Sort by second column
sort -k2 data.txt

# Sort numbers and ignore spaces
sort -bn numbers_spaces.txt

# Sort by month names
sort -M dates.txt
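
As a rough illustration of field keys, the sketch below builds a two-column, tab-separated file with made-up names and numbers (the course's data.txt may look different). It compares -k2, which treats field 2 as text, with -k2,2n, which limits the key to field 2 and compares it as a number.

```shell
# Hypothetical two-column data (name, number); the course's data.txt may differ.
tmp=$(mktemp)
printf 'alice\t30\nbob\t4\ncarol\t25\n' > "$tmp"

# -k2 compares field 2 as text, so "25" and "30" come before "4".
by_text=$(LC_ALL=C sort -k2 "$tmp" | cut -f1 | paste -sd' ' -)

# -k2,2n limits the key to field 2 and compares it numerically.
by_number=$(LC_ALL=C sort -k2,2n "$tmp" | cut -f1 | paste -sd' ' -)

echo "text key:    $by_text"
echo "numeric key: $by_number"

rm -f "$tmp"
```

The text key gives carol alice bob, and the numeric key gives bob carol alice.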

If you have a file like complex_dates.txt and you are asked to sort it in calendar order, it is a bit more complicated because you have to deal with both the month names and the days of the month. To do that, you need to give sort a key for each column.

As explained above, the numbers after -k are field (column) positions. -k2 means "sort starting at field 2". -k1,2M means "use fields 1 through 2 as the key" and compare it as a month name because of the M. In -k2,2n, the key is only field 2, and the n makes the comparison numeric. When two lines have the same month, the second key decides their order by day.

# Sort by month names and then by days of the month
sort -k1,2M -k2,2n complex_dates.txt

# To reverse the sort order, add r to each key
sort -k1,2Mr -k2,2nr complex_dates.txt
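
Here is a minimal sketch of the two-key sort on a few made-up dates (the real complex_dates.txt may use a different layout). It assumes English month abbreviations and sets LC_ALL=C so that -M recognizes them.

```shell
# Hypothetical sample data; the course's complex_dates.txt may differ.
tmp=$(mktemp)
printf 'Mar 5\nJan 15\nFeb 20\nJan 3\n' > "$tmp"

# Month first (M), then the day as a number (n) to order lines within the same month.
calendar_order=$(LC_ALL=C sort -k1,2M -k2,2n "$tmp" | paste -sd, -)

# Adding r to each key reverses that key's order.
reverse_order=$(LC_ALL=C sort -k1,2Mr -k2,2nr "$tmp" | paste -sd, -)

echo "calendar: $calendar_order"
echo "reverse:  $reverse_order"

rm -f "$tmp"
```

Without the second key, the two Jan lines would not be reliably ordered by day, because "15" sorts before "3" when compared as text.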

uniq Command

The uniq command is like a librarian checking for duplicate books on a shelf. Just like a librarian can only spot duplicate books when they're next to each other on the shelf, uniq can only find duplicates when they're next to each other in the file. That's why we always sort the data first - it's like organizing all the books in order so the librarian can easily spot and remove the duplicates!

Common Options

Option | What It Does | When to Use It
-c | Count occurrences | When you want to count duplicates
-d | Show only repeated lines (one copy of each) | When you want to find repeated items
-u | Show only lines that are not repeated | When you want to find unique items
-i | Ignore case | When case doesn't matter
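
The options can be tried on a small, already grouped list. This sketch uses made-up words, not one of the course files, and assumes a uniq that supports -i (GNU and BSD versions do).

```shell
# Hypothetical sample data, already grouped so equal lines are adjacent.
tmp=$(mktemp)
printf 'Apple\napple\nbanana\ncherry\ncherry\n' > "$tmp"

# Without -i, "Apple" and "apple" are different lines.
case_sensitive=$(LC_ALL=C uniq "$tmp" | paste -sd' ' -)

# With -i, they count as the same line; GNU uniq keeps the first of the pair.
case_insensitive=$(LC_ALL=C uniq -i "$tmp" | paste -sd' ' -)

# -d prints one copy of each repeated line; -u prints lines that are not repeated.
repeated=$(LC_ALL=C uniq -d "$tmp" | paste -sd' ' -)
not_repeated=$(LC_ALL=C uniq -u "$tmp" | paste -sd' ' -)

echo "uniq:    $case_sensitive"
echo "uniq -i: $case_insensitive"
echo "uniq -d: $repeated"
echo "uniq -u: $not_repeated"

rm -f "$tmp"
```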

Practical Examples

The setup has created duplicates.txt in ~/playground/chapter3: a file with duplicate lines that are not adjacent. Use it to see why sorting before uniq matters.

Basic Usage

Important: Understanding uniq without sorting first

Try using uniq directly on the unsorted file (this won't work correctly):

cd ~/playground/chapter3

# This won't work correctly because duplicates aren't adjacent
uniq duplicates.txt
# Output shows all lines because duplicates aren't next to each other:
# apple
# banana
# apple
# orange
# banana
# apple
# cherry

Now try uniq with sorting first (this works correctly):

# First sort the file, then use uniq
sort duplicates.txt | uniq
# Output shows only unique lines:
# apple
# banana
# cherry
# orange
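
For plain duplicate removal, sort -u from the sort options table should give the same result as sort piped into uniq. The sketch below checks that on a temporary copy of the seven lines shown in the unsorted output above, so it does not touch the course files.

```shell
# Same seven lines as the unsorted uniq output shown for duplicates.txt.
tmp=$(mktemp)
printf 'apple\nbanana\napple\norange\nbanana\napple\ncherry\n' > "$tmp"

# uniq alone removes nothing here because no equal lines are adjacent.
unsorted_count=$(uniq "$tmp" | wc -l | tr -d ' ')

# Two ways to get each line once.
via_pipe=$(LC_ALL=C sort "$tmp" | uniq | paste -sd' ' -)
via_flag=$(LC_ALL=C sort -u "$tmp" | paste -sd' ' -)

echo "lines after uniq alone: $unsorted_count"
echo "sort | uniq: $via_pipe"
echo "sort -u:     $via_flag"

rm -f "$tmp"
```

The pipe form is still worth knowing, because options such as uniq -c and uniq -d have no sort -u equivalent.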

Using uniq with the names.txt file (which has some duplicates):

# Remove duplicates (must sort first)
sort names.txt | uniq
# Output shows unique names:
# Jane
# Jill
# Jim
# John

# Count how many times each name appears
sort names.txt | uniq -c
# Output:
#       2 Jane
#       1 Jill
#       1 Jim
#       2 John

# Show only duplicate names
sort names.txt | uniq -d
# Output:
# Jane
# John

# Show only unique names (names that appear only once)
sort names.txt | uniq -u
# Output:
# Jill
# Jim
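
One common way to combine the two commands is to count lines with uniq -c and then sort the counts with sort -nr, which ranks items from most to least frequent. The sketch below is an illustration on made-up fruit names rather than names.txt, and the awk step is only there to strip the padding that uniq -c adds.

```shell
# Hypothetical sample data with uneven repeat counts.
tmp=$(mktemp)
printf 'apple\nbanana\napple\norange\nbanana\napple\ncherry\n' > "$tmp"

# 1. sort groups equal lines together.
# 2. uniq -c prefixes each distinct line with its count.
# 3. sort -nr orders those counts from highest to lowest.
# 4. head keeps the top two; awk normalizes the spacing.
top_two=$(LC_ALL=C sort "$tmp" | uniq -c | LC_ALL=C sort -nr | head -n 2 \
  | awk '{print $1, $2}' | paste -sd, -)

echo "most frequent: $top_two"

rm -f "$tmp"
```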

Key Differences:

Tips for Success

Common Mistakes to Avoid

Best Practices