
CIS120 Linux Fundamentals

The sort and uniq Commands

The sort and uniq commands in Linux are essential for organizing and processing text files. They are often used together to sort and remove duplicates from lists of data. Understanding these commands and their options can significantly improve your efficiency in managing text data.

The sort Command

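The sort command sorts the lines of text files and writes the result to standard output; the original file is not modified. By default it sorts alphabetically, and options select other orders such as numeric, month-name, or reverse order. The basic syntax for the sort command is: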

sort [OPTION]... [FILE]...

Common Options for sort

Option Description
-b Ignore leading blanks
-d Consider only blanks and alphanumeric characters
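-f Fold lowercase to uppercase characters (ignore case)
-g General numeric sort (understands floating-point values such as 1.5e3)
-i Consider only printable characters
-M Sort by month name (JAN < FEB < ... < DEC)
-n Numeric sort
-r Reverse the result of comparisons
-k Sort by a key (field); for example, -k 2,2 sorts on the second field only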
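A short sketch of -k and -M, assuming GNU coreutils sort. Instead of a file, it uses printf to feed made-up sample lines (the names, ages, and months are hypothetical). LC_ALL=C is set so the ordering does not depend on your locale settings.

```shell
# Sort "name age" records by the second field, compared numerically.
# -k 2,2n means: the key starts and ends at field 2 and is compared with -n.
printf 'bob 30\nalice 25\ncarol 35\n' | LC_ALL=C sort -k 2,2n
# Output:
# alice 25
# bob 30
# carol 35

# Sort abbreviated month names in calendar order instead of alphabetically.
printf 'Mar\nJan\nFeb\n' | LC_ALL=C sort -M
# Output:
# Jan
# Feb
# Mar
```

Without -M, the months would come out alphabetically (Feb, Jan, Mar).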

To sort the contents of a file alphabetically, use:

sort filename.txt

This command sorts the lines in filename.txt alphabetically. For a numeric sort, where lines are compared by their numerical value (so 9 comes before 10), use:

sort -n filename.txt
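The difference between an alphabetical and a numeric sort is easiest to see on numbers with different digit counts. This sketch pipes hypothetical values in with printf and sets LC_ALL=C for a locale-independent result.

```shell
# Alphabetical sort compares characters left to right,
# so "10" and "100" (starting with "1") come before "9".
printf '10\n9\n100\n' | LC_ALL=C sort
# Output:
# 10
# 100
# 9

# Numeric sort compares the values themselves.
printf '10\n9\n100\n' | LC_ALL=C sort -n
# Output:
# 9
# 10
# 100
```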

If you want to sort the lines in reverse order, the -r option can be used:

sort -r filename.txt
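As a small illustration with made-up input, -r reverses whichever ordering is in effect, so it can be combined with -n to list the largest numbers first (LC_ALL=C is set for a predictable result):

```shell
# Reverse alphabetical order.
printf 'banana\napple\ncherry\n' | LC_ALL=C sort -r
# Output:
# cherry
# banana
# apple

# Reverse numeric order: largest value first.
printf '10\n9\n100\n' | LC_ALL=C sort -rn
# Output:
# 100
# 10
# 9
```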

Combining options allows for more specific sorting, such as ignoring leading blanks and sorting numerically:

sort -bn filename.txt
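A sketch of what -b changes, assuming GNU sort and hypothetical indented input. In the C locale (LC_ALL=C) a space sorts before any letter, so without -b the indented lines come first. In GNU sort, -n on its own already skips leading blanks when reading a number, so -b mostly matters for text comparisons.

```shell
# Without -b, leading spaces take part in the comparison.
printf '  banana\napple\n cherry\n' | LC_ALL=C sort
# Output:
#   banana
#  cherry
# apple

# With -b, leading blanks are ignored when comparing (the output lines keep them).
printf '  banana\napple\n cherry\n' | LC_ALL=C sort -b
# Output:
# apple
#   banana
#  cherry

# Numeric sort skips leading blanks by itself; -n and -bn give the same order here.
printf ' 20\n3\n  100\n' | LC_ALL=C sort -n
# Output:
# 3
#  20
#   100
```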

The uniq Command

The uniq command filters out repeated lines, printing one copy of each run of identical lines. Because uniq compares each line only with the line directly before it, it is usually run after sort, which places identical lines next to each other. The basic syntax for the uniq command is:

uniq [OPTION]... [FILE]...
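To see why uniq needs sorted input, compare it on lines where the duplicates are not adjacent. The input here is hypothetical and is fed in with printf; LC_ALL=C keeps the sort order predictable.

```shell
# "banana" separates the two "apple" lines, so uniq keeps both.
printf 'apple\nbanana\napple\n' | uniq
# Output:
# apple
# banana
# apple

# Sorting first groups identical lines, so uniq can collapse them.
printf 'apple\nbanana\napple\n' | LC_ALL=C sort | uniq
# Output:
# apple
# banana
```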

Common Options for uniq

Option Description
-c Prefix lines by the number of occurrences
-d Only print duplicate lines, one copy of each
-u Only print lines that are not repeated
-i Ignore differences in case when comparing
-f Skip the first N fields before comparing (for example, -f 1)
-s Skip the first N characters before comparing (for example, -s 2)
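A sketch of -i, -f, and -s on small made-up inputs, assuming GNU coreutils uniq. In each case uniq still compares only adjacent lines and prints the first line of each group.

```shell
# -i: lines that differ only in letter case count as duplicates.
printf 'apple\nApple\nbanana\n' | uniq -i
# Output:
# apple
# banana

# -f 1: skip the first field (here a record number) before comparing.
printf '1 cat\n2 cat\n3 dog\n' | uniq -f 1
# Output:
# 1 cat
# 3 dog

# -s 2: skip the first two characters before comparing.
printf 'x-red\ny-red\nz-blue\n' | uniq -s 2
# Output:
# x-red
# z-blue
```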

To remove duplicate lines from a file, first sort the file and then use uniq:

sort filename.txt | uniq

This command sorts the lines in filename.txt and then filters out duplicate lines. To count the occurrences of each line, use the -c option:

sort filename.txt | uniq -c

This will prefix each line with the number of times it appears in the file. If you want to display only the duplicate lines, use the -d option:

sort filename.txt | uniq -d

To display only the lines that are not repeated (those that appear exactly once), use the -u option:

sort filename.txt | uniq -u
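The -c, -d, and -u options can be compared on the same data. This sketch writes a hypothetical file, pets.txt, in which "cat" appears three times, "dog" twice, and "bird" once. LC_ALL=C is set so the sort order is predictable.

```shell
# Create a small sample file (hypothetical contents).
printf 'cat\ndog\nbird\ncat\ndog\ncat\n' > pets.txt

# Count each distinct line. GNU uniq pads each count with leading spaces.
LC_ALL=C sort pets.txt | uniq -c
# Output (padding omitted): 1 bird, 3 cat, 2 dog

# Lines that occur more than once, one copy each.
LC_ALL=C sort pets.txt | uniq -d
# Output:
# cat
# dog

# Lines that occur exactly once.
LC_ALL=C sort pets.txt | uniq -u
# Output:
# bird

# A common pattern: count the lines, then list the most frequent first.
LC_ALL=C sort pets.txt | uniq -c | sort -rn
# Output (padding omitted): 3 cat, 2 dog, 1 bird
```

The final pipeline works because sort -n skips the leading spaces in front of each count.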

Examples

Sorting a file alphabetically:

sort fruits.txt

Sorting a file numerically:

sort -n numbers.txt

Sorting a file and ignoring leading blanks:

sort -b names.txt

Sorting a file in reverse order:

sort -r items.txt

Sorting a file and removing duplicate lines:

sort animals.txt | uniq

Counting the occurrences of each line:

sort animals.txt | uniq -c

Displaying only duplicate lines:

sort colors.txt | uniq -d

Displaying only lines that appear once:

sort colors.txt | uniq -u

Summary

The sort and uniq commands are powerful tools for organizing and processing text data in Linux. sort arranges lines in a specified order, while uniq filters out adjacent repeated lines, which is why the two are so often used together. Mastering these commands and their options will enable you to efficiently manage and manipulate text files in your Linux environment.