CIS120 Linux Fundamentals
The sort and uniq Commands
The sort and uniq commands in Linux are essential for organizing and processing text files. They are often used together to sort and remove duplicates from lists of data. Understanding these commands and their options can significantly improve your efficiency in managing text data.
The sort Command
The sort command sorts lines of text files. It arranges the lines in a specified order, either alphabetically or numerically. The basic syntax for the sort command is:
sort [OPTION]... [FILE]...
Common Options for sort
Option | Description |
---|---|
-b | Ignore leading blanks |
-d | Consider only blanks and alphanumeric characters |
-f | Fold lowercase to uppercase characters |
-g | General numeric sort |
-i | Consider only printable characters |
-M | Sort by month name |
-n | Numeric sort |
-r | Reverse the result of comparisons |
-k | Sort via a key |
To sort the contents of a file alphabetically, use:
sort filename.txt
This command sorts the lines in filename.txt alphabetically. For a numeric sort, where lines are sorted based on numerical values, use:
sort -n filename.txt
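The difference between the two modes is easiest to see on a file of numbers. The following is a minimal sketch; the file name numbers.txt and its contents are made up for illustration.

```shell
# Create a small sample file (numbers.txt is a hypothetical name).
printf '10\n9\n100\n2\n' > numbers.txt

# Default sort compares lines as text, character by character,
# so "100" comes before "2".
sort numbers.txt
# 10
# 100
# 2
# 9

# -n compares the lines by their numeric value instead.
sort -n numbers.txt
# 2
# 9
# 10
# 100
```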
If you want to sort the lines in reverse order, the -r option can be used:
sort -r filename.txt
Combining options allows for more specific sorting, such as ignoring leading blanks and sorting numerically:
sort -bn filename.txt
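The -k option listed in the table can also be combined with other options to sort on one column of a file. The sketch below assumes a hypothetical file scores.txt with a name and a score separated by a space on each line.

```shell
# scores.txt is a hypothetical file: one name and one score per line.
printf 'carol 7\nalice 42\nbob 100\n' > scores.txt

# -k2,2 limits the sort key to the second field; the trailing n
# makes the comparison for that key numeric.
sort -k2,2n scores.txt
# carol 7
# alice 42
# bob 100

# Adding r reverses the order for that key (highest score first).
sort -k2,2nr scores.txt
# bob 100
# alice 42
# carol 7
```

By default, fields are separated by blanks; the -t option can be used to choose a different separator, such as a comma.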
The uniq Command
The uniq command filters out repeated lines in a file, printing each run of identical lines only once. It is often used after the sort command because uniq compares only adjacent lines, so duplicates separated by other lines are not detected. The basic syntax for the uniq command is:
uniq [OPTION]... [INPUT [OUTPUT]]
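Because uniq only compares adjacent lines, running it on unsorted input can leave duplicates behind. A small sketch, using a hypothetical file pets.txt:

```shell
# pets.txt is a hypothetical, unsorted sample file.
printf 'cat\ndog\ncat\ncat\ndog\n' > pets.txt

# Without sorting, only the two adjacent "cat" lines are collapsed.
uniq pets.txt
# cat
# dog
# cat
# dog

# Sorting first places identical lines next to each other,
# so uniq can remove every duplicate.
sort pets.txt | uniq
# cat
# dog
```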
Common Options for uniq
Option | Description |
---|---|
-c | Prefix lines by the number of occurrences |
-d | Only print duplicate lines |
-u | Only print unique lines |
-i | Ignore differences in case when comparing |
-f | Skip fields before comparing |
-s | Skip characters before comparing |
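The -f option takes a number of fields to skip, which helps when each line starts with a value that always differs, such as a timestamp. The sketch below assumes a hypothetical file events.txt whose first field is a time.

```shell
# events.txt is a hypothetical log: a time followed by an event name.
printf '09:00 login\n09:05 login\n09:07 logout\n' > events.txt

# -f 1 skips the first field, so the two "login" lines compare as equal
# and only the first line of that run is printed.
uniq -f 1 events.txt
# 09:00 login
# 09:07 logout
```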
To remove duplicate lines from a file, first sort the file and then use uniq:
sort filename.txt | uniq
This command sorts the lines in filename.txt and then filters out duplicate lines. To count the occurrences of each line, use the -c option:
sort filename.txt | uniq -c
This will prefix each line with the number of times it appears in the file. If you want to display only the duplicate lines, use the -d option:
sort filename.txt | uniq -d
To display only the lines that are not repeated (those that occur exactly once in the file), use the -u option:
sort filename.txt | uniq -u
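The three forms are easy to confuse, so it can help to run them on the same input. A minimal sketch, with a hypothetical file palette.txt:

```shell
# palette.txt is a hypothetical sample: red x3, blue x2, green x1.
printf 'red\nblue\nred\ngreen\nblue\nred\n' > palette.txt

# Plain uniq: every distinct line, printed once.
sort palette.txt | uniq
# blue
# green
# red

# -d: only the lines that occur more than once.
sort palette.txt | uniq -d
# blue
# red

# -u: only the lines that occur exactly once.
sort palette.txt | uniq -u
# green
```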
Examples
Sorting a file alphabetically:
sort fruits.txt
Sorting a file numerically:
sort -n numbers.txt
Sorting a file and ignoring leading blanks:
sort -b names.txt
Sorting a file in reverse order:
sort -r items.txt
Removing duplicate lines from a file:
sort animals.txt | uniq
Counting the occurrences of each line:
sort animals.txt | uniq -c
Displaying only duplicate lines:
sort colors.txt | uniq -d
Displaying only unique lines:
sort colors.txt | uniq -u
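The commands above can also be chained into a longer pipeline. One common pattern, sketched here with a hypothetical file fruit_log.txt, counts each line and then ranks the results from most to least frequent:

```shell
# fruit_log.txt is a hypothetical sample file.
printf 'apple\nbanana\napple\ncherry\napple\nbanana\n' > fruit_log.txt

# 1. sort       groups identical lines together
# 2. uniq -c    prefixes each distinct line with its count
# 3. sort -rn   orders the counts numerically, largest first
sort fruit_log.txt | uniq -c | sort -rn
# Lists apple with a count of 3, then banana with 2, then cherry with 1.
# The amount of padding before each count varies between systems.
```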
Summary
The sort and uniq commands are powerful tools for organizing and processing text data in Linux. sort arranges lines in a specified order, while uniq filters out repeated lines. Mastering these commands and their options will enable you to efficiently manage and manipulate text files in your Linux environment.