
comm, diff and patch Commands
Imagine you're working on a group project and need to see what changes your teammate made to a document. Just like using "Track Changes" in a word processor, Linux provides powerful tools to compare files and apply changes selectively. In this chapter, we'll explore three commands that will help you spot differences, understand changes, and update files efficiently.
Quick Reference
Command | Description | Common Use |
---|---|---|
comm | Compare two sorted files line by line | Finding common lines or unique elements between files |
diff | Show line-by-line differences between files | Comparing code files or configuration files |
patch | Apply changes from a diff file to an original file | Updating software or applying fixes from others |
The comm Command
The comm command is a specialized comparison tool that works like a Venn diagram for text files. It takes two sorted files and produces a three-column output showing lines unique to the first file, lines unique to the second file, and lines common to both files. This makes it perfect for finding overlaps and differences in sorted data like word lists, configuration files, or any text where you need to identify shared or unique elements. Unlike diff, which focuses on showing changes, comm is designed to highlight relationships between files.
When to Use
- When you need to find common lines between two files
- When you want to identify lines unique to one file or the other
- When working with sorted data like word lists or configuration options
- When you need a simple, column-based comparison output
Common Options
Option | What It Does | When to Use It |
---|---|---|
-1 | Suppresses column 1 (lines unique to first file) | When you only care about common lines and lines in the second file |
-2 | Suppresses column 2 (lines unique to second file) | When you only care about common lines and lines in the first file |
-3 | Suppresses column 3 (lines common to both files) | When you only want to see differences, not similarities |
Practical Example
Imagine you have two todo lists and want to see what tasks are on both lists:
todo1.txt:
buy groceries
clean kitchen
finish homework
pay bills
todo2.txt:
buy groceries
call mom
finish homework
schedule dentist
Command:
# Compare both todo lists
comm todo1.txt todo2.txt
Output:
		buy groceries
	call mom
clean kitchen
		finish homework
pay bills
	schedule dentist
This shows:
- Column 1 (no indent): Lines only in todo1.txt ("clean kitchen", "pay bills")
- Column 2 (tabbed once): Lines only in todo2.txt ("call mom", "schedule dentist")
- Column 3 (tabbed twice): Lines in both files ("buy groceries", "finish homework")
If you only want to see common tasks:
# Show only tasks that appear on both lists
comm -1 -2 todo1.txt todo2.txt
Output:
buy groceries
finish homework
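The same pattern of combining suppression options can also isolate the tasks unique to each list. The sketch below recreates the two todo lists in a scratch directory (the directory and variable names are only for illustration) and uses -23 and -13, which are shorthand for -2 -3 and -1 -3:

```shell
# Work in a scratch directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the two (already sorted) todo lists from the example
printf '%s\n' 'buy groceries' 'clean kitchen' 'finish homework' 'pay bills' > todo1.txt
printf '%s\n' 'buy groceries' 'call mom' 'finish homework' 'schedule dentist' > todo2.txt

# Suppressing columns 2 and 3 leaves only tasks unique to todo1.txt
only_in_first=$(comm -23 todo1.txt todo2.txt)

# Suppressing columns 1 and 3 leaves only tasks unique to todo2.txt
only_in_second=$(comm -13 todo1.txt todo2.txt)

printf 'Only in todo1.txt:\n%s\n' "$only_in_first"
printf 'Only in todo2.txt:\n%s\n' "$only_in_second"
```

Because the suppressed columns disappear entirely, the remaining lines are printed without any leading tabs.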
The diff Command
The diff command is a file comparison tool that shows exactly what changed between two files. It compares the files line by line and reports additions, deletions, and modifications. Unlike comm, which requires sorted input and shows which lines the files share, diff works on files in any order and describes the edits needed to turn the first file into the second. It's particularly useful for tracking changes in code, configuration files, or any text where you need to understand exactly what was modified. The output can be formatted in different ways to suit your needs, from the terse default format to the context and unified formats that include surrounding lines.
When to Use
- When you need detailed information about exactly what changed between files
- When working with code or configuration files
- When you want to create a patch file that can be applied later
- When you need to generate a human-readable report of differences
Common Options
Option | What It Does | When to Use It |
---|---|---|
-u | Outputs in unified format (more readable) | For easier-to-read output when comparing code files |
-c | Outputs in context format (shows surrounding context) | When you need to see the changes with some surrounding lines for context |
-i | Ignores case differences | When comparing text where capitalization doesn't matter |
-w | Ignores all whitespace | When comparing code where indentation or spacing might differ |
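To see what -i and -w do in practice, here is a small sketch using two made-up configuration files (settings_a.conf and settings_b.conf are hypothetical names) that differ only in capitalization and spacing. It relies on diff's exit status: 0 when no differences are found, 1 when the files differ.

```shell
# Work in a scratch directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two hypothetical config files that differ only in letter case and spacing
printf 'Port = 8080\nHost = localhost\n' > settings_a.conf
printf 'port=8080\nHOST   =   localhost\n' > settings_b.conf

# A plain diff sees every line as changed and exits with status 1
plain_status=0
diff settings_a.conf settings_b.conf > /dev/null || plain_status=$?

# With -i (ignore case) and -w (ignore all whitespace) the files compare
# as equal, so diff prints nothing and exits with status 0
relaxed_status=0
diff -i -w settings_a.conf settings_b.conf > /dev/null || relaxed_status=$?

echo "plain diff exit status: $plain_status"
echo "diff -i -w exit status: $relaxed_status"
```

Checking the exit status like this is handy in scripts, where you often only need to know whether two files differ, not how.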
Practical Example
Let's say you have two versions of a small recipe:
recipe_v1.txt:
Pancake Recipe
--------------
2 cups flour
1 tablespoon sugar
1 teaspoon salt
2 eggs
1 cup milk
recipe_v2.txt:
Pancake Recipe
--------------
2 cups flour
2 tablespoons sugar
1 teaspoon baking powder
1 teaspoon salt
2 eggs
1 1/2 cups milk
Command (with context format):
# Compare recipes with context
diff -c recipe_v1.txt recipe_v2.txt
Output:
*** recipe_v1.txt 2024-07-11 10:00:00.000000000 +0000
--- recipe_v2.txt 2024-07-11 10:00:00.000000000 +0000
***************
*** 1,7 ****
  Pancake Recipe
  --------------
  2 cups flour
! 1 tablespoon sugar
  1 teaspoon salt
  2 eggs
! 1 cup milk
--- 1,8 ----
  Pancake Recipe
  --------------
  2 cups flour
! 2 tablespoons sugar
! 1 teaspoon baking powder
  1 teaspoon salt
  2 eggs
! 1 1/2 cups milk
How to Read diff Output:
- Lines marked with ! were changed between the two files
- Lines marked with + were added in the new file
- Lines marked with - were removed from the original file
- The header framed by asterisks (***) gives the line range shown from the original file
- The header framed by dashes (---) gives the line range shown from the new file
In this example, the recipe changed to use more sugar, add baking powder, and use more milk.
Creating a Patch File:
# Create a patch file to save these changes
diff -u recipe_v1.txt recipe_v2.txt > recipe_update.patch
This creates a file with all the changes that can be applied later with the patch command.
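For comparison with the context format shown earlier, the following sketch recreates both recipe files in a scratch directory, writes the unified diff to recipe_update.patch, and prints the hunk without the two timestamped header lines. The scratch directory and the hunk variable are illustrative additions, not part of the original example.

```shell
# Work in a scratch directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate both versions of the recipe
printf '%s\n' 'Pancake Recipe' '--------------' '2 cups flour' \
  '1 tablespoon sugar' '1 teaspoon salt' '2 eggs' '1 cup milk' > recipe_v1.txt
printf '%s\n' 'Pancake Recipe' '--------------' '2 cups flour' \
  '2 tablespoons sugar' '1 teaspoon baking powder' '1 teaspoon salt' \
  '2 eggs' '1 1/2 cups milk' > recipe_v2.txt

# diff exits with status 1 when the files differ, which is expected here
diff -u recipe_v1.txt recipe_v2.txt > recipe_update.patch || true

# Skip the two header lines (file names and timestamps) and show the hunk
hunk=$(tail -n +3 recipe_update.patch)
printf '%s\n' "$hunk"
```

The hunk starts with the header @@ -1,7 +1,8 @@, meaning it covers 7 lines starting at line 1 of the old file and 8 lines starting at line 1 of the new file. Below it, lines starting with - come from recipe_v1.txt, lines starting with + come from recipe_v2.txt, and lines starting with a space are unchanged context.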
The patch Command
The patch command is the final piece of the file comparison puzzle, taking the changes identified by diff and applying them to files. Think of it as a precise editor that can automatically implement changes without you having to manually edit files. It reads a patch file (created by diff) and applies the specified changes to the original file, effectively updating it to match the new version. This makes it invaluable for software updates, collaborative editing, and any situation where you need to apply a set of changes consistently across multiple files or systems.
When to Use
- When you need to apply changes from a diff file to update your files
- When working with software updates distributed as patch files
- When collaborating on code and sharing changes without sending entire files
- When you want to roll back changes by applying a patch in reverse
Common Options
Option | What It Does | When to Use It |
---|---|---|
-pNUM | Strips NUM leading components from file paths | When applying patches that have different directory structures |
-R | Reverses the patch (undoes changes) | When you need to undo a previously applied patch |
-i | Reads patch from a specified file | When your patch is in a file rather than from standard input |
-o | Writes output to a specified file instead of changing original | When you want to keep the original file unchanged |
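The -pNUM option is easiest to understand with a concrete layout. In this sketch, the directory names old/, new/, and mycopy/ and the patch file name are hypothetical: the patch author compared files under old/ and new/, while the person applying the patch only has a plain recipe.txt.

```shell
# Work in a scratch directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical layout: the author diffs old/ against new/,
# and mycopy/ stands in for the recipient's working directory
mkdir old new mycopy
printf 'flour\nsugar\n' > old/recipe.txt
printf 'flour\nsugar\nbaking powder\n' > new/recipe.txt
cp old/recipe.txt mycopy/recipe.txt

# The patch headers will name old/recipe.txt and new/recipe.txt
# (diff exits with status 1 when the files differ, which is expected)
diff -u old/recipe.txt new/recipe.txt > add_baking_powder.patch || true

# Inside mycopy/ there is no old/ or new/ directory, so -p1 strips the
# first path component and patch looks for recipe.txt instead
cd mycopy || exit 1
patch -p1 -i ../add_baking_powder.patch

result=$(cat recipe.txt)
printf '%s\n' "$result"
```

With -p0 the paths would be used exactly as written in the patch, so the choice of NUM depends on how the patch's paths relate to the directory you run patch from.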
Practical Example
Let's continue with our recipe example. Imagine you received the recipe_update.patch file and want to update your original recipe:
Command:
# Apply the recipe changes to your file
patch -i recipe_update.patch recipe_v1.txt
Output:
patching file recipe_v1.txt
After this, recipe_v1.txt has the same contents as recipe_v2.txt.
If you decide you liked the original recipe better, you can reverse the patch:
# Undo the changes by reversing the patch
patch -R -i recipe_update.patch recipe_v1.txt
Output:
patching file recipe_v1.txt
This will revert recipe_v1.txt back to its original state.
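If you would rather not modify recipe_v1.txt at all, the -o option from the table above writes the patched result to a separate file. The sketch below is one way to try it in a scratch directory; recipe_patched.txt and the two status variables are names chosen for illustration.

```shell
# Work in a scratch directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate both versions of the recipe and the patch between them
printf '%s\n' 'Pancake Recipe' '--------------' '2 cups flour' \
  '1 tablespoon sugar' '1 teaspoon salt' '2 eggs' '1 cup milk' > recipe_v1.txt
printf '%s\n' 'Pancake Recipe' '--------------' '2 cups flour' \
  '2 tablespoons sugar' '1 teaspoon baking powder' '1 teaspoon salt' \
  '2 eggs' '1 1/2 cups milk' > recipe_v2.txt
diff -u recipe_v1.txt recipe_v2.txt > recipe_update.patch || true

# -o sends the patched text to recipe_patched.txt; recipe_v1.txt is only read
patch -o recipe_patched.txt -i recipe_update.patch recipe_v1.txt

# The patched copy should now be identical to recipe_v2.txt
if diff recipe_patched.txt recipe_v2.txt > /dev/null; then
  patched_matches=yes
else
  patched_matches=no
fi

# The original should still differ from recipe_v2.txt
if diff recipe_v1.txt recipe_v2.txt > /dev/null; then
  original_untouched=no
else
  original_untouched=yes
fi

echo "patched copy matches recipe_v2.txt: $patched_matches"
echo "recipe_v1.txt left unchanged: $original_untouched"
```

Running diff on the result afterwards is a quick way to confirm that a patch did what you expected.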
Tips for Success
- Always make backups of important files before applying patches
- The comm command requires input files to be sorted first (use sort file > sorted_file)
- Use diff -u for the most readable output format
- When sharing patches with others, include clear descriptions of what the patch does
- Use diff -w when comparing code files to ignore whitespace differences
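The sorting tip above can be sketched as follows, using two made-up unsorted shopping lists (all file names here are hypothetical). Each list is sorted into a new file before comm compares them:

```shell
# Work in a scratch directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical unsorted shopping lists
printf '%s\n' 'milk' 'eggs' 'bread' > list_a.txt
printf '%s\n' 'eggs' 'apples' 'milk' > list_b.txt

# Sort each list into a new file; comm expects its inputs in sorted order
sort list_a.txt > list_a.sorted
sort list_b.txt > list_b.sorted

# Show only the items that appear on both lists
shared=$(comm -12 list_a.sorted list_b.sorted)
printf '%s\n' "$shared"
```

Writing the sorted output to new files keeps the original lists in their original order.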
Common Mistakes to Avoid
- Forgetting that comm requires sorted input files
- Applying patches to the wrong file or in the wrong directory
- Not checking patch output for errors or rejected hunks
- Creating patches with absolute file paths that won't work on other systems
- Forgetting to use -R when trying to reverse a patch
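One way to catch a patch aimed at the wrong file before it does any harm is to check patch's exit status. This sketch assumes GNU patch, whose --dry-run option reports what would happen without changing any files; shopping_list.txt is a hypothetical file that the patch does not match.

```shell
# Work in a scratch directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small patch from two versions of a recipe
printf '%s\n' '2 cups flour' '1 tablespoon sugar' '2 eggs' > recipe_v1.txt
printf '%s\n' '2 cups flour' '2 tablespoons sugar' '2 eggs' > recipe_v2.txt
diff -u recipe_v1.txt recipe_v2.txt > recipe_update.patch || true

# A hypothetical unrelated file that the patch cannot apply to
printf '%s\n' 'apples' 'bread' 'cheese' > shopping_list.txt
cp shopping_list.txt shopping_list.before

# --dry-run (GNU patch) tests the patch without modifying anything;
# a non-zero exit status means at least one hunk would be rejected
status=0
patch --dry-run -i recipe_update.patch shopping_list.txt > patch.log 2>&1 || status=$?

if [ "$status" -eq 0 ]; then
  echo "patch would apply cleanly"
else
  echo "patch would NOT apply cleanly (exit status $status), see patch.log"
fi
```

The same exit status check works for a real run without --dry-run, in which case patch also saves the rejected hunks for inspection.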
Best Practices
- Keep a changelog when creating patches for others to use
- Use meaningful filenames for patch files that describe what they change
- Test patches in a non-production environment before applying them to critical systems
- Use diff -u or diff -c when creating patches to include context
- When collaborating, use a version control system like Git instead of manually creating patches