CIS120 Linux Fundamentals by Scott Shaper

comm, diff and patch Commands

Imagine you're working on a group project and need to see what changes your teammate made to a document. Just like using "Track Changes" in a word processor, Linux provides powerful tools to compare files and apply changes selectively. In this chapter, we'll explore three commands that will help you spot differences, understand changes, and update files efficiently.

Quick Reference

Command | Description | Common Use
comm | Compare two sorted files line by line | Finding common lines or unique elements between files
diff | Show line-by-line differences between files | Comparing code files or configuration files
patch | Apply changes from a diff file to an original file | Updating software or applying fixes from others

The comm Command

The comm command is a specialized comparison tool that works like a Venn diagram for text files. It takes two sorted files and produces a three-column output showing lines unique to the first file, lines unique to the second file, and lines common to both files. This makes it perfect for finding overlaps and differences in sorted data like word lists, configuration files, or any text where you need to identify shared or unique elements. Unlike diff, which focuses on showing changes, comm is designed to highlight relationships between files.

When to Use

  • When you need to find common lines between two files
  • When you want to identify lines unique to one file or the other
  • When working with sorted data like word lists or configuration options
  • When you need a simple, column-based comparison output

Common Options

Option | What It Does | When to Use It
-1 | Suppresses column 1 (lines unique to first file) | When you only care about common lines and lines in the second file
-2 | Suppresses column 2 (lines unique to second file) | When you only care about common lines and lines in the first file
-3 | Suppresses column 3 (lines common to both files) | When you only want to see differences, not similarities

Practical Example

Imagine you have two todo lists and want to see what tasks are on both lists:

todo1.txt:

buy groceries
clean kitchen
finish homework
pay bills

todo2.txt:

buy groceries
call mom
finish homework
schedule dentist

Command:

# Compare both todo lists
comm todo1.txt todo2.txt

Output:

                buy groceries
        call mom
clean kitchen
                finish homework
pay bills
        schedule dentist

This shows:

  • Column 1 (no indent): Lines only in todo1.txt ("clean kitchen", "pay bills")
  • Column 2 (tabbed once): Lines only in todo2.txt ("call mom", "schedule dentist")
  • Column 3 (tabbed twice): Lines in both files ("buy groceries", "finish homework")

If you only want to see common tasks:

# Show only tasks that appear on both lists
comm -1 -2 todo1.txt todo2.txt

Output:

buy groceries
finish homework

Because columns 1 and 2 are suppressed, the common lines are printed without any leading tabs.
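The three flags can be combined in a single option such as -23. The sketch below assumes a POSIX shell with mktemp available, and the output file names (only_todo1.txt, only_todo2.txt, differences.txt) are made up for illustration. It recreates the two todo lists and asks comm for each kind of difference.

```shell
#!/bin/sh
# Work in a throwaway directory so the example files don't clutter your home folder
cd "$(mktemp -d)" || exit 1

# Recreate the two todo lists (both are already in sorted order)
printf '%s\n' 'buy groceries' 'clean kitchen' 'finish homework' 'pay bills' > todo1.txt
printf '%s\n' 'buy groceries' 'call mom' 'finish homework' 'schedule dentist' > todo2.txt

# -23 hides columns 2 and 3: tasks found only on todo1.txt
comm -23 todo1.txt todo2.txt > only_todo1.txt

# -13 hides columns 1 and 3: tasks found only on todo2.txt
comm -13 todo1.txt todo2.txt > only_todo2.txt

# -3 hides just the common column; lines unique to todo2.txt
# keep their single leading tab so you can tell the two files apart
comm -3 todo1.txt todo2.txt > differences.txt

cat only_todo1.txt    # prints: clean kitchen, pay bills (one per line)
cat only_todo2.txt    # prints: call mom, schedule dentist (one per line)
cat differences.txt
```

Reading each flag as "hide this column" makes them easier to remember: -23 hides columns 2 and 3, so only column 1 is left.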

The diff Command

The diff command is a powerful file comparison tool that shows exactly what changed between two files. It works like a forensic investigator, examining each line and reporting additions, deletions, and modifications. Unlike comm, which requires sorted files and shows relationships, diff focuses on showing the evolution of content over time. It's particularly useful for tracking changes in code, configuration files, or any text where you need to understand exactly what was modified. The output can be formatted in different ways to suit your needs, from simple line-by-line differences to more detailed context formats.

When to Use

  • When you need detailed information about exactly what changed between files
  • When working with code or configuration files
  • When you want to create a patch file that can be applied later
  • When you need to generate a human-readable report of differences

Common Options

Option | What It Does | When to Use It
-u | Outputs in unified format (more readable) | For easier-to-read output when comparing code files
-c | Outputs in context format (shows surrounding context) | When you need to see the changes with some surrounding lines for context
-i | Ignores case differences | When comparing text where capitalization doesn't matter
-w | Ignores all whitespace | When comparing code where indentation or spacing might differ
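A quick way to see -i and -w in action is to compare two lines that differ only in capitalization and spacing. This sketch uses placeholder file names (a.txt, b.txt) and also records diff's exit status: 0 means no differences, 1 means the files differ, and 2 signals a problem such as a missing file. It assumes GNU diff, the version found on most Linux systems.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Two one-line files that differ only in capitalization and spacing
printf 'Hello World\n' > a.txt
printf 'hello    world\n' > b.txt

# Plain diff reports the line as changed
diff a.txt b.txt
plain_status=$?

# -i ignores case and -w ignores white space, so nothing is left to report
diff -i -w a.txt b.txt
relaxed_status=$?

echo "diff exit status:       $plain_status"     # 1 (files differ)
echo "diff -i -w exit status: $relaxed_status"   # 0 (no differences)
```

The exit status is what scripts usually check, since it lets you test "are these files the same?" without reading diff's output.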

Practical Example

Let's say you have two versions of a small recipe:

recipe_v1.txt:

Pancake Recipe
--------------
2 cups flour
1 tablespoon sugar
1 teaspoon salt
2 eggs
1 cup milk

recipe_v2.txt:

Pancake Recipe
--------------
2 cups flour
2 tablespoons sugar
1 teaspoon baking powder
1 teaspoon salt
2 eggs
1 1/2 cups milk

Command (with context format):

# Compare recipes with context
diff -c recipe_v1.txt recipe_v2.txt

Output:

*** recipe_v1.txt 2024-07-11 10:00:00.000000000 +0000
--- recipe_v2.txt 2024-07-11 10:00:00.000000000 +0000
***************
*** 1,7 ****
  Pancake Recipe
  --------------
  2 cups flour
! 1 tablespoon sugar
  1 teaspoon salt
  2 eggs
! 1 cup milk
--- 1,8 ----
  Pancake Recipe
  --------------
  2 cups flour
! 2 tablespoons sugar
! 1 teaspoon baking powder
  1 teaspoon salt
  2 eggs
! 1 1/2 cups milk
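
How to Read diff Output:

  • The header lines starting with *** and --- name the original and new files
  • Lines indented with two spaces are unchanged context lines
  • Lines with ! show lines that were changed
  • Lines with + show lines that were added
  • Lines with - show lines that were removed
  • The line between asterisks (*** start,end ****) gives the range of lines from the original file
  • The line between dashes (--- start,end ----) gives the range of lines from the new file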

In this example, the recipe changed to use more sugar, add baking powder, and use more milk.

Creating a Patch File:
# Create a patch file to save these changes
diff -u recipe_v1.txt recipe_v2.txt > recipe_update.patch

This creates a file with all the changes that can be applied later with the patch command.
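To see what the patch file actually contains, the sketch below recreates both recipe versions with printf and prints the patch without its two header lines (those hold the file names and timestamps, which will differ on your system). Note that diff exits with status 1 whenever the files differ, which is normal here. It assumes GNU diff.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Recreate both versions of the recipe
printf '%s\n' 'Pancake Recipe' '--------------' '2 cups flour' \
  '1 tablespoon sugar' '1 teaspoon salt' '2 eggs' '1 cup milk' > recipe_v1.txt
printf '%s\n' 'Pancake Recipe' '--------------' '2 cups flour' \
  '2 tablespoons sugar' '1 teaspoon baking powder' '1 teaspoon salt' \
  '2 eggs' '1 1/2 cups milk' > recipe_v2.txt

# diff exits with status 1 here because the files differ; that is expected
diff -u recipe_v1.txt recipe_v2.txt > recipe_update.patch

# Skip the two header lines (file names and timestamps) and show the hunk
tail -n +3 recipe_update.patch

# For these two files the hunk should read:
# @@ -1,7 +1,8 @@
#  Pancake Recipe
#  --------------
#  2 cups flour
# -1 tablespoon sugar
# +2 tablespoons sugar
# +1 teaspoon baking powder
#  1 teaspoon salt
#  2 eggs
# -1 cup milk
# +1 1/2 cups milk
```

Unlike the context format, the unified format shows each change only once: lines starting with - come from recipe_v1.txt, lines starting with + come from recipe_v2.txt, and lines starting with a space are unchanged. The header @@ -1,7 +1,8 @@ means that lines 1 to 7 of the old file correspond to lines 1 to 8 of the new file.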

The patch Command

The patch command is the final piece of the file comparison puzzle, taking the changes identified by diff and applying them to files. Think of it as a precise editor that can automatically implement changes without you having to manually edit files. It reads a patch file (created by diff) and applies the specified changes to the original file, effectively updating it to match the new version. This makes it invaluable for software updates, collaborative editing, and any situation where you need to apply a set of changes consistently across multiple files or systems.

When to Use

  • When you need to apply changes from a diff file to update your files
  • When working with software updates distributed as patch files
  • When collaborating on code and sharing changes without sending entire files
  • When you want to roll back changes by applying a patch in reverse

Common Options

Option | What It Does | When to Use It
-pNUM | Strips NUM leading components from file paths | When applying patches that have different directory structures
-R | Reverses the patch (undoes changes) | When you need to undo a previously applied patch
-i | Reads patch from a specified file | When your patch is in a file rather than from standard input
-o | Writes output to a specified file instead of changing original | When you want to keep the original file unchanged
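The -o option is handy when you want to try a patch without touching your original. In this sketch the output name recipe_new.txt is just an example. The patched result goes to that new file, and cmp -s (which prints nothing and exits 0 only when two files are identical) confirms it matches recipe_v2.txt.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Recreate the recipe files and the patch from the diff section
printf '%s\n' 'Pancake Recipe' '--------------' '2 cups flour' \
  '1 tablespoon sugar' '1 teaspoon salt' '2 eggs' '1 cup milk' > recipe_v1.txt
printf '%s\n' 'Pancake Recipe' '--------------' '2 cups flour' \
  '2 tablespoons sugar' '1 teaspoon baking powder' '1 teaspoon salt' \
  '2 eggs' '1 1/2 cups milk' > recipe_v2.txt
diff -u recipe_v1.txt recipe_v2.txt > recipe_update.patch

# -i names the patch file; -o sends the patched text to a new file,
# leaving recipe_v1.txt exactly as it was
patch -i recipe_update.patch -o recipe_new.txt recipe_v1.txt

# cmp -s is silent and exits 0 only when the two files are identical
if cmp -s recipe_new.txt recipe_v2.txt; then
  echo "recipe_new.txt matches recipe_v2.txt"
fi
```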

Practical Example

Let's continue with our recipe example. Imagine you received the recipe_update.patch file and want to update your original recipe:

Command:

# Apply the recipe changes to your file
patch -i recipe_update.patch recipe_v1.txt

Output:

patching file recipe_v1.txt

Now recipe_v1.txt contains all the changes from the patch, so it matches recipe_v2.txt.

If you decide you liked the original recipe better, you can reverse the patch:

# Undo the changes by reversing the patch
patch -R -i recipe_update.patch recipe_v1.txt

Output:

patching file recipe_v1.txt

This reverts recipe_v1.txt to its original state.
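You can check the whole round trip yourself. This sketch keeps a copy of the original (saved_v1.txt is an illustrative name), applies the patch, then reverses it with -R, using cmp -s after each step to confirm which version recipe_v1.txt currently matches.

```shell
#!/bin/sh
# Scratch directory for the experiment
cd "$(mktemp -d)" || exit 1

# Recreate both recipe versions and the unified-format patch
printf '%s\n' 'Pancake Recipe' '--------------' '2 cups flour' \
  '1 tablespoon sugar' '1 teaspoon salt' '2 eggs' '1 cup milk' > recipe_v1.txt
printf '%s\n' 'Pancake Recipe' '--------------' '2 cups flour' \
  '2 tablespoons sugar' '1 teaspoon baking powder' '1 teaspoon salt' \
  '2 eggs' '1 1/2 cups milk' > recipe_v2.txt
diff -u recipe_v1.txt recipe_v2.txt > recipe_update.patch

# Keep a copy of the original so the round trip can be checked at the end
cp recipe_v1.txt saved_v1.txt

# Forward: apply the patch in place
patch -i recipe_update.patch recipe_v1.txt
cmp -s recipe_v1.txt recipe_v2.txt && echo "after patch:    matches recipe_v2.txt"

# Backward: -R undoes the same patch
patch -R -i recipe_update.patch recipe_v1.txt
cmp -s recipe_v1.txt saved_v1.txt && echo "after patch -R: matches the original"
```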

Tips for Success

  • Always make backups of important files before applying patches
  • The comm command requires input files to be sorted first (use sort file > sorted_file)
  • Use diff -u for the most readable output format for humans
  • When sharing patches with others, include clear descriptions of what the patch does
  • Use diff -w when comparing code files to ignore whitespace differences, except in languages such as Python where indentation changes meaning
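Here is one way to follow the sorting tip, assuming two unsorted lists (list1.txt and list2.txt are example data). Setting LC_ALL=C makes sort and comm use the same byte-by-byte ordering; if the two commands run under different locale settings, comm can complain that its input is not sorted even though you just sorted it.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Use one collation order for both sort and comm
LC_ALL=C
export LC_ALL

# Two unsorted lists (example data)
printf '%s\n' 'pay bills' 'buy groceries' 'finish homework' 'clean kitchen' > list1.txt
printf '%s\n' 'schedule dentist' 'finish homework' 'call mom' 'buy groceries' > list2.txt

# sort -o names the output file; unlike "> file", it may even be the input file itself
sort -o list1_sorted.txt list1.txt
sort -o list2_sorted.txt list2.txt

# Lines that appear on both lists
comm -12 list1_sorted.txt list2_sorted.txt
```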

Common Mistakes to Avoid

  • Forgetting that comm requires sorted input files
  • Applying patches to the wrong file or in the wrong directory
  • Not checking patch output for errors or rejected hunks
  • Creating patches with absolute file paths that won't work on other systems
  • Forgetting to use -R when trying to reverse a patch
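To avoid absolute paths, run diff from a parent directory using relative paths, then let the recipient strip the leading directory with -p1. In this sketch the directory names old, new and friend are hypothetical. Inside friend/ the file is simply recipe.txt, so patch needs the first path component (old/ or new/) removed before it can find it.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Two directory trees: old/ holds the original, new/ the updated copy
mkdir old new
printf '%s\n' 'Pancake Recipe' '2 cups flour' '1 cup milk' > old/recipe.txt
printf '%s\n' 'Pancake Recipe' '2 cups flour' '1 1/2 cups milk' > new/recipe.txt

# Relative paths in the command give relative paths in the patch headers
# (old/recipe.txt and new/recipe.txt), so the patch works on other machines
diff -u old/recipe.txt new/recipe.txt > recipe.patch

# Someone else's copy of the original file (hypothetical directory name)
mkdir friend
cp old/recipe.txt friend/

# -p1 strips "old/" or "new/" from the header paths, leaving recipe.txt;
# no file name is given, so patch takes it from the headers
cd friend || exit 1
patch -p1 -i ../recipe.patch
```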

Best Practices

  • Keep a changelog when creating patches for others to use
  • Use meaningful filenames for patch files that describe what they change
  • Test patches in a non-production environment before applying them to critical systems
  • Use diff -u or diff -c when creating patches to include context
  • When collaborating, use a version control system like Git instead of manually creating patches
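Two patch options support the backup tip and the advice to test before changing anything important. -b saves the original file with a .orig suffix before patching, and GNU patch's --dry-run reports whether a patch would apply without modifying any file. This sketch assumes GNU patch with its default backup settings and reuses the recipe example.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Recreate the recipe files and the patch
printf '%s\n' 'Pancake Recipe' '--------------' '2 cups flour' \
  '1 tablespoon sugar' '1 teaspoon salt' '2 eggs' '1 cup milk' > recipe_v1.txt
printf '%s\n' 'Pancake Recipe' '--------------' '2 cups flour' \
  '2 tablespoons sugar' '1 teaspoon baking powder' '1 teaspoon salt' \
  '2 eggs' '1 1/2 cups milk' > recipe_v2.txt
diff -u recipe_v1.txt recipe_v2.txt > recipe_update.patch

# --dry-run (GNU patch) checks whether the patch applies without changing any file
if patch --dry-run -i recipe_update.patch recipe_v1.txt; then
  # -b saves the untouched original as recipe_v1.txt.orig before patching
  patch -b -i recipe_update.patch recipe_v1.txt
else
  echo "The patch would not apply cleanly; nothing was changed." >&2
fi
```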