CIS120 Linux Fundamentals by Scott Shaper

cut, paste and join Commands

Imagine you're putting together a scrapbook. You need to cut out specific photos (cut), arrange them side by side on a page (paste), and then match up related pictures from different albums (join). Linux offers three powerful commands that work just like these scrapbooking tools, but for text files. Whether you're extracting columns from a CSV file, combining data side by side, or matching related information, these commands make text manipulation simple and efficient.

Quick Reference

Command   Description                                             Common Use
cut       Extract sections from each line of input                Getting specific columns from CSV files or formatted data
paste     Merge lines from files side by side                     Combining data horizontally (like adding columns)
join      Combine lines from two files based on a common field    Merging related data (similar to database joins)

The cut Command

The cut command acts like a data surgeon, precisely extracting specific portions from each line of text. It can select data by column (using delimiters like commas or tabs) or by character position. Unlike many text processing tools, cut doesn't alter the content it extracts - it simply takes a "slice" of each line according to your specifications. This makes it perfect for extracting specific fields from CSV files, isolating columns from tables, or grabbing specific character positions from formatted output.

When to Use

  • When you need to extract specific columns from CSV or tab-delimited files
  • When you want to pull out specific characters from each line of text
  • When processing log files to extract timestamps or specific data fields
  • When you need to remove sensitive information from files before sharing

Common Options

Option         What It Does                                                 When to Use It
-f LIST        Selects specific fields (columns) separated by a delimiter   When working with structured data like CSV files
-d DELIM       Specifies the delimiter between fields (default is tab)      When your data uses commas, colons, or other separators
-c LIST        Selects specific character positions                         When working with fixed-width files or when you need specific characters
-b LIST        Selects specific bytes                                       When working with binary files or multi-byte characters
--complement   Selects the inverse of the specified fields                  When you want to exclude certain columns instead of including them
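
The LIST argument accepted by -f, -c, and -b can be a single number, a range, an open-ended range, or a comma-separated mix of these. The following is a minimal sketch of those forms; the colon-delimited sample line is a made-up illustration, not data from this lesson.

```shell
# A hypothetical colon-delimited sample line with five fields
line='a:b:c:d:e'

single=$(printf '%s\n' "$line" | cut -d ':' -f 2)       # one field
range=$(printf '%s\n' "$line" | cut -d ':' -f 2-4)      # fields 2 through 4
open_end=$(printf '%s\n' "$line" | cut -d ':' -f 3-)    # field 3 to the end of the line
open_start=$(printf '%s\n' "$line" | cut -d ':' -f -2)  # start of the line through field 2
mixed=$(printf '%s\n' "$line" | cut -d ':' -f 1,4-5)    # field 1 plus fields 4 and 5

printf '%s\n' "$single" "$range" "$open_end" "$open_start" "$mixed"
```

The same list syntax applies to character positions, so cut -c 1-10 and cut -c 5- follow the same rules.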

Practical Examples

Imagine you have a CSV file with student information:

students.csv:

ID,First Name,Last Name,Major,GPA
101,John,Doe,Computer Science,3.8
102,Jane,Smith,Biology,3.9
103,Mike,Johnson,Engineering,3.5

To extract just the name and major (columns 2, 3, and 4):

# Get student names and majors
cut -d ',' -f 2,3,4 students.csv

Output:

First Name,Last Name,Major
John,Doe,Computer Science
Jane,Smith,Biology
Mike,Johnson,Engineering

Important Note: The -d option accepts exactly one character, so it works well with delimiters like commas or tabs, but it cannot treat a run of several spaces as a single separator. For example, with ls -l output, you should use -c instead:

Example with ls -l:

# This won't work well - ls -l pads columns with multiple spaces
ls -l | cut -d ' ' -f 1,9

# This works better - extract specific character positions
# (the column where the file name starts, 56 here, depends on your listing)
ls -l | cut -c 1-10,56-

The reason is that ls -l uses variable-width spacing (multiple spaces) to align columns, not a consistent single-character delimiter. When you use -d ' ', cut treats every space as a delimiter, so each run of spaces produces empty fields and the field numbers shift from line to line. Using tabs with -d $'\t' doesn't help either, because ls -l doesn't emit tab characters at all; it only uses spaces for alignment.
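
One common workaround, not covered in the text above, is to squeeze each run of spaces down to a single space with tr -s before handing the line to cut. The sketch below uses a hard-coded sample line that only imitates ls -l output (the owner, size, and file name are made up), so the result does not depend on your directory contents.

```shell
# A made-up line in the style of ls -l, padded with multiple spaces
line='-rw-r--r--  1 user  staff   1234 Jan  5 10:00 notes.txt'

# Naive attempt: the empty fields created by repeated spaces throw off the count
naive=$(printf '%s\n' "$line" | cut -d ' ' -f 1,9)

# Squeeze repeated spaces first, then field 9 really is the file name
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 1,9)

printf 'naive:    %s\n' "$naive"
printf 'squeezed: %s\n' "$squeezed"
```

This approach still breaks on file names that contain spaces, so treat it as a quick inspection aid rather than a robust parser.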

To extract just the first 10 characters of each line:

# Get the first 10 characters of each line
cut -c 1-10 students.csv

Output:

ID,First N
101,John,D
102,Jane,S
103,Mike,J

To exclude the GPA column using --complement (a GNU cut extension):

# Get all information except GPA
cut -d ',' --complement -f 5 students.csv

Output:

ID,First Name,Last Name,Major
101,John,Doe,Computer Science
102,Jane,Smith,Biology
103,Mike,Johnson,Engineering
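
Excluding a column is most convenient when the column sits in the middle of the line. As a rough sketch, assuming GNU cut (other implementations may not support --complement), the example below drops the Major column and compares the result with the portable form that lists the columns to keep. The temporary directory is only there to keep the example self-contained.

```shell
# Recreate students.csv in a scratch directory
workdir=$(mktemp -d)
cat > "$workdir/students.csv" <<'EOF'
ID,First Name,Last Name,Major,GPA
101,John,Doe,Computer Science,3.8
102,Jane,Smith,Biology,3.9
103,Mike,Johnson,Engineering,3.5
EOF

# Drop column 4 (Major) by naming only the column to exclude (GNU cut)
no_major=$(cut -d ',' --complement -f 4 "$workdir/students.csv")

# Portable equivalent: list the columns to keep instead
kept=$(cut -d ',' -f 1-3,5 "$workdir/students.csv")

printf '%s\n' "$no_major"
rm -r "$workdir"
```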

The paste Command

The paste command functions like a digital assembler, bringing together lines from multiple files side by side. While cut slices data vertically (by column), paste combines data horizontally, merging corresponding lines from different files with a delimiter between them. This command excels at creating tabular data from separate sources and converting data between row and column formats. It's particularly useful when you need to combine related information stored in separate files without complex processing.

When to Use

  • When you need to combine multiple files side by side
  • When building CSV files by combining columns from different sources
  • When converting data from rows to columns or vice versa
  • When you want to create a simple table from separate lists

Common Options

Option    What It Does                                       When to Use It
-d LIST   Specifies delimiters to use between merged lines   When you need a specific separator between columns
-s        Merges lines from one file at a time (serial)      When converting rows to columns within a single file
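
The -d option takes a LIST because paste cycles through the characters in order, using the first between columns 1 and 2, the second between columns 2 and 3, and so on. Here is a small sketch with three hypothetical files (first.txt, last.txt, score.txt) that are not part of this lesson's data.

```shell
# Three hypothetical single-column files in a scratch directory
workdir=$(mktemp -d)
printf '%s\n' John Jane > "$workdir/first.txt"
printf '%s\n' Doe Smith > "$workdir/last.txt"
printf '%s\n' 85 92     > "$workdir/score.txt"

# Delimiter list ' ,': a space joins columns 1 and 2, a comma joins columns 2 and 3
combined=$(paste -d ' ,' "$workdir/first.txt" "$workdir/last.txt" "$workdir/score.txt")

printf '%s\n' "$combined"
rm -r "$workdir"
```

If there are more columns than delimiters, paste starts over at the beginning of the list.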

Practical Examples

Imagine you have two separate files with related information:

names.txt:

John
Jane
Mike

scores.txt:

85
92
78

To combine names and scores side by side with a tab delimiter (default):

# Combine names and scores with tab separator
paste names.txt scores.txt

Output:

John    85
Jane    92
Mike    78

To create a CSV file by combining the files with a comma:

# Create a CSV from the two files
paste -d ',' names.txt scores.txt

Output:

John,85
Jane,92
Mike,78

Working with a single file to convert rows to columns:

shopping.txt:

Apples
Bananas
Milk
Bread
Eggs
Cheese

To display items in two columns:

# Convert shopping list to two columns
paste - - < shopping.txt

Output:

Apples    Bananas
Milk      Bread
Eggs      Cheese

Or using the serial option:

# Using serial paste for the same effect
paste -s -d '\t\n' shopping.txt
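
The serial form produces two columns because -s joins all lines of the file and cycles through the delimiter list: a tab after the first item, a newline after the second, and so on. The sketch below checks that both forms give the same result and also shows a single-delimiter variant that collapses the whole list onto one comma-separated line.

```shell
# Recreate shopping.txt in a scratch directory
workdir=$(mktemp -d)
printf '%s\n' Apples Bananas Milk Bread Eggs Cheese > "$workdir/shopping.txt"

# Two ways to get two columns
stdin_cols=$(paste - - < "$workdir/shopping.txt")
serial_cols=$(paste -s -d '\t\n' "$workdir/shopping.txt")

# With a single delimiter, -s joins every line into one row
one_line=$(paste -s -d ',' "$workdir/shopping.txt")

printf '%s\n' "$serial_cols"
printf '%s\n' "$one_line"
rm -r "$workdir"
```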

The join Command

The join command works much like a database JOIN operation, combining lines from two files based on a common field or "key". Unlike paste, which blindly combines lines by position, join matches lines that share the same value in a specified field. This makes it ideal for relating information across multiple data sources, similar to how you might combine tables in a database. join requires both input files to be sorted on the join field, and it provides options to control the join type (inner, outer) and which fields are compared and printed.

When to Use

  • When combining data from two files based on a common identifier (like a database join)
  • When merging configuration files based on matching keys
  • When you need to relate information from separate sources
  • When performing data analysis across multiple data files

Common Options

Option       What It Does                                           When to Use It
-1 FIELD     Specifies the join field from the first file           When the common key is not in the first column of file 1
-2 FIELD     Specifies the join field from the second file          When the common key is not in the first column of file 2
-t CHAR      Sets the field separator character                     When fields are separated by something other than whitespace
-a FILENUM   Prints unpairable lines from specified file (1 or 2)   When you need an outer join (keep unmatched records)
-o FORMAT    Specifies the output format                            When you need to control exactly which fields appear in output
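
To see -1, -2, and -o working together, consider a case where the ID is the second field in one file and the first field in the other. This is a sketch with two hypothetical files (roster.txt and letter_grades.txt) invented for illustration; both are already sorted on their ID field.

```shell
# Hypothetical files: the ID is field 2 in roster.txt and field 1 in letter_grades.txt
workdir=$(mktemp -d)
printf '%s\n' 'John 101' 'Jane 102' 'Mike 103' > "$workdir/roster.txt"
printf '%s\n' '101 A' '102 B' '103 A'          > "$workdir/letter_grades.txt"

# Tell join which field holds the key in each file
default_out=$(join -1 2 -2 1 "$workdir/roster.txt" "$workdir/letter_grades.txt")

# Use -o to print only file 1's first field and file 2's second field
custom_out=$(join -1 2 -2 1 -o 1.1,2.2 "$workdir/roster.txt" "$workdir/letter_grades.txt")

printf '%s\n' "$default_out"
printf '%s\n' "$custom_out"
rm -r "$workdir"
```

By default join prints the key first, followed by the remaining fields of each file; -o replaces that layout with the FILE.FIELD pairs you list.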

Practical Examples

Imagine you have two files with related student information:

students.txt (sorted by ID):

101 John Doe
102 Jane Smith
103 Mike Johnson

grades.txt (sorted by ID):

101 A Biology
102 B Chemistry
103 A Computer_Science

To join the files based on the student ID (first field):

# Join student info with their grades
join students.txt grades.txt

Output:

101 John Doe A Biology
102 Jane Smith B Chemistry
103 Mike Johnson A Computer_Science
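
The files above are already in ID order. When they are not, a typical approach is to sort both on the join field first. The sketch below shuffles the same records into hypothetical unsorted files and sets LC_ALL=C for both sort and join so the two commands agree on the ordering; that locale choice is an assumption made for predictability, not a requirement stated in this lesson.

```shell
# Same records as above, deliberately out of order (file names are hypothetical)
workdir=$(mktemp -d)
printf '%s\n' '103 Mike Johnson' '101 John Doe' '102 Jane Smith' \
    > "$workdir/students_unsorted.txt"
printf '%s\n' '102 B Chemistry' '103 A Computer_Science' '101 A Biology' \
    > "$workdir/grades_unsorted.txt"

# Sort each file on field 1, the join field
LC_ALL=C sort -k 1,1 "$workdir/students_unsorted.txt" > "$workdir/students_sorted.txt"
LC_ALL=C sort -k 1,1 "$workdir/grades_unsorted.txt"   > "$workdir/grades_sorted.txt"

# Join the sorted copies
joined=$(LC_ALL=C join "$workdir/students_sorted.txt" "$workdir/grades_sorted.txt")

printf '%s\n' "$joined"
rm -r "$workdir"
```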

Now imagine the same data is stored in comma-separated files:

students_csv.txt:

101,John,Doe
102,Jane,Smith
103,Mike,Johnson

grades_csv.txt:

101,A,Biology
102,B,Chemistry
103,A,Computer_Science

To join CSV files (using comma as separator):

# Join CSV files
join -t ',' students_csv.txt grades_csv.txt

Output:

101,John,Doe,A,Biology
102,Jane,Smith,B,Chemistry
103,Mike,Johnson,A,Computer_Science

To perform a left outer join (keep every record from file 1, even when file 2 has no matching line):

# Outer join to keep all student records
join -a 1 students.txt grades.txt
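
With the sample files above, every student has a grade, so this command prints the same lines as the plain join. The difference only shows up when some IDs lack a partner. The sketch below uses hypothetical variants of the two files in which student 104 has no grade and grade record 103 has no student.

```shell
# Hypothetical variants: 104 appears only in students, 103 only in grades
workdir=$(mktemp -d)
printf '%s\n' '101 John Doe' '102 Jane Smith' '104 Amy Lee'      > "$workdir/students.txt"
printf '%s\n' '101 A Biology' '102 B Chemistry' '103 A Physics' > "$workdir/grades.txt"

# Inner join: only IDs present in both files
inner=$(join "$workdir/students.txt" "$workdir/grades.txt")

# Left outer join: also keep unmatched lines from file 1
left=$(join -a 1 "$workdir/students.txt" "$workdir/grades.txt")

# Full outer join: keep unmatched lines from both files
full=$(join -a 1 -a 2 "$workdir/students.txt" "$workdir/grades.txt")

printf 'inner:\n%s\n\nleft:\n%s\n\nfull:\n%s\n' "$inner" "$left" "$full"
rm -r "$workdir"
```

Unmatched lines are printed as they are, without placeholders for the missing fields, so they end up shorter than the matched lines.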

Tips for Success

  • When using join, both files must be sorted on the join field
  • Use cut with pipes to extract specific fields from command output
  • For paste, check that the files have the same number of lines; paste does not warn about a mismatch and simply leaves the missing fields empty
  • Keep column names in a small header file and add it to the top of your output so the columns stay labeled
  • Test complex commands with smaller data samples before running on large files
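
The tip about paste and line counts is easy to check on a small sample. In this sketch, a hypothetical scores file is one line shorter than the names file, so the last row ends with an empty field.

```shell
# names.txt has three lines, the hypothetical short scores file has only two
workdir=$(mktemp -d)
printf '%s\n' John Jane Mike > "$workdir/names.txt"
printf '%s\n' 85 92          > "$workdir/scores_short.txt"

# paste does not complain; the last row simply has nothing after the comma
mismatched=$(paste -d ',' "$workdir/names.txt" "$workdir/scores_short.txt")

printf '%s\n' "$mismatched"
rm -r "$workdir"
```

Comparing wc -l on each file before pasting is a quick way to catch this kind of mismatch.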

Common Mistakes to Avoid

  • Using join on unsorted files (join may skip matching lines or complain that the input is not in sorted order)
  • Forgetting to specify the delimiter with -d for cut and paste (both default to a tab)
  • Using character positions (-c) when fields would be more appropriate
  • Overwriting original files without making backups
  • Confusing paste and join (paste is side-by-side, join is by matching fields)

Best Practices

  • Use head to preview the structure of your data files before processing
  • Pipe output to less or redirect to a new file rather than overwriting originals
  • Add comments to your scripts explaining complex text processing operations
  • Combine these commands with sort, uniq, and grep for powerful data processing
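
As a small illustration of combining these tools, the pipeline below lists each distinct major in a student file. It is only a sketch: the fourth student row is a made-up addition so that one major appears twice, and LC_ALL=C is set on sort just to make the ordering predictable.

```shell
# students.csv from earlier, plus one hypothetical extra row (ID 104)
workdir=$(mktemp -d)
cat > "$workdir/students.csv" <<'EOF'
ID,First Name,Last Name,Major,GPA
101,John,Doe,Computer Science,3.8
102,Jane,Smith,Biology,3.9
103,Mike,Johnson,Engineering,3.5
104,Amy,Lee,Biology,3.7
EOF

# Drop the header with grep, keep the Major column with cut,
# then sort so uniq can remove the adjacent duplicates
majors=$(grep -v '^ID,' "$workdir/students.csv" | cut -d ',' -f 4 | LC_ALL=C sort | uniq)

printf '%s\n' "$majors"
rm -r "$workdir"
```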