
cut, paste and join Commands
Imagine you're putting together a scrapbook. You need to cut out specific photos (cut), arrange them side by side on a page (paste), and then match up related pictures from different albums (join). Linux offers three powerful commands that work just like these scrapbooking tools, but for text files. Whether you're extracting columns from a CSV file, combining data side by side, or matching related information, these commands make text manipulation simple and efficient.
Quick Reference
Command | Description | Common Use |
---|---|---|
`cut` | Extract sections from each line of input | Getting specific columns from CSV files or formatted data |
`paste` | Merge lines from files side by side | Combining data horizontally (like adding columns) |
`join` | Combine lines from two files based on a common field | Merging related data (similar to database joins) |
The cut Command
The cut command acts like a data surgeon, precisely extracting specific portions from each line of text. It can select data by column (using delimiters like commas or tabs) or by character position. Unlike many text processing tools, cut doesn't alter the content it extracts - it simply takes a "slice" of each line according to your specifications. This makes it perfect for extracting specific fields from CSV files, isolating columns from tables, or grabbing specific character positions from formatted output.
When to Use
- When you need to extract specific columns from CSV or tab-delimited files
- When you want to pull out specific characters from each line of text
- When processing log files to extract timestamps or specific data fields
- When you need to remove sensitive information from files before sharing
Common Options
Option | What It Does | When to Use It |
---|---|---|
`-f LIST` | Selects specific fields (columns) separated by a delimiter | When working with structured data like CSV files |
`-d DELIM` | Specifies the delimiter between fields (default is tab) | When your data uses commas, colons, or other separators |
`-c LIST` | Selects specific character positions | When working with fixed-width files or need specific characters |
`-b LIST` | Selects specific bytes | When working with binary files or multi-byte characters |
`--complement` | Selects the inverse of the specified fields | When you want to exclude certain columns instead of including them |
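The LIST argument accepts single numbers, ranges, and open-ended ranges. The following is a minimal sketch of that syntax; the sample line `a,b,c,d,e` and the variable names are illustrative assumptions, not part of the tutorial's data.

```shell
# Hypothetical sample line; the field values are illustrative only.
line='a,b,c,d,e'

first_three=$(printf '%s\n' "$line" | cut -d ',' -f 1-3)   # fields 1 through 3
from_fourth=$(printf '%s\n' "$line" | cut -d ',' -f 4-)    # field 4 to the end of the line
up_to_second=$(printf '%s\n' "$line" | cut -d ',' -f -2)   # start of the line through field 2
mixed=$(printf '%s\n' "$line" | cut -d ',' -f 1,3-4)       # field 1 plus fields 3 to 4

printf '%s\n' "$first_three" "$from_fourth" "$up_to_second" "$mixed"
```

The same list syntax applies to `-c` and `-b`, where the numbers refer to character or byte positions instead of fields.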
Practical Examples
Imagine you have a CSV file with student information:
`students.csv`:
ID,First Name,Last Name,Major,GPA
101,John,Doe,Computer Science,3.8
102,Jane,Smith,Biology,3.9
103,Mike,Johnson,Engineering,3.5
To extract just the name and major (columns 2, 3, and 4):
# Get student names and majors
cut -d ',' -f 2,3,4 students.csv
Output:
First Name,Last Name,Major
John,Doe,Computer Science
Jane,Smith,Biology
Mike,Johnson,Engineering
Important Note: The `-d` option works well with single-character delimiters like commas or tabs, but it doesn't handle runs of multiple spaces well. For example, with `ls -l` output, selecting character positions with `-c` is more predictable:
Example with `ls -l`:
# This won't work well - ls -l uses multiple spaces
ls -l | cut -d ' ' -f 1,9
# This works better - extract specific character positions
# (the exact positions, here 1-10 and 56 onward, depend on the column widths in your listing)
ls -l | cut -c 1-10,56-
The reason is that `ls -l` uses variable-width spacing (multiple spaces) to align columns, not a consistent single-character delimiter. With `-d ' '`, `cut` treats every space as a delimiter, so each run of spaces produces empty fields and breaks the field counting. Using tabs with `-d $'\t'` won't work either, because `ls -l` doesn't use tab characters at all; it only uses spaces for alignment.
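Another common workaround, not covered in the text above, is to squeeze each run of spaces into a single space with `tr -s` before calling `cut`. The sketch below uses a hypothetical `ls -l` style line instead of real `ls` output so the result is predictable; the file name, owner, and sizes are made up.

```shell
# Hypothetical ls -l style line with runs of spaces between columns.
sample='-rw-r--r--  1 alice  staff   1024 Jan  5 10:30 notes.txt'

# tr -s ' ' collapses each run of spaces into one space,
# so cut can count space-delimited fields reliably.
result=$(printf '%s\n' "$sample" | tr -s ' ' | cut -d ' ' -f 1,9)
printf '%s\n' "$result"
```

This prints the permissions and the file name. File names that contain spaces would still be split across fields, so `-f 1,9-` is the safer selection in that case.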
To extract just the first 10 characters of each line:
# Get the first 10 characters of each line
cut -c 1-10 students.csv
Output:
ID,First N
101,John,D
102,Jane,S
103,Mike,J
To exclude the GPA column using `--complement` (a GNU `cut` option):
# Get all information except GPA (field 5)
cut -d ',' -f 5 --complement students.csv
Output:
ID,First Name,Last Name,Major
101,John,Doe,Computer Science
102,Jane,Smith,Biology
103,Mike,Johnson,Engineering
The paste Command
The paste command functions like a digital assembler, bringing together lines from multiple files side by side. While cut slices data vertically (by column), paste combines data horizontally, merging corresponding lines from different files with a delimiter between them. This command excels at creating tabular data from separate sources and converting data between row and column formats. It's particularly useful when you need to combine related information stored in separate files without complex processing.
When to Use
- When you need to combine multiple files side by side
- When building CSV files by combining columns from different sources
- When converting data from rows to columns or vice versa
- When you want to create a simple table from separate lists
Common Options
Option | What It Does | When to Use It |
---|---|---|
`-d LIST` | Specifies delimiters to use between merged lines | When you need a specific separator between columns |
`-s` | Merges lines from one file at a time (serial) | When converting rows to columns within a single file |
Practical Examples
Imagine you have two separate files with related information:
`names.txt`:
John
Jane
Mike
`scores.txt`:
85
92
78
To combine names and scores side by side with a tab delimiter (default):
# Combine names and scores with tab separator
paste names.txt scores.txt
Output:
John 85
Jane 92
Mike 78
To create a CSV file by combining the files with a comma:
# Create a CSV from the two files
paste -d ',' names.txt scores.txt
Output:
John,85
Jane,92
Mike,78
Working with a single file to convert rows to columns:
`shopping.txt`:
Apples
Bananas
Milk
Bread
Eggs
Cheese
To display items in two columns:
# Convert shopping list to two columns
paste - - < shopping.txt
Output:
Apples Bananas
Milk Bread
Eggs Cheese
Or using the serial option:
# Using serial paste for the same effect
paste -s -d '\t\n' shopping.txt
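To make the serial option easier to follow, here is a small sketch that first joins the whole list into one comma-separated line, then repeats the two-delimiter form. It recreates `shopping.txt` in a temporary directory; the `workdir` variable and the result variable names are assumptions for illustration.

```shell
workdir=$(mktemp -d)
printf '%s\n' Apples Bananas Milk Bread Eggs Cheese > "$workdir/shopping.txt"

# -s joins all lines of one file into a single line;
# -d ',' replaces the default tab between items with a comma.
one_line=$(paste -s -d ',' "$workdir/shopping.txt")
printf '%s\n' "$one_line"

# With two delimiters, paste cycles through them: tab, newline, tab, newline...
# That alternation is what produces the two-column layout.
two_cols=$(paste -s -d '\t\n' "$workdir/shopping.txt")
printf '%s\n' "$two_cols"
```

The first command prints all six items on one line; the second prints three lines of two tab-separated items each.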
The join Command
The join command works much like a database JOIN operation, combining lines from two files based on a common field or "key". Unlike paste, which combines lines purely by position, join matches lines that share the same value in a specified field. This makes it ideal for relating information across multiple data sources, similar to how you might combine tables in a database. join requires both input files to be sorted on the join field, and it provides options for different join types (inner, outer) and field specifications.
When to Use
- When combining data from two files based on a common identifier (like a database join)
- When merging configuration files based on matching keys
- When you need to relate information from separate sources
- When performing data analysis across multiple data files
Common Options
Option | What It Does | When to Use It |
---|---|---|
`-1 FIELD` | Specifies the join field from the first file | When the common key is not in the first column of the first file |
`-2 FIELD` | Specifies the join field from the second file | When the common key is not in the first column of the second file |
`-t CHAR` | Sets the field separator character | When fields are separated by something other than whitespace |
`-a FILENUM` | Prints unpairable lines from specified file (1 or 2) | When you need an outer join (keep unmatched records) |
`-o FORMAT` | Specifies the output format | When you need to control exactly which fields appear in output |
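The `-1`, `-2`, and `-o` options are easier to see in a small case where the key is not in the first column of the first file. The sketch below is illustrative only: the file names `names_by_id.txt` and `letter_grades.txt` and their contents are hypothetical.

```shell
workdir=$(mktemp -d)
# Hypothetical file 1: the ID is the SECOND field (already sorted by ID).
printf '%s\n' 'John 101' 'Jane 102' 'Mike 103' > "$workdir/names_by_id.txt"
# Hypothetical file 2: the ID is the first field.
printf '%s\n' '101 A' '102 B' '103 A' > "$workdir/letter_grades.txt"

# -1 2 : join on field 2 of the first file
# -2 1 : join on field 1 of the second file
# -o 1.1,2.2 : print only field 1 of file 1 and field 2 of file 2
joined=$(join -1 2 -2 1 -o 1.1,2.2 "$workdir/names_by_id.txt" "$workdir/letter_grades.txt")
printf '%s\n' "$joined"
```

Each entry in the `-o` list has the form FILENUM.FIELD, so `1.1` is the name and `2.2` is the grade; the ID itself is left out of the output.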
Practical Examples
Imagine you have two files with related student information:
`students.txt` (sorted by ID):
101 John Doe
102 Jane Smith
103 Mike Johnson
`grades.txt` (sorted by ID):
101 A Biology
102 B Chemistry
103 A Computer_Science
To join the files based on the student ID (first field):
# Join student info with their grades
join students.txt grades.txt
Output:
101 John Doe A Biology
102 Jane Smith B Chemistry
103 Mike Johnson A Computer_Science
Now imagine you have files with different separators:
`students_csv.txt`:
101,John,Doe
102,Jane,Smith
103,Mike,Johnson
`grades_csv.txt`:
101,A,Biology
102,B,Chemistry
103,A,Computer_Science
To join CSV files (using comma as separator):
# Join CSV files
join -t ',' students_csv.txt grades_csv.txt
Output:
101,John,Doe,A,Biology
102,Jane,Smith,B,Chemistry
103,Mike,Johnson,A,Computer_Science
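To perform a left outer join (keep every record from file 1, even those with no match in file 2):
# Outer join to keep all student records
join -a 1 students.txt grades.txt
Because every ID in students.txt also appears in grades.txt, the output here is identical to the plain join shown earlier.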
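The effect of `-a 1` only becomes visible when a record has no partner. The sketch below assumes a hypothetical fourth student (104 Sara Lee) who has no entry in the grades file; the files are recreated in a temporary directory so the example is self-contained.

```shell
workdir=$(mktemp -d)
# Student 104 is a hypothetical addition with no matching grade.
printf '%s\n' '101 John Doe' '102 Jane Smith' '103 Mike Johnson' '104 Sara Lee' \
    > "$workdir/students.txt"
printf '%s\n' '101 A Biology' '102 B Chemistry' '103 A Computer_Science' \
    > "$workdir/grades.txt"

# -a 1 keeps unmatched lines from file 1 in addition to the joined lines.
outer=$(join -a 1 "$workdir/students.txt" "$workdir/grades.txt")
printf '%s\n' "$outer"

# -v 1 prints ONLY the unmatched lines from file 1.
unmatched=$(join -v 1 "$workdir/students.txt" "$workdir/grades.txt")
printf '%s\n' "$unmatched"
```

With `-a 1`, the line for student 104 appears without grade fields; with `-v 1`, it is the only line printed, which is a quick way to find records missing from the second file.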
Tips for Success
- When using `join`, both files must be sorted on the join field
- Use `cut` with pipes to extract specific fields from command output
- For `paste`, ensure files have matching line counts or use it with caution
- Create temporary header files to make your data clearer when working with columns
- Test complex commands with smaller data samples before running on large files
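The tip about matching line counts for `paste` can be sketched as follows. The scores file is deliberately one line short to show what happens; the `workdir` variable and the warning message are illustrative assumptions.

```shell
workdir=$(mktemp -d)
printf '%s\n' John Jane Mike > "$workdir/names.txt"
printf '%s\n' 85 92 > "$workdir/scores.txt"   # deliberately one line short

# Compare line counts before pasting (tr strips padding some wc versions add).
names_count=$(wc -l < "$workdir/names.txt" | tr -d ' ')
scores_count=$(wc -l < "$workdir/scores.txt" | tr -d ' ')
if [ "$names_count" -ne "$scores_count" ]; then
    echo "warning: line counts differ ($names_count vs $scores_count)" >&2
fi

# paste still runs; the missing value simply becomes an empty field.
merged=$(paste -d ',' "$workdir/names.txt" "$workdir/scores.txt")
printf '%s\n' "$merged"
```

The last output line ends with a trailing comma because there is no third score, which is easy to miss in a large file without a check like the one above.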
Common Mistakes to Avoid
- Using `join` on unsorted files (results will be incorrect)
- Forgetting to specify the delimiter with `-d` for `cut` and `paste`
- Using character positions (`-c`) when fields would be more appropriate
- Overwriting original files without making backups
- Confusing `paste` and `join` (paste is side-by-side, join is by matching fields)
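One way to avoid the unsorted-input mistake is to sort both files on the join field first and write the results to new files, leaving the originals untouched. This is a minimal sketch with hypothetical unsorted inputs; setting `LC_ALL=C` for both `sort` and `join` is an assumption made here so the two tools agree on ordering.

```shell
workdir=$(mktemp -d)
# Hypothetical unsorted inputs.
printf '%s\n' '103 Mike' '101 John' '102 Jane' > "$workdir/students.txt"
printf '%s\n' '102 B' '103 A' '101 A' > "$workdir/grades.txt"

# Sort each file on field 1 into a NEW file (originals stay untouched).
LC_ALL=C sort -k 1,1 "$workdir/students.txt" > "$workdir/students.sorted"
LC_ALL=C sort -k 1,1 "$workdir/grades.txt" > "$workdir/grades.sorted"

# Join the sorted copies using the same locale as the sort.
joined=$(LC_ALL=C join "$workdir/students.sorted" "$workdir/grades.sorted")
printf '%s\n' "$joined"
```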
Best Practices
- Use `head` to preview the structure of your data files before processing
- Pipe output to `less` or redirect to a new file rather than overwriting originals
- Add comments to your scripts explaining complex text processing operations
- Combine these commands with `sort`, `uniq`, and `grep` for powerful data processing
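As a rough illustration of combining these tools, the sketch below rebuilds the `students.csv` example in a temporary directory and runs two short pipelines. The fifth row (104, Sara Lee) is a hypothetical addition so that one major appears twice.

```shell
workdir=$(mktemp -d)
cat > "$workdir/students.csv" <<'EOF'
ID,First Name,Last Name,Major,GPA
101,John,Doe,Computer Science,3.8
102,Jane,Smith,Biology,3.9
103,Mike,Johnson,Engineering,3.5
104,Sara,Lee,Biology,3.7
EOF

# Preview the structure before processing.
head -n 2 "$workdir/students.csv"

# Skip the header, keep the Major column, then list each distinct major once.
majors=$(tail -n +2 "$workdir/students.csv" | cut -d ',' -f 4 | sort | uniq)
printf '%s\n' "$majors"

# grep narrows the rows first; cut then keeps the first and last name.
biology=$(grep ',Biology,' "$workdir/students.csv" | cut -d ',' -f 2,3)
printf '%s\n' "$biology"
```

`sort` comes before `uniq` because `uniq` only removes adjacent duplicate lines.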