CIS120 Linux Fundamentals by Scott Shaper

Linux Archiving and Zipping

Think of Linux archiving and compression tools like packing for different kinds of trips. The tar command is like a suitcase that lets you pack multiple items together for easy transport. The gzip command is like a vacuum-seal bag that makes individual items smaller, while zip is like a space-saving travel bag that both holds multiple items and compresses them. Just as you'd choose different packing methods depending on your travel needs, you'll select different archiving and compression tools based on what you're trying to accomplish.

Quick Reference

Command | What It Does | Common Use
tar | Combines multiple files into a single archive | Bundling related files together for backup or transfer
gzip | Compresses single files to reduce size | Making individual files smaller to save space
zip/unzip | Creates/extracts compressed archives | Sharing files with users of different operating systems
tar -z | Creates compressed tar archives (tar+gzip) | Efficiently packaging and compressing multiple files

When to Use These Commands

The tar Command

Think of tar as a cardboard box that lets you pack multiple items together. The name tar stands for "tape archive" (from its original use with tape backups), but today it's used for bundling files together into a single container file called a "tarball." Importantly, tar by itself doesn't save any space – it just combines files together.

When you create a tar archive, the original files remain unchanged. It's like putting copies of your items in a box while leaving the originals where they are. To actually reduce the size, you'll typically combine tar with a compression tool like gzip.
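A quick way to see both points for yourself (the originals stay put, and plain tar saves no space) is the sketch below. The file names and contents are made up for illustration, and everything happens in a temporary directory.

```shell
# Work in a throwaway directory so nothing real is touched
workdir=$(mktemp -d)
cd "$workdir"

# Three small sample files (hypothetical names and contents)
printf 'first draft\n' > essay.txt
printf 'source list\n' > research.txt
printf 'todo items\n'  > notes.txt

# Bundle them; no -z or -j, so no compression happens
tar -cf homework.tar essay.txt research.txt notes.txt

# The originals are still in place next to the new archive
ls -l essay.txt research.txt notes.txt homework.tar

# Compare the total size of the originals with the size of the archive
original_bytes=$(cat essay.txt research.txt notes.txt | wc -c)
archive_bytes=$(wc -c < homework.tar)
echo "originals: $original_bytes bytes, archive: $archive_bytes bytes"
```

For tiny files like these the archive comes out larger than the originals, because tar stores a header for every file and pads the data to fixed-size blocks.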

Option | What It Does | When to Use
-c | Creates a new archive | When you want to bundle files together
-x | Extracts files from an archive | When you want to unpack an archive
-t | Lists the contents of an archive | When you want to see what's inside without extracting
-v | Shows detailed output (verbose) | When you want to see which files are being processed
-f | Specifies the archive filename | Almost always, so tar knows which archive file to read or write
-z | Compresses with gzip | When you want a smaller compressed archive (.tar.gz)
-j | Compresses with bzip2 | When you want files that are usually smaller still (but slower to compress)
-C | Changes to directory before performing operations | When extracting files to a specific location

Practical Examples

# Create a tar archive of multiple files
tar -cvf homework.tar essay.docx research.pdf notes.txt
# This creates homework.tar containing all three files

# Create a compressed tar archive (using gzip)
tar -czvf project.tar.gz src/ docs/ README.md
# This creates a compressed archive project.tar.gz

# List contents of a tar archive without extracting
tar -tvf homework.tar
# Shows all files inside the archive with details

# Extract all files from a tar archive
tar -xvf homework.tar
# Extracts all files to current directory

# Extract a compressed tar.gz archive
tar -xzvf project.tar.gz
# Extracts and decompresses in one step

# Extract to a different directory
tar -xvf homework.tar -C ~/backup/
# Extracts files to ~/backup/ directory
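One detail worth practicing: tar -C expects the target directory to exist already. The sketch below creates it first and also pulls a single file out of an archive by naming it after the archive. The file and directory names are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Build a small archive to practice on (hypothetical file names)
printf 'essay text\n'  > essay.txt
printf 'class notes\n' > notes.txt
tar -cf homework.tar essay.txt notes.txt

# tar -C does not create the target directory, so create it first
mkdir -p backup
tar -xf homework.tar -C backup

# Extract just one member by naming it after the archive
mkdir -p only_notes
tar -xf homework.tar -C only_notes notes.txt

ls backup only_notes
```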

The gzip Command

Think of gzip like a vacuum-seal storage bag for individual items. It compresses single files to make them smaller, replacing the original file with a compressed version that has a .gz extension. Unlike tar, which bundles files together, gzip works on one file at a time.

When you gzip a file, it's like vacuum-sealing a single sweater to make it take up less space in your drawer. The compressed file contains the same data but uses less disk space. To use the original file again, you need to decompress it first.
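To convince yourself that the compressed file really holds the same data, you can compare a checksum before compressing and after decompressing. This is a small sketch with a made-up sample file, run in a temporary directory.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A repetitive sample file compresses well (hypothetical content)
for i in $(seq 1 2000); do echo "line $i of the sample log"; done > sample.log
before=$(cksum < sample.log)

gzip sample.log          # sample.log is replaced by sample.log.gz
ls                       # only sample.log.gz remains

gzip -d sample.log.gz    # sample.log.gz is replaced by sample.log
after=$(cksum < sample.log)

# Identical checksums mean the round trip restored the same bytes
[ "$before" = "$after" ] && echo "contents unchanged"
```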

Option | What It Does | When to Use
-d | Decompresses files | When you need to restore a compressed file
-k | Keeps original file | When you want both compressed and original versions
-l | Lists compression information | When you want to see compression statistics
-r | Recursively processes directories | When compressing many files in directories
-v | Shows verbose output | When you want to see compression details
-1 to -9 | Sets compression level | When balancing between speed (-1) and size (-9)

Practical Examples

# Compress a single file (original is replaced)
gzip large_file.txt
# Creates large_file.txt.gz and removes the original

# Compress but keep the original file
gzip -k important_data.csv
# Creates important_data.csv.gz and keeps important_data.csv

# Decompress a file
gzip -d large_file.txt.gz
# Restores the original large_file.txt

# See compression statistics
gzip -l *.gz
# Shows the compressed size, uncompressed size, and compression ratio for each file

# Compress with maximum compression
gzip -9 huge_logfile.log
# Takes longer but produces the smallest file gzip can make

# Compress all files in a directory (individually)
gzip -r ./logs/
# Compresses each file in the logs directory and its subdirectories
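If you are curious how much the compression level matters, a side-by-side comparison is easy to set up. The sketch below uses a generated sample file (hypothetical name and contents) and the -c option, which writes the compressed data to standard output so the original is left alone.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Generate a sample log with many similar lines
for i in $(seq 1 5000); do echo "$i GET /index.html 200"; done > access.log

# -c writes to standard output, leaving access.log untouched
gzip -1 -c access.log > fast.gz
gzip -9 -c access.log > small.gz

# Compare sizes in bytes
wc -c access.log fast.gz small.gz
```

The gap between levels depends heavily on the data, so it is worth trying this on your own files before settling on a level.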

The zip and unzip Commands

Think of zip as a specialized travel compression bag that both bundles multiple items together and compresses them at the same time. The zip format is especially useful because it's compatible with virtually all operating systems, making it perfect for sharing files with people using Windows or macOS.

When you create a zip archive, it's like packing a space-saving travel bag for a trip – you can include multiple items, they take up less space, and anyone can open it regardless of what luggage they use. The unzip command is used to extract files from zip archives.

Option | What It Does | When to Use
-r | Recursively includes directories | When zipping folders with all their contents
-u | Updates existing entries | When adding newer versions of files to archives
-d | Deletes entries from archive | When removing specific files from a zip archive
-v | Shows verbose output | When you want to see what's being processed
-e | Encrypts the archive | When creating password-protected archives
-l | Lists contents (with unzip) | When checking what's in an archive

Practical Examples

# Create a zip archive with multiple files
zip assignment.zip report.docx data.csv images.jpg
# Creates assignment.zip containing all files

# Create a zip archive including all files in a directory
zip -r website.zip ./mywebsite/
# Recursively adds all files and subdirectories

# Add a file to an existing zip archive
zip assignment.zip bibliography.txt
# Adds bibliography.txt to the existing archive

# Extract all files from a zip archive
unzip project.zip
# Extracts all files to current directory

# List the contents of a zip file without extracting
unzip -l vacation.zip
# Shows all files in the archive

# Extract to a specific directory
unzip project.zip -d ~/extracted/
# Extracts files to ~/extracted/ directory

# Create an encrypted zip archive with password
zip -e secret.zip confidential.pdf
# Will prompt for a password
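The -u and -d options from the table are easiest to understand by watching an archive change. The sketch below is a minimal walk-through with made-up file names. It assumes the zip and unzip commands are installed, and it backdates one file so the later edit is clearly newer than the archived copy.

```shell
if command -v zip >/dev/null && command -v unzip >/dev/null; then
  workdir=$(mktemp -d)
  cd "$workdir"

  printf 'draft 1\n' > report.txt
  printf 'a,b,c\n'   > data.csv
  # Backdate the draft so the later edit is clearly newer
  touch -t 202001010000 report.txt

  zip -q assignment.zip report.txt data.csv

  # Edit report.txt, then refresh only entries that are newer on disk
  printf 'draft 2\n' > report.txt
  zip -q -u assignment.zip report.txt

  # Remove one entry from the archive without touching the file on disk
  zip -q -d assignment.zip data.csv

  # List what is left, then print the archived report to standard output
  unzip -l assignment.zip
  unzip -p assignment.zip report.txt
else
  echo "zip and unzip are not installed on this system"
fi
```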

Combining tar and gzip

Think of using tar with gzip like packing a suitcase and then using a vacuum to remove excess air. First, tar bundles multiple files together, then gzip compresses that bundle to save space. This is so common that tar has built-in options to handle it automatically.

When you create a .tar.gz file (also called a "tarball"), you're efficiently packaging files for storage or transfer. It's one of the most common archiving methods on Linux because it preserves file permissions and directory structure while reducing size.

Common Tar+Gzip Patterns

# Create a compressed archive (tar+gzip)
tar -czvf backup.tar.gz ~/documents/
# The z flag tells tar to use gzip compression

# Extract a compressed archive
tar -xzvf backup.tar.gz
# Extracts and decompresses in one step

# Examine contents without extracting
tar -tzvf backup.tar.gz
# Lists all files in the compressed archive
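To see the "pack, then vacuum" idea as two separate steps, you can have tar write the archive to standard output (using - as the filename) and pipe it into gzip yourself. The sketch below builds the same archive both ways. The directory and file names are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir documents
printf 'alpha\n' > documents/a.txt
printf 'beta\n'  > documents/b.txt

# One step: tar handles the gzip compression itself
tar -czf one_step.tar.gz documents/

# Two steps: write the archive to standard output (-f -), then pipe it to gzip
tar -cf - documents/ | gzip > two_step.tar.gz

# Both archives list the same members
tar -tzf one_step.tar.gz
tar -tzf two_step.tar.gz
```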

Common Archive Extensions

Extension | What It Is | How to Create | How to Extract
.tar | Uncompressed tar archive | tar -cvf archive.tar files | tar -xvf archive.tar
.tar.gz or .tgz | Compressed tar using gzip | tar -czvf archive.tar.gz files | tar -xzvf archive.tar.gz
.tar.bz2 or .tbz | Compressed tar using bzip2 | tar -cjvf archive.tar.bz2 files | tar -xjvf archive.tar.bz2
.gz | Single file compressed with gzip | gzip file | gzip -d file.gz
.zip | Zip archive (cross-platform) | zip -r archive.zip files | unzip archive.zip
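If you find yourself looking up this table often, one option is a small shell function that picks the extract command based on the file extension. This is only a sketch: extract_archive is a hypothetical name, not a standard command, and it covers just the extensions listed above.

```shell
# extract_archive is a hypothetical helper name, not a standard command
extract_archive() {
  case "$1" in
    *.tar.gz|*.tgz)  tar -xzf "$1" ;;
    *.tar.bz2|*.tbz) tar -xjf "$1" ;;
    *.tar)           tar -xf "$1" ;;
    *.gz)            gzip -d "$1" ;;
    *.zip)           unzip "$1" ;;
    *)               echo "unknown archive type: $1" >&2; return 1 ;;
  esac
}

# Usage sketch in a scratch directory
workdir=$(mktemp -d)
cd "$workdir"
printf 'hello\n' > greeting.txt
tar -czf sample.tar.gz greeting.txt
rm greeting.txt
extract_archive sample.tar.gz
cat greeting.txt
```

The order of the patterns matters: .tar.gz has to be checked before .gz, otherwise a tarball would be handed to gzip -d and only decompressed, not unpacked.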

Tips for Success

Common Mistakes to Avoid

Best Practices

Picking the Right Tool

Decision Guide

# When to use tar (without compression):
- For bundling files while preserving permissions
- When compression isn't needed
- For creating backups of system files

# When to use tar + gzip (tar.gz):
- For most Linux archiving and compression needs
- When sending files to other Linux/Unix users
- When both bundling and compression are needed

# When to use zip:
- When sharing files with Windows or Mac users
- When you need simple built-in password protection (standard zip encryption is weak, so don't rely on it for sensitive data)
- When you need to add/update specific files in archives

# When to use gzip alone:
- For compressing single large files (like logs)
- When you don't need to bundle multiple files
- For files that other tools can read while still compressed (for example with zcat or zgrep)
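As an illustration of that last point, a gzip-compressed log can be searched without ever writing the uncompressed version to disk. The sketch below uses gzip -dc, which decompresses to standard output. The log name and contents are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'INFO start\nERROR disk full\nINFO stop\n' > app.log
gzip app.log

# Stream the decompressed text into grep; app.log.gz stays compressed on disk
gzip -dc app.log.gz | grep ERROR

# Count lines the same way
gzip -dc app.log.gz | wc -l
```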