CIS120 Linux Fundamentals by Scott Shaper

Linux Archiving and Zipping

Think of Linux archiving and compression tools like packing for different kinds of trips. The tar command is like a suitcase that lets you pack multiple items together for easy transport. The gzip command is like a vacuum-seal bag that makes individual items smaller, while zip is like a space-saving travel bag that both holds multiple items and compresses them. Just as you'd choose different packing methods depending on your travel needs, you'll select different archiving and compression tools based on what you're trying to accomplish.

The course setup.sh script creates ~/playground/chapter7/archiving/ with sample files and directories so you can run the examples below as written. It includes essay.docx, research.pdf, notes.txt for tar practice; src/, docs/, and README.md for a project archive; logs/ for gzip; and report.docx, data.csv, images.jpg for zip. Run cd ~/playground/chapter7/archiving before trying the examples.

Quick Reference

Command | What It Does | Common Use
tar | Combines multiple files into a single archive | Bundling related files together for backup or transfer
gzip | Compresses single files to reduce size | Making individual files smaller to save space
zip/unzip | Creates/extracts compressed archives | Sharing files with users of different operating systems
tar -z | Creates compressed tar archives (tar+gzip) | Efficiently packaging and compressing multiple files

When to Use These Commands

The tar Command

Think of tar as a cardboard box that lets you pack multiple items together. The name tar stands for "tape archive" (from its original use with tape backups), but today it's used for bundling files together into a single container file called a "tarball." Importantly, tar by itself doesn't save any space – it just combines files together.

When you create a tar archive, the original files remain unchanged. It's like taking photos of your items and putting them in a box while leaving the originals where they are. To actually reduce the size, you'll typically combine tar with a compression tool like gzip.
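
To make the two points above concrete, here is a minimal sketch meant to be saved and run as a standalone script. It works in a throwaway directory from mktemp, and the file names (essay.txt and so on) are made up for illustration rather than taken from the course setup. It compares the size of a plain tar archive with a gzip-compressed one and confirms the originals are still present.

```shell
#!/usr/bin/env bash
set -eu
work=$(mktemp -d)                 # scratch directory so no real files are touched
trap 'rm -rf "$work"' EXIT        # clean up when the script ends
cd "$work"

# Three sample files filled with repetitive (highly compressible) text
for name in essay research notes; do
    yes "sample line for $name" | head -n 2000 > "$name.txt"
done

tar -cf plain.tar essay.txt research.txt notes.txt        # bundle only
tar -czf packed.tar.gz essay.txt research.txt notes.txt   # bundle + gzip

originals=$(cat essay.txt research.txt notes.txt | wc -c)
plain_size=$(wc -c < plain.tar)
packed_size=$(wc -c < packed.tar.gz)

echo "originals:     $originals bytes"
echo "plain.tar:     $plain_size bytes"
echo "packed.tar.gz: $packed_size bytes"
ls essay.txt research.txt notes.txt    # the originals are still here
```

The plain archive comes out at least as large as the combined originals because tar adds header and padding blocks, while the gzip-compressed archive is much smaller for text like this.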

Option | What It Does | When to Use
-c | Creates a new archive | When you want to bundle files together
-x | Extracts files from an archive | When you want to unpack an archive
-t | Lists the contents of an archive | When you want to see what's inside without extracting
-v | Shows detailed output (verbose) | When you want to see which files are being processed
-f | Specifies the archive filename | Whenever you read or write an archive file (almost always used)
-z | Compresses with gzip | When you want a smaller compressed archive (.tar.gz)
-j | Compresses with bzip2 | When you want files that are often smaller still (but compression is slower)
-C | Changes to directory before performing operations | When extracting files to a specific location

Practical Examples

# Work in the course practice directory
cd ~/playground/chapter7/archiving

# Create a tar archive of multiple files
tar -cvf homework.tar essay.docx research.pdf notes.txt
# This creates homework.tar containing all three files

# Create a compressed tar archive (using gzip)
tar -zcvf homework.tar.gz essay.docx research.pdf notes.txt
# This creates a compressed archive homework.tar.gz

# List contents of a tar archive without extracting
tar -tvf homework.tar
# Shows all files inside the archive with details

# Extract all files from a tar archive
tar -xvf homework.tar
# Extracts all files to current directory

# Extract a compressed tar.gz archive
tar -xzvf homework.tar.gz
# Extracts and decompresses in one step

# Extract to a different directory
tar -xvf homework.tar -C ~/backup/
# Extracts files into ~/backup/ (the directory must already exist; create it with mkdir -p ~/backup)
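
The listing and extraction steps above can also be combined to pull a single file out of an archive. The following is a small sketch under assumed names: the scratch directory, the two sample files, and the restore directory are all hypothetical and are not part of the course setup. Run it as a script rather than pasting it line by line.

```shell
#!/usr/bin/env bash
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

printf 'essay draft\n' > essay.txt
printf 'meeting notes\n' > notes.txt
tar -cf homework.tar essay.txt notes.txt

# -t lists member names without extracting anything
members=$(tar -tf homework.tar)
echo "$members"

# -C needs an existing directory, so create it first
mkdir -p restore

# Naming a member after the options extracts only that file
tar -xf homework.tar -C restore notes.txt

ls restore
```

Listing first is a cheap way to check member names before extracting, since a name given on the command line has to match the name stored in the archive.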

The gzip Command

Think of gzip like a vacuum-seal storage bag for individual items. It compresses single files to make them smaller, replacing the original file with a compressed version that has a .gz extension. Unlike tar, which bundles files together, gzip works on one file at a time.

When you gzip a file, it's like vacuum-sealing a single sweater to make it take up less space in your drawer. The compressed file contains the same data but uses less disk space. To use the original file again, you need to decompress it first.
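
As a rough illustration of that replace-and-restore behavior, the sketch below compresses a generated file, shows that only the .gz version remains, then decompresses it and checks that the contents are unchanged. The file name numbers.txt and the scratch directory are assumptions made for this example; it is intended to be run as a script.

```shell
#!/usr/bin/env bash
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

seq 1 5000 > numbers.txt
before=$(cksum < numbers.txt)     # checksum of the original contents

gzip numbers.txt                  # numbers.txt is replaced by numbers.txt.gz
compressed_listing=$(ls)
echo "after gzip:    $compressed_listing"

gzip -d numbers.txt.gz            # the .gz file is replaced by numbers.txt again
after=$(cksum < numbers.txt)
echo "after gzip -d: $(ls)"

[ "$before" = "$after" ] && echo "contents identical after the round trip"
```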

Option | What It Does | When to Use
-d | Decompresses files | When you need to restore a compressed file
-c | Writes to standard output, keeping original file | When you want to compress a file but keep the original, typically redirecting output to a new file
-l | Lists compression information | When you want to see compression statistics
-r | Recursively processes directories | When compressing many files in directories
-v | Shows verbose output | When you want to see compression details
-1 to -9 | Sets compression level | When balancing between speed (-1) and size (-9)

Practical Examples

# From ~/playground/chapter7/archiving (course setup provides logs/ and other files)

# Compress a single file (original is replaced)
gzip large_file.txt
# Creates large_file.txt.gz and removes the original

# Compress but keep the original file
gzip -c notes.txt > notes_backup.txt.gz
# Creates notes_backup.txt.gz and keeps notes.txt

# Decompress a file
gzip -d large_file.txt.gz
# Restores the original large_file.txt

# See compression statistics
gzip -l *.gz
# Shows compressed size, uncompressed size, and compression ratio for each file

# Compress with maximum compression (use large_file.txt from course setup)
gzip -9 large_file.txt
# Takes longer but usually produces the smallest file gzip can make

# Compress all files in a directory individually (use the course logs/ directory)
gzip -r logs/
# Replaces each file under logs/, including subdirectories, with its own .gz file

The zip and unzip Commands

Think of zip as a specialized travel compression bag that both bundles multiple items together and compresses them at the same time. The zip format is especially useful because it's compatible with virtually all operating systems, making it perfect for sharing files with people using Windows or macOS.

When you create a zip archive, it's like packing a space-saving travel bag for a trip – you can include multiple items, they take up less space, and anyone can open it regardless of what luggage they use. The unzip command is used to extract files from zip archives.
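
A minimal round trip might look like the sketch below. It assumes the zip and unzip packages are installed (they are not always present on minimal Linux systems, so the script checks first), and the project/ directory and its files are invented for this example. It is written to be run as a script.

```shell
#!/usr/bin/env bash
set -eu
if ! command -v zip >/dev/null || ! command -v unzip >/dev/null; then
    echo "zip/unzip not installed; skipping"
    exit 0
fi
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p project/images
printf 'quarterly report\n' > project/report.txt
printf 'id,value\n1,42\n'   > project/data.csv
printf 'placeholder\n'      > project/images/logo.txt

zip -r -q project.zip project       # -r recurses into subdirectories, -q keeps output quiet
unzip -q project.zip -d unpacked    # -d chooses the extraction directory

# diff -r exits with status 0 only when both trees are identical
diff -r project unpacked/project && echo "extracted tree matches the original"
```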

Option | What It Does | When to Use
-r | Recursively includes directories | When zipping folders with all their contents
-u | Updates existing entries | When adding newer versions of files to archives
-d | Deletes entries from archive (with zip; with unzip, -d instead sets the extraction directory) | When removing specific files from a zip archive
-v | Shows verbose output | When you want to see what's being processed
-e | Encrypts the archive | When creating password-protected archives
-l | Lists contents (with unzip) | When checking what's in an archive

Practical Examples

# From ~/playground/chapter7/archiving (course setup provides report.docx, data.csv, images.jpg)

# Create a zip archive with multiple files
zip assignment.zip report.docx data.csv images.jpg
# Creates assignment.zip containing all files

# Create a zip archive including all files in a directory
zip -r website.zip ./src/
# Recursively adds all files and subdirectories in src/

# Add a file to an existing zip archive
zip assignment.zip notes.txt
# Adds notes.txt to the existing archive

# Extract all files from a zip archive
unzip assignment.zip
# Extracts all files to current directory

# List the contents of a zip file without extracting
unzip -l assignment.zip
# Shows all files in the archive

# Extract to a specific directory
unzip assignment.zip -d ~/extracted/
# Extracts files to ~/extracted/ directory

# Create an encrypted zip archive with password
zip -e secret.zip notes.txt
# Will prompt for a password (standard zip encryption is relatively weak, so avoid relying on it for sensitive data)
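
Because a zip archive can be modified in place, adding and removing entries does not require rebuilding it. The sketch below illustrates this with hypothetical file names in a scratch directory, and assumes zip and unzip are installed. Note that -d here is the zip option for deleting an entry, not the unzip option for choosing a directory.

```shell
#!/usr/bin/env bash
set -eu
if ! command -v zip >/dev/null || ! command -v unzip >/dev/null; then
    echo "zip/unzip not installed; skipping"
    exit 0
fi
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

printf 'report body\n' > report.txt
printf 'a,b\n1,2\n'    > data.csv
printf 'todo list\n'   > notes.txt

zip -q assignment.zip report.txt data.csv   # create the archive
zip -q assignment.zip notes.txt             # add one more file to it
zip -q -d assignment.zip data.csv           # delete a single entry

listing=$(unzip -l assignment.zip)          # list contents without extracting
echo "$listing"
```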

Combining tar and gzip

We have already seen examples of this in the tar section of this lesson. Think of using tar with gzip like packing a suitcase and then using a vacuum to remove excess air. First, tar bundles multiple files together, then gzip compresses that bundle to save space. This is so common that tar has built-in options to handle it automatically.
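
The "bundle first, then compress" idea can be spelled out as two explicit steps and compared with tar's built-in option. This is only a sketch: the docs/ directory and archive names are made up, and everything happens in a scratch directory when run as a script.

```shell
#!/usr/bin/env bash
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir docs
printf 'chapter one\n' > docs/ch1.txt
printf 'chapter two\n' > docs/ch2.txt

# Two explicit steps: tar writes the bundle to stdout (-f -), gzip compresses it
tar -cf - docs | gzip > piped.tar.gz

# One step: tar's -z option handles the gzip compression itself
tar -czf builtin.tar.gz docs

piped_list=$(tar -tzf piped.tar.gz)
builtin_list=$(tar -tzf builtin.tar.gz)
echo "$builtin_list"
[ "$piped_list" = "$builtin_list" ] && echo "both archives hold the same members"
```

The one-step form is what you will normally type; the pipeline version is mainly useful for understanding what -z does.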

When you create a .tar.gz file (also called a "tarball"), you're efficiently packaging files for storage or transfer. It's the most common archiving method in Linux because it preserves file permissions and directory structure while reducing size.
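
One way to check the permissions claim is to archive a file with a distinctive mode and compare it after extraction. The sketch below assumes GNU stat (the -c '%a' form, standard on Linux) and uses invented paths in a scratch directory. It adds -p on extraction because, for ordinary users, tar otherwise filters the stored permission bits through the current umask.

```shell
#!/usr/bin/env bash
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p project/bin
printf '#!/bin/sh\necho hello\n' > project/bin/run.sh
chmod 750 project/bin/run.sh          # rwxr-x--- so the mode is easy to recognize

tar -czf project.tar.gz project

mkdir restored
tar -xpzf project.tar.gz -C restored  # -p restores the stored permission bits

mode_before=$(stat -c '%a' project/bin/run.sh)
mode_after=$(stat -c '%a' restored/project/bin/run.sh)
echo "original mode: $mode_before"
echo "restored mode: $mode_after"
```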

Common Tar+Gzip Patterns

# From ~/playground/chapter7/archiving

# Create a compressed archive (tar+gzip)
tar -czvf backup.tar.gz docs/ src/ README.md
# The z flag tells tar to use gzip compression

# Extract a compressed archive
tar -xzvf backup.tar.gz
# Extracts and decompresses in one step

# Examine contents without extracting
tar -tzvf backup.tar.gz
# Lists all files in the compressed archive

Common Archive Extensions

Extension | What It Is | How to Create | How to Extract
.tar | Uncompressed tar archive | tar -cvf archive.tar files | tar -xvf archive.tar
.tar.gz or .tgz | Compressed tar using gzip | tar -czvf archive.tar.gz files | tar -xzvf archive.tar.gz
.tar.bz2 or .tbz | Compressed tar using bzip2 | tar -cjvf archive.tar.bz2 files | tar -xjvf archive.tar.bz2
.gz | Single file compressed with gzip | gzip file | gzip -d file.gz
.zip | Zip archive (cross-platform) | zip -r archive.zip files | unzip archive.zip
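
The extension-to-command mapping in the table can be turned into a small helper. The function below, extract_archive, is a hypothetical name invented for this sketch and is not a standard command. It picks the extraction command from the file extension and unpacks into a destination directory. The demo at the bottom only exercises the .tar.gz and .gz branches, using made-up files in a scratch directory.

```shell
#!/usr/bin/env bash
set -eu

# extract_archive ARCHIVE [DEST]: hypothetical helper, chooses a tool by extension
extract_archive() {
    archive=$1
    dest=${2:-.}
    mkdir -p "$dest"
    case "$archive" in
        # The .tar.* patterns must come before the plain *.gz pattern
        *.tar.gz|*.tgz)  tar -xzf "$archive" -C "$dest" ;;
        *.tar.bz2|*.tbz) tar -xjf "$archive" -C "$dest" ;;
        *.tar)           tar -xf  "$archive" -C "$dest" ;;
        *.gz)            gzip -dc "$archive" > "$dest/$(basename "$archive" .gz)" ;;
        *.zip)           unzip -q "$archive" -d "$dest" ;;
        *)               echo "unknown archive type: $archive" >&2; return 1 ;;
    esac
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir docs
printf 'alpha\n' > docs/a.txt
tar -czf docs.tar.gz docs
printf 'plain notes\n' > notes.txt
gzip -c notes.txt > notes.txt.gz      # -c keeps notes.txt in place

extract_archive docs.tar.gz out_tgz
extract_archive notes.txt.gz out_gz
ls -R out_tgz out_gz
```

Going by file extension is a convenience, not a guarantee: a misnamed file will be handed to the wrong tool, which will then report an error.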

Picking the Right Tool

Decision Guide

# When to use tar (without compression):
- For bundling files while preserving permissions
- When compression isn't needed
- For creating backups of system files

# When to use tar + gzip (tar.gz):
- For most Linux archiving and compression needs
- When sending files to other Linux/Unix users
- When both bundling and compression are needed

# When to use zip:
- When sharing files with Windows or Mac users
- When you need built-in encryption
- When you need to add/update specific files in archives

# When to use gzip alone:
- For compressing single large files (like logs)
- When you don't need to bundle multiple files
- For files that will be processed by other tools

Tips for Success

Common Mistakes to Avoid

Best Practices