In any Linux environment, it's essential to be able to archive and compress files. The tar (short for tape archive) and gzip (short for GNU zip) commands are the most common utilities for packaging files and reducing their size on Unix-like systems.
While tar collects multiple files and directories into a single archive, gzip compresses data to minimize storage usage and speed up transfers. These tools are often combined to create .tar.gz archives, which both bundle and compress files efficiently.
This tutorial will explain the differences between the tar and gzip commands, show how they work, and demonstrate practical examples for each.
tar and gzip Commands: Overview
The tar and gzip commands serve distinct roles in Linux file management. The following table explains their core functions, syntax, and other relevant features:
Feature | tar | gzip |
---|---|---|
Purpose | Archives multiple files and directories into a single file without compression. | Compresses data in a single file to reduce its size. |
Command type | Archiving utility. | Compression utility. |
Syntax | tar -cf archive.tar [files] <br>tar -xf archive.tar | gzip [options] file <br>gzip -d file.gz |
Options | -c , -x , -t , -f , -v , -z | -d , -k , -v , -l |
Input and output | Accepts multiple files or directories and creates a single archive. Compression depends on the option used. | Accepts one or more files and compresses each one into its own .gz file. |
Default extension | .tar (.tar.gz or .tgz when compressed with gzip) | .gz |
Metadata preservation | Yes, retains permissions, ownership, timestamps, and directory structure. | Compresses the file content and retains basic attributes such as permissions and timestamps. |
Decompression command | tar -xzf archive.tar.gz | gzip -d file.gz <br>gunzip file.gz |
Typical use cases | Archiving backups, grouping source files, and packaging directories for transfer. | Compressing large log files or configuration exports. |
Recursive operation | Yes, processes entire directories. | No by default. With -r, it descends into directories but still compresses each file individually. |
Availability | Preinstalled on most Linux distributions. | Preinstalled on most Linux distributions. |
tar and gzip: In-Depth Comparison
While the tar and gzip commands are frequently used together, they perform different functions and have significant operational differences. Each tool plays a distinct role in archiving and compression workflows, and the right tool or combination affects file size, metadata preservation, and extraction convenience.
To demonstrate those differences, first take the following steps to create a sample directory and files for examples and tests:
1. Use mkdir to create a directory named test_files:
mkdir -p test_files
The command has no output.
2. Navigate to the directory with cd:
cd test_files
The command switches the working directory to test_files, so subsequent commands operate from that location.
3. Run touch to create an empty file:
touch file1.txt
The command has no output, but creates a blank text file named file1.txt.
4. Create a second file, but this time with content, with echo:
echo -e "Sample text 2\nMore text" > file2.txt
The command produces no output, but it writes the lines Sample text 2 and More text into file2.txt.
5. Generate a small binary test file for compression comparison:
dd if=/dev/urandom of=binary.bin bs=1K count=4
The binary.bin file contains 4 KiB of random data, useful for testing how well gzip compresses non-text content.
6. Return to the parent directory:
cd ..
This places you in the directory that contains test_files.
7. List the sample directory contents with ls:
ls -l test_files
The output displays file names, sizes, and permissions so you can confirm everything was created successfully.
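The seven steps above can also be run as one short script. This is a minimal sketch under two assumptions: it writes test_files into the current directory instead of using cd, and it uses printf in place of echo -e, because some shells print the -e literally.

```shell
# Recreate the sample data from steps 1-7 without changing directories.
mkdir -p test_files
touch test_files/file1.txt                                   # empty file
printf 'Sample text 2\nMore text\n' > test_files/file2.txt   # two lines of text
dd if=/dev/urandom of=test_files/binary.bin bs=1K count=4 2>/dev/null  # 4 KiB of random data
ls -l test_files                                             # confirm all three files exist
```

The 2>/dev/null on dd only hides its transfer statistics; remove it to see them.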
The sample files include empty, text, and binary data to show how tar and gzip handle different file types:
- Empty files (file1.txt). Stored in the archive, but compression with gzip returns almost no size reduction.
- Small text files (file2.txt). Text compresses well with gzip, but for very small files, the effect is minimal.
- Binary files (binary.bin). Compression depends on the binary content. Some binary data compresses poorly, while some compresses well.
Once the sample directory and files are set, go to the following sections for an in-depth comparison of the tar and gzip commands.
Function and Workflow
The tar command collects multiple files and directories into a single archive file without compressing them. It preserves the directory structure, file ownership, permissions, and timestamps to ensure data integrity when you extract the archive. The command simplifies file distribution, backup, and transfer.
The gzip command compresses a single file with the DEFLATE algorithm. It replaces the original file with a smaller version that has the .gz extension. The focus of gzip is to reduce file size and optimize storage or transmission, not to combine files.
While both commands manage file data, their workflows serve different purposes. When used together, they create an archive both structured and compressed, which improves storage efficiency and ease of transfer.
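The difference in workflow is easy to see in a scratch directory. The sketch below is illustrative only: the work variable, the demo directory, and its file names are made up for this example.

```shell
work=$(mktemp -d)            # scratch directory (hypothetical name)
cd "$work"
mkdir demo
printf 'hello\n' > demo/a.txt
printf 'world\n' > demo/b.txt

tar -cf demo.tar demo/       # bundles both files; demo/ stays in place
gzip demo.tar                # replaces demo.tar with demo.tar.gz

ls                           # lists demo and demo.tar.gz; demo.tar is gone
```

tar leaves its input untouched, while gzip swaps the file it compresses for the .gz version unless told otherwise.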
Syntax and Common Operations
The most commonly used tar and gzip options allow users to perform the majority of routine archiving and compression tasks. The basic syntax of the commands differs slightly.
The general syntax of tar is:
tar [options] [archive-name] [files and directories]
The tar command needs an operation option such as -c, -x, or -t, the name of the archive (given with -f), and, when creating an archive, the names of the files or directories to include. The remaining options are not mandatory, but allow you to customize the output.
The following table lists the most commonly used tar options:
Option | Description |
---|---|
-c | Creates a new archive file. |
-x | Extracts files from an archive. |
-t | Lists the contents of an archive. |
-f | Specifies the archive file to use. |
-v | Displays files being processed in verbose mode. |
-z | Filters the archive through gzip: compresses during creation and decompresses during extraction or listing. |
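The options in the table can be tried together on throwaway data. In this sketch, the directory and file names are hypothetical, and -C (choose the directory to extract into) is an extra tar option that the table does not list.

```shell
work=$(mktemp -d)                 # scratch directory (hypothetical name)
cd "$work"
mkdir -p src out
printf 'alpha\n' > src/one.txt
printf 'beta\n'  > src/two.txt

tar -cvf bundle.tar src/          # -c create, -v print each file as it is added
tar -tf bundle.tar                # -t list contents without extracting
tar -xf bundle.tar -C out         # -x extract; -C selects the target directory

cat out/src/one.txt               # prints "alpha"
```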
The main gzip syntax is:
gzip [options] [file]
The gzip command usually takes at least one file argument to compress or decompress. Without one, it reads from standard input. Options are not required, but they provide additional control over command behavior.
The following table lists the most commonly used gzip options:
Option | Description |
---|---|
-d | Decompresses a file. |
-k | Keeps the original file after compression or decompression. |
-v | Displays the compression ratio and file name in verbose mode. |
-l | Lists information about the compressed file, including uncompressed size and ratio. |
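A short round trip shows the four options in action. The file name notes.txt and its repeated content are assumptions made for this sketch, and -k requires a reasonably recent gzip release.

```shell
work=$(mktemp -d)                 # scratch directory (hypothetical name)
cd "$work"
# Build a repetitive text file so the compression ratio is easy to see.
yes 'the same line again' | head -n 1000 > notes.txt

gzip -kv notes.txt                # -k keeps notes.txt, -v reports the ratio
gzip -l notes.txt.gz              # compressed size, uncompressed size, ratio
rm notes.txt                      # remove the original so -d has nothing to overwrite
gzip -d notes.txt.gz              # restores notes.txt and removes notes.txt.gz
wc -l notes.txt                   # counts 1000 lines
```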
Performance and Compression Efficiency
The tar command focuses on packaging rather than compression, so its performance depends mainly on file count, size, and disk speed. It runs quickly and uses minimal system resources since it only consolidates files into a single stream without compression.
The gzip command, on the other hand, performs CPU-intensive compression to minimize file size. The DEFLATE algorithm achieves significant size reduction, but the process takes longer and consumes more processing power.
When combined as tar.gz, both tools work efficiently together:
- tar. Collects and streams all data into one archive.
- gzip. Compresses that single archive to reduce storage and transfer costs.
For example, to compare the performance of both commands, first create an uncompressed archive:
tar -cf archive.tar test_files/
Then compress it with gzip:
gzip archive.tar
This process replaces archive.tar with a compressed file named archive.tar.gz. Verify its creation and check its size with:
ls -lh archive*
Because gzip removes archive.tar, the output lists only archive.tar.gz. To see both sizes side by side, run ls -lh archive.tar before compression or use gzip -k to keep the original. In this example, the file size is reduced from 10 K to 4.4 K. Most of that saving likely comes from the headers and padding that tar adds, since random data such as binary.bin barely compresses.
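How much gzip saves depends heavily on the input. The following sketch compresses a repetitive text file and a block of random data of similar size; the file names and sizes are arbitrary choices for illustration.

```shell
work=$(mktemp -d)                 # scratch directory (hypothetical name)
cd "$work"
yes 'repetitive log line' | head -n 2000 > text.log           # highly compressible
dd if=/dev/urandom of=random.bin bs=1K count=40 2>/dev/null   # close to incompressible

gzip -k text.log random.bin       # compress both, keep the originals
ls -l text.log text.log.gz random.bin random.bin.gz
```

The text file should shrink to a small fraction of its size, while random.bin.gz ends up about as large as random.bin.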
Metadata Handling
The tar and gzip commands differ in how they preserve file metadata. Understanding these differences helps ensure important attributes like permissions, timestamps, and ownership are retained or handled correctly during archiving and compression.
The tar command preserves file metadata when it creates an archive. This includes:
- Permissions.
- Ownership.
- Timestamps.
- Directory structure.
When you extract the archive, the files maintain their original attributes, which makes tar suitable for backups and data transfer with the metadata intact. Ownership is typically restored only when the archive is extracted as root.
The gzip command preserves basic metadata such as file permissions and timestamps. Ownership may not be retained when decompressing, depending on the system and user permissions.
The combination of the commands allows you to preserve full metadata and reduce file size.
For example, to create an archive that preserves metadata, run:
tar -cf archive.tar test_files/
Inspect the archived files and their metadata with:
tar -tvf archive.tar
The output lists the files in the archive along with permissions, ownership, and timestamps.
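To compress the archive and keep the original file, run:
gzip -k archive.tar
If archive.tar.gz already exists from the previous section, gzip asks before overwriting it. Add -f to overwrite without a prompt.
Verify the output with:
ls -l archive*
The output shows both archive.tar and archive.tar.gz, which confirms the compression succeeded. To check the metadata stored inside the compressed archive, list it with tar -tzvf archive.tar.gz.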
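Metadata preservation can be checked with a round trip through a compressed archive. This sketch rests on a few assumptions: the data and restore directories are hypothetical, -p asks tar to restore permissions exactly instead of applying the umask, and stat -c is the GNU coreutils form of the command.

```shell
work=$(mktemp -d)                         # scratch directory (hypothetical name)
cd "$work"
mkdir -p data restore
printf 'secret\n' > data/config.txt
chmod 640 data/config.txt                 # distinctive permissions
touch -t 202001010000 data/config.txt     # distinctive modification time

tar -czf data.tar.gz data/                # archive and compress
tar -xzpf data.tar.gz -C restore          # extract elsewhere, keeping permissions

# Print mode, modification time (epoch seconds), and name for both copies.
stat -c '%a %Y %n' data/config.txt restore/data/config.txt
```

Both lines of the stat output should show the same mode and modification time.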
Combining tar and gzip
tar and gzip together combine the strengths of both commands. The former packages multiple files and directories into a single archive and preserves metadata. The latter compresses that archive.
This combination is useful to create backups, move large directories, or prepare files for distribution while maintaining the original attributes and structure.
For example, to create a compressed archive of the test_files/ directory, execute:
tar -czf archive.tar.gz test_files/
The command includes:
- tar -c. Creates a new archive.
- -z. Compresses the archive with gzip.
- -f archive.tar.gz. Specifies the archive name.
To verify the archive was created and show compression efficiency, run:
ls -lh archive*
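The -z option is a shortcut for piping the tar stream through gzip. The sketch below builds the same archive both ways on made-up sample data; in tar -cf -, the dash tells tar to write the archive to standard output.

```shell
work=$(mktemp -d)                                # scratch directory (hypothetical name)
cd "$work"
mkdir project
printf 'line one\n' > project/readme.txt

tar -czf one_step.tar.gz project/                # tar runs the gzip step itself
tar -cf - project/ | gzip > two_step.tar.gz      # explicit pipeline, same idea

# Both archives should list the same entries.
tar -tzf one_step.tar.gz
tar -tzf two_step.tar.gz
```

The pipeline form is handy when a different gzip level or another compressor is needed, while -z is shorter for everyday use.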
Backups and System Snapshots
System snapshots and backups are methods for protecting data against loss or corruption.
Backups represent copies of files and directories stored separately for recovery in case of accidental deletion, hardware failure, or data corruption.
System snapshots capture the entire system state, including files, configurations, and sometimes applications, at a specific time. Snapshots are useful to quickly restore a system to a previous working state.
The tar and gzip commands are commonly used to create both because they efficiently package multiple files and directories, preserve metadata, and reduce storage space.
To create a test_files/ directory backup, run:
tar -czf test_files_backup.tar.gz test_files/
This command packages all files and directories inside test_files/ into a single compressed archive named test_files_backup.tar.gz, which preserves permissions, timestamps, and directory structure.
Use ls to confirm the backup archive exists:
ls -lh test_files_backup.tar.gz
Note: The same workflow applies to system snapshots. The difference is that snapshots cover larger portions of the system or multiple directories, but the commands used remain the same.
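For recurring backups, it can help to put the date in the archive name and test the result before relying on it. This is a minimal sketch: the backup variable and the naming scheme are assumptions, not part of the workflow above, and gzip -t is an option the earlier table does not list.

```shell
work=$(mktemp -d)                                 # scratch directory (hypothetical name)
cd "$work"
mkdir test_files
printf 'Sample text 2\nMore text\n' > test_files/file2.txt

backup="test_files_backup_$(date +%F).tar.gz"     # date +%F expands to YYYY-MM-DD
tar -czf "$backup" test_files/

gzip -t "$backup" && echo "gzip stream OK"        # -t tests the compressed data
tar -tzf "$backup" > /dev/null && echo "tar listing OK"
```

A backup that fails either check should be recreated before the original data is removed.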
Packaging Source Code and Releases
Developers often need to distribute software or share project files and ensure the directory structure and file metadata are preserved. tar and gzip offer an efficient way to package source code or releases into a single compressed archive.
If, for example, the files in test_files/ contained source code, all the files would be considered text files. Package them for distribution with:
tar -czf project_release.tar.gz test_files/
This command combines all files and directories in test_files/ into a single archive named project_release.tar.gz.
To verify the archive was created and list its contents, type:
tar -tzf project_release.tar.gz
The listing shows every file in test_files/. In the source code scenario, all of them are text files. With the sample data from this tutorial, binary.bin is listed as well.
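If a release should leave out binary files, one option is the --exclude flag of tar. The sketch below assumes a made-up main.c source file next to binary.bin and places --exclude before the directory name, which works regardless of how a given tar version treats option order.

```shell
work=$(mktemp -d)                 # scratch directory (hypothetical name)
cd "$work"
mkdir test_files
printf 'int main(void) { return 0; }\n' > test_files/main.c
dd if=/dev/urandom of=test_files/binary.bin bs=1K count=4 2>/dev/null

# Quote the pattern so the shell passes it to tar unexpanded.
tar -czf project_release.tar.gz --exclude='*.bin' test_files/
tar -tzf project_release.tar.gz   # lists main.c but not binary.bin
```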
Compressing Logs and Large Files
System administrators often need to manage log files or large datasets, which consume significant storage space. The tar and gzip commands package and compress these files efficiently.
Since logs are usually text files, include only relevant files when creating an archive.
In the test_files/ example, compress just the text logs:
tar -czf text_logs_archive.tar.gz test_files/file1.txt test_files/file2.txt
This command creates a compressed archive named text_logs_archive.tar.gz that contains only the text files.
Using wildcards is also possible:
tar -czf text_logs_archive.tar.gz test_files/*.txt
The shell expands the wildcard before tar runs, so the archive includes every .txt file in test_files/.
To verify the archive was created, run:
ls -lh text_logs_archive.tar.gz
The output shows the archive and its size, which illustrates the effect of gzip compression on text files.
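A single large log does not need tar at all, since gzip can compress it directly. In this sketch, app.log and its content are invented for the example, and -c (write to standard output) is a gzip option not covered in the earlier table.

```shell
work=$(mktemp -d)                       # scratch directory (hypothetical name)
cd "$work"
yes 'INFO service heartbeat ok' | head -n 5000 > app.log

gzip -k app.log                         # creates app.log.gz, keeps app.log
ls -l app.log app.log.gz                # compare the two sizes

gzip -dc app.log.gz | tail -n 1         # read the log while it stays compressed
```

Reading through gzip -dc avoids decompressing the file to disk just to inspect a few lines.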
Conclusion
This tutorial explained what the tar and gzip commands are, their main functions, and features. It also elaborated on their differences by comparing in depth how they work.