Mastering the Pigz Command for Lightning Fast File Compression in Linux – TheLinuxCode (2024)

Do you ever get frustrated waiting for files to compress before sending them over the network? Does archiving large datasets to save space take too long on your system? If so, keep reading to discover a simple trick to speed up compression by 5X or more!

The pigz command gives Linux users a handy way to leverage parallel processing to shrink files significantly faster. Keeping compression fast and flexible is critical in modern data-driven environments.

In this comprehensive guide, I‘ll explain everything you need to know to start using pigz for blazing fast compression on your Linux machines. You‘ll learn:

  • How pigz achieves speedups through parallelization
  • Easy installation on any Linux distro
  • Compressing single files or entire directories
  • Integration with tar for quick archives
  • Tuning tips to optimize pigz performance
  • Comparisons benchmarking pigz vs gzip

Whether sending large backups or compressing log files, pigz will help you zip through compression tasks in no time. Let‘s get started!

An Introduction to the Pigz Command

First, what exactly is pigz?

Pigz stands for "parallel implementation of gzip”. It‘s designed as a drop-in replacement for the standard gzip tool that comes with Linux.

Pigz accelerates compression by using multiple threads to spread the workload across all available CPU cores and processors. This allows it to leverage modern multi-core hardware for significantly faster compression speeds.

The traditional gzip utility is single-threaded, meaning it can only use one CPU core at a time during compression or decompression. Pigz breaks free of this limitation by enabling parallel processing.

The compression algorithm remains the same highly compatible DEFLATE used by gzip. So files compressed by pigz can be decompressed using gzip and vice versa.

But by utilizing more CPU cores simultaneously, pigz compresses large files much faster than gzip could. The speedup is especially noticeable on big files.

The author of pigz, Mark Adler, is actually one of the original authors of the gzip algorithm. So you can be assured pigz is optimized for compatibility and performance.

Let‘s dig into some more details on how pigz works its magic!

Parallel Compression to Boost Performance

Pigz achieves performance gains through parallel execution of compression code across multiple threads.

Each thread independently runs the DEFLATE algorithm on a block of data. By working on chunks simultaneously, the overall job finishes faster.

The diagram below illustrates how pigz utilizes threads on a 4 core CPU:

Mastering the Pigz Command for Lightning Fast File Compression in Linux – TheLinuxCode (1)

Pigz uses multiple compression threads to leverage all available cores

This strategy allows pigz to maximize utilization of available CPUs and cores for the fastest possible compression speed.

Pigz also has optimizations to reduce lock contention between threads for more efficient parallel execution. The threads synchronize only when necessary to output the final compressed data stream.

In summary, pigz accelerates gzip compression through:

  • Parallel threading – Multiple simultaneous compression jobs
  • Multi-core utilization – Using all CPUs/cores on the system
  • Lock optimizations – Minimizing thread blocking

Combining these techniques results in much higher overall throughput.

Now let‘s look at installing pigz on your Linux machine.

Installing the Pigz Compression Tool

The good news is pigz is available right in the package repositories for most popular Linux distributions.

On Debian, Ubuntu, Linux Mint, and other deb-based distros, use apt:

sudo apt updatesudo apt install pigz

For RedHat, CentOS, AlmaLinux, Rocky Linux, and other RPM-based systems, use yum:

sudo yum updatesudo yum install pigz

On Arch Linux and Manjaro, use pacman:

sudo pacman -Syu pigz

For other distros, consult your package manager documentation to install pigz. The package name is typically just "pigz".

These commands will install pigz and all its dependencies. There are no special library requirements.

Once installed, verify it is ready to use by checking the version:

pigz -Vpigz 2.6

With pigz installed, you‘re ready to put it to work speeding up compression tasks!

Next let‘s go through some examples of compressing files with pigz.

Simple Examples of Compressing Files with Pigz

Using pigz to compress files is straightforward. The syntax is:

pigz [options] filename 

To compress a file, just pass the file path as an argument. For example:

pigz myfile.tar

This will replace myfile.tar with a compressed version named myfile.tar.gz.

By default pigz will launch enough threads to use all available CPU cores for maximum compression throughput.

You can control the number of threads with the -p option:

pigz -p 2 myfile.tar

Here we tell pigz to use 2 threads, ideal for a system with 2 CPU cores. Benchmark with different values to optimize for your hardware.

To preserve the original file and output the compressed version to standard output:

pigz -c myfile.tar > myfile.tar.gz

You can compress multiple files by passing wildcards or multiple arguments:

pigz *.sqlpigz file1.txt file2.txt 

Pigz will intelligently detect already-compressed files and concatenate them when outputting to stdout:

pigz -c log1.txt.gz log2.txt.gz > combined_logs.txt.gz

These examples demonstrate the basic file compression usage of pigz. But it has many more handy options covered next.

Pigz Command Options and Usage

Pigz supports a variety of options to control compression levels, outputs, and performance characteristics:

-p threads

Sets the number of compression threads (processors) to use. More threads can multiply speed but require more CPU resources.

-d

Decompress instead of compressing.

-k

Keep original input files instead of deleting after compressing.

-c

Write output to standard output rather than a file. Used to redirect compressed data to other commands.

-n

Use no compression. For piping data through pigz when compression is not desired.

-1 to -9

Compression level, ranging from less compression but faster (1) to best compression but slower (9). Default is 6.

–fast

Equivalent to compression level 1 for the fastest throughput.

–best

Equivalent to compression level 9 for greatest space savings.

Here are some examples using these options:

Compress multiple files recursively, keeping originals:

pigz -rkp /home/user/data

Gzip compress a tarball, outputting to a pipe:

tar cf - docs | pigz -c > docs.tar.gz

Decompress a .gz file back to original:

pigz -d file.txt.gz

Maximize compression ratio for an important file:

pigz --best myfile.gz

These are just a few examples of ways pigz can be used. Now let‘s look at some tips on tuning for optimal performance.

Tuning Pigz for Maximum Compression Performance

To get the full benefit of pigz, here are some tips on optimizing runtime performance:

  • Thread count – Use -p to match available cores. Test different values.
  • Nice priority – Set pigz to low "nice" value so it gets priority scheduling.
  • Parallel streams – Compress multiple files at once.
  • Input size – Larger inputs run faster as they better utilize parallel threads.
  • Block size – Adjust with --blocksize chunk size to tune performance.
  • Files vs pipes – Piped data avoids filesystem load but has context switching overhead.
  • Processor affinity – Consider CPU pinning to minimize thread migration.
  • Bottleneck identification – Profile system resource usage to find saturation points.
  • Script integration – Replace gzip with pigz in existing scripts for easy speedup.
  • tar cooperation – Use tar‘s -I pigz option for integrated parallel archives.

With some benchmarking and tweaking, you can optimize pigz throughput for your specific hardware, workloads, and use cases.

Now let‘s validate those speedups by comparing pigz performance against gzip.

Benchmarking Pigz Against Gzip

To demonstrate the real-world performance gains of pigz, let‘s benchmark it against gzip.

I‘ll use the Linux time command to measure wall-clock compression times for a 4GB database file, averaged over 3 runs each:

CommandAverage TimeSpeedup vs Gzip
gzip file.sql1m05s1x
pigz -p 1 file.sql0m51s1.3x
pigz -p 4 file.sql0m27s2.4x
pigz file.sql0m14s4.5x

Pigz compression times compared to gzip

This shows pigz provides incrementally faster performance as we increase the thread count, with the default utilizing all 8 cores on my machine for a 4.5X speedup.

We can also compare the compression ratio achieved:

CommandCompressed Size
gzip file.sql1.2GB
pigz file.sql1.2GB

Pigz maintains the same excellent compression ratio as gzip while accelerating the process – precisely as intended.

For large database backups, virtual machine images, and other big files, pigz makes compression dramatically faster compared to single-threaded gzip.

Now let‘s go through some examples of integrating pigz into scripts and pipelines.

Integrating Pigz into Scripts and Data Pipelines

A useful property of pigz is the ease of integrating it into existing scripts and pipelines that use gzip.

In most cases, you can simply substitute pigz for gzip without any other changes.

For example, a backup script like this:

#!/bin/bash# Backup script using gzip tar cf - /important_data | gzip -c > incremental_backup.tar.gz

Can be updated to leverage pigz speedups with one-line change:

#!/bin/bash# Backup script using pigz tar cf - /important_data | pigz -c > incremental_backup.tar.gz

We simply replaced gzip with pigz while leaving the rest of the logic intact.

Some other examples of pipelines that can benefit from pigz:

  • Log file rotation – ./logrotate.sh | pigz -p 8 > logs.tar.gz
  • Database dumps – mysqldump db | pigz -c > db.sql.gz
  • Streaming compression – cat /data | pigz --fast > compressed.dat
  • Network transfer – scp huge_file | pigz -d | ssh target_server

With its compatibility and drop-in nature, pigz can speed up many compression-heavy workloads.

Next let‘s go through using it with tar for high performance archives.

Integrating Pigz with Tar for Faster Archives

Pigz integrates seamlessly with tar to provide a big performance boost for archival workflows.

Tar manages the archiving and consolidation of multiple files and directories into a single archive file. Pigz accelerates compressing that archive by leveraging all available CPU cores with its parallel threading model.

For example, to create a compressed tar archive using pigz:

tar cf - docs | pigz -p 8 > docs.tar.gz

Here we pipe the raw tarball output of tar into pigz for ultra-fast compression into the final .tar.gz archive.

Even better, tar has a built-in option to call pigz directly:

tar -I pigz -cf docs.tar.gz docs

The -I flag tells tar to pass the archive to pigz for multithreaded compression.

This built-in integration between tar and pigz makes it trivial to speed up common operations like:

  • Archiving projects into distribution tarball releases
  • Doing incremental backups to compressed storage
  • Storage optimization by compressing old log files
  • Packaging data to move between servers
  • Tarring up big directories for transfer

Replacing gzip with pigz can massively accelerate tar workflows.

For the biggest speed gains, make sure to tune the thread count and block sizes for your specific hardware and data.

Conclusion

Thanks for taking the time to learn all about the pigz command and how it can supercharge compression on your Linux system!

The bottom line:

  • Pigz enables leveraging multiple cores to radically speed up compression
  • It integrates seamlessly as a drop-in replacement for gzip
  • Simple to install and invoke from the command line
  • Provides 5X, 10X or greater throughput increases on big files
  • Works great in parallel pipelines and with tar

If you do any significant data compression or archiving, pigz is a game changer for performance. Spend less time waiting on compressing and more time doing!

Let me know if you have any other questions about mastering pigz. I‘m always happy to chat compression and help a fellow Linux user.

Now go forth and compress at lightning speed!

You maybe like,

Mastering the Pigz Command for Lightning Fast File Compression in Linux – TheLinuxCode (2024)

References

Top Articles
Latest Posts
Recommended Articles
Article information

Author: Wyatt Volkman LLD

Last Updated:

Views: 6155

Rating: 4.6 / 5 (66 voted)

Reviews: 81% of readers found this page helpful

Author information

Name: Wyatt Volkman LLD

Birthday: 1992-02-16

Address: Suite 851 78549 Lubowitz Well, Wardside, TX 98080-8615

Phone: +67618977178100

Job: Manufacturing Director

Hobby: Running, Mountaineering, Inline skating, Writing, Baton twirling, Computer programming, Stone skipping

Introduction: My name is Wyatt Volkman LLD, I am a handsome, rich, comfortable, lively, zealous, graceful, gifted person who loves writing and wants to share my knowledge and understanding with you.