I had a fascination with file compression in the 2000s, but OpenZFS’s inline compression has spoiled me ever since. When I need to leave its comforting confines though, I turn to the excellent lzop and plzip tools, which have largely replaced gzip and bzip2 for my one-off archives.
Markus F.X.J. Oberhumer’s lzop is my go-to when I need fast (de)compression. It implements the LZO algorithm, which is similar to, but distinct from, my beloved lz4 on OpenZFS. It has no business giving such great results in such a short time.
To illustrate, here’s a disk image I exported from a retired Xen hypervisor onto my FreeBSD tower. I like this because it has a nice mix of textual and binary data:
$ ls -l disk.qcow2
==> 17381195776
This is about 16 GiB. Let’s use lzop with its default -3 compression level:
$ time lzop -v disk.qcow2
==> compressing disk.qcow2 into disk.qcow2.lzo
==> 0m41.73s real 0m29.72s user 0m09.77s system
$ ls -l disk.qcow2.lzo
==> 12246333887
For our purposes here, we’ll go by the rough “wall clock” time using real, which is about 40 seconds. That’s wickedly fast to get the file down to about 11 GiB! This is also what makes it perfect for piping a dd block copy over ssh, because I know I’ll saturate my network connection long before the CPU on either end.
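As a rough sketch of that dd-over-ssh pattern, something like the following would do. The send_image function name and its three arguments are placeholders of mine for illustration, and it assumes lzop is installed on both ends and that the remote account can write to the destination path.

```shell
#!/bin/sh
# send_image DEVICE HOST DEST
# Stream a block device (or image file) to a file on another machine,
# compressing with lzop only while the data is in transit.
# DEVICE, HOST and DEST are hypothetical values supplied by the caller.
# DEST is wrapped in single quotes for the remote shell, so it must not
# itself contain a single quote.
send_image() {
    device=$1
    host=$2
    dest=$3

    # dd reads the raw blocks in 1 MiB chunks, lzop -c compresses stdin
    # to stdout, and the remote lzop -dc decompresses before the remote
    # dd writes the image out.
    dd if="$device" bs=1048576 |
        lzop -c |
        ssh "$host" "lzop -dc | dd of='$dest' bs=1048576"
}
```

Calling it would look like send_image /dev/ada1 backup.example.com /tank/images/ada1.img, where every one of those values is made up. If the copy should stay compressed at rest, the remote command could instead be a plain cat into a .lzo file.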
Decompression is similarly impressive:
$ time lzop -x disk.qcow2.lzo
==> 0m35.54s real 0m25.89s user 0m08.11s system
On the other side we have plzip, a multi-threaded implementation of lzip by Antonio Diaz Diaz. I use this when compression ratio is paramount, such as for long-term archiving or when trying to fit on a specific-sized disk.
Here it is working on the same image, using the default -6 compression level:
$ time plzip disk.qcow2
==> 17m12.81s real 170m04.58s user 1m17.09s system
$ ls -l disk.qcow2.lz
==> 8470705508
It got the disk down to 7.9 GiB, about 3.5 GiB smaller than lzop managed! But it took 17 minutes of real time, and more than 170 cumulative user minutes across my CPU cores.
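For anyone wanting to check my rounding, here is a minimal sketch of the size arithmetic in POSIX shell with awk. The byte counts are copied from the ls output above; the gib and pct helper names are my own inventions for illustration, not part of either tool.

```shell
#!/bin/sh
# Byte counts copied from the ls -l output in this post.
orig=17381195776   # disk.qcow2
lzo=12246333887    # disk.qcow2.lzo (lzop -3)
lz=8470705508      # disk.qcow2.lz  (plzip -6)

# gib A [B]: print A minus B (B defaults to 0) in GiB, to two decimals.
# One GiB is 2^30 = 1073741824 bytes.
gib() {
    awk -v a="$1" -v b="${2:-0}" 'BEGIN { printf "%.2f\n", (a - b) / 1073741824 }'
}

# pct PART WHOLE: print PART as a percentage of WHOLE, to one decimal.
pct() {
    awk -v p="$1" -v w="$2" 'BEGIN { printf "%.1f\n", 100 * p / w }'
}

echo "original: $(gib "$orig") GiB"
echo "lzop -3:  $(gib "$lzo") GiB ($(pct "$lzo" "$orig")% of original)"
echo "plzip -6: $(gib "$lz") GiB ($(pct "$lz" "$orig")% of original)"
echo "plzip saves a further $(gib "$lzo" "$lz") GiB over lzop"
```

Run as-is, it reports 16.19, 11.41, and 7.89 GiB, which puts the lzop archive at 70.5% of the original size and the plzip one at 48.7%.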
Decompression is a different story, with the original file being returned in about 3 minutes. This makes it a good candidate for distributing compressed files:
$ time plzip -d disk.qcow2.lz
==> 3m02.95s real 22m47.74s user 1m13.96s system
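To put the timings on a common footing, the sketch below turns them into throughput (uncompressed MiB per second of real time) and a rough "cores kept busy" figure (user time divided by real time). It assumes the NNmSS.SSs format that time printed above, and the secs, mib_per_sec, and cores_busy helper names are hypothetical ones of mine.

```shell
#!/bin/sh
size=17381195776   # uncompressed image size in bytes, from ls -l

# secs TIME: convert a value such as 17m12.81s into seconds.
secs() {
    awk -v t="$1" 'BEGIN {
        sub(/s$/, "", t)          # drop the trailing "s"
        n = split(t, part, "m")   # part[1] = minutes, part[2] = seconds
        if (n == 2) printf "%.2f\n", part[1] * 60 + part[2]
        else        printf "%.2f\n", t
    }'
}

# mib_per_sec BYTES REAL: uncompressed MiB handled per second of real time.
mib_per_sec() {
    awk -v b="$1" -v s="$(secs "$2")" 'BEGIN { printf "%.1f\n", b / 1048576 / s }'
}

# cores_busy USER REAL: user CPU time divided by wall clock time.
cores_busy() {
    awk -v u="$(secs "$1")" -v r="$(secs "$2")" 'BEGIN { printf "%.1f\n", u / r }'
}

echo "lzop  compress:   $(mib_per_sec "$size" 0m41.73s) MiB/s, $(cores_busy 0m29.72s 0m41.73s) cores"
echo "lzop  decompress: $(mib_per_sec "$size" 0m35.54s) MiB/s, $(cores_busy 0m25.89s 0m35.54s) cores"
echo "plzip compress:   $(mib_per_sec "$size" 17m12.81s) MiB/s, $(cores_busy 170m04.58s 17m12.81s) cores"
echo "plzip decompress: $(mib_per_sec "$size" 3m02.95s) MiB/s, $(cores_busy 22m47.74s 3m02.95s) cores"
```

By this measure lzop compresses at roughly 397 MiB/s on well under one core, while plzip manages about 16 MiB/s with close to ten cores busy. It is as unscientific as the rest of this test, since it ignores system time and disk speed.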
I love running silly, entirely unscientific tests like this, but they also serve to illustrate a point. People starting in this industry will often choose the “best” solution based on one specific metric, but it usually comes down to what you’re prioritising for a given task. Your choice of compression can have a huge impact on a given solution, and optimising for one metric over another may work well, or could bite you in the posterior.
Strategically deploying the right tool for the job is as much an art as a science, and is one of the things I enjoy the most about my job.