Rzip is absolutely incredible

[Image: mikuru.jpg. Mikuru tried to compress my files too using her superpower energy. Rzip still worked better.]

After reading an old post on Jeremy Zawodny's weblog and installing Rzip myself, I have to say it's my new favourite compression algorithm!

From the developer's website:

rzip is a compression program, similar in functionality to gzip or bzip2, but able to take advantage of long distance redundancies in files, which can sometimes allow rzip to produce much better compression ratios than other programs. The original idea behind rzip is described in my PhD thesis.

For a bit of real world testing, I decided to try compressing the www folder in my home directory on my MacBook Pro. I thought this folder would be a useful test because it's relatively large and contains a few large files mixed in with hundreds of smaller ones. From what I understand of compression algorithms, each tends to favour certain types of files in certain quantities, so I figured a mixed folder like this would show a more balanced result.
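Something like the following should reproduce the test, assuming default compression levels for each tool; the only flag worth noting is the one that stops rzip from eating the tarball the next run needs:

    # Roll the folder into a single archive first; gzip, bzip2 and
    # rzip all compress one stream rather than a folder of files.
    tar -cf www.tar www

    # Compress a copy of the tarball with each tool at its defaults.
    gzip  -c www.tar > www.tar.gz
    bzip2 -c www.tar > www.tar.bz2
    zip www.tar.zip www.tar
    rzip -k www.tar                 # -k keeps www.tar; writes www.tar.rz

    # Line up the results.
    ls -lh www.tar*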

The original folder size was 436.0 MiB with 312 files. The Tape Archive is the control, because every tool except ZIP needs the files rolled into a single archive before they can be compressed.

Algorithm    | Extension   | File size | % of original | % saved
------------ | ----------- | --------- | ------------- | -------
Tape Archive | www.tar     | 423.9 MiB | -             | -
ZIP          | www.tar.zip | 290.9 MiB | 68.62         | 31.38
Bzip2        | www.tar.bz2 | 286.3 MiB | 67.72         | 32.28
GNU zip      | www.tar.gz  | 284.8 MiB | 67.54         | 32.46
Rzip         | www.tar.rz  | 104.7 MiB | 24.70         | 75.30

What's curious is that Gzip was more efficient than Bzip2; in almost every other circumstance I've come across, the reverse has been true. I'm not sure whether whatever caused that also skewed the results for the other formats. The final result is clear though: Rzip was able to squash like nobody else!

[Image: steamroller.jpg © Jan Mehlich, from Wikimedia Commons.] As with the image above, I thought it was mildly amusing given the subject matter. I hate dry weblog posts without pictures, you see.

From what I can make out reading the developer's website, and with help from dadaist in real time on Twitter, Rzip isn't an entirely new compression algorithm per se. It looks for matching chunks of data over much longer distances than other tools can, then uses existing algorithms to compress what's left.
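If you want to see the long-distance part for yourself, here's a toy test, assuming you have bzip2 and rzip handy. Build a file where the second half is an exact repeat of the first, spaced a hundred mebibytes apart, far beyond the 900 kB blocks Bzip2 works in:

    # 100 MiB of random data, then the same data again. The second
    # copy is identical, but starts 100 MiB away from the first.
    dd if=/dev/urandom of=chunk bs=1048576 count=100
    cat chunk chunk > doubled

    # Random data won't compress, and the repeat sits far outside
    # Bzip2's 900 kB blocks, so expect roughly 200 MiB here.
    bzip2 -c doubled > doubled.bz2

    # Rzip's window is big enough to spot the second copy and store
    # it as a reference to the first, so expect roughly 100 MiB.
    rzip -k doubled

    ls -lh doubled.bz2 doubled.rz

That seems to be the whole trick: the long-range matcher removes the redundancy the block-based tools physically can't see, then hands whatever remains to a conventional compressor.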

I theorise from reading up on this that only in the last decade have computers had enough processing power, and more importantly memory, to pull this off. A 900 MiB search window is great for finding redundancies, but it can suck up all your resources pretty fast if you don't have much RAM to spare. Perhaps that's why we haven't seen this level of compression until recently.

In any case, I know what I'll be using to compress all my large files and folders from now on :).

