Comparing lzop and plzip compression

Software

I had a fascination with file compression in the 2000s, but OpenZFS’s inline compression has spoiled me ever since. When I need to leave its comforting confines though, I turn to the excellent lzop and plzip tools, which have largely replaced gzip and bzip2 for my one-off archives.

Markus F.X.J. Oberhumer’s lzop is my go-to when I need fast (de)compression. It implements the LZO algorithm, similar to but distinct from my beloved lz4 on OpenZFS. It has no business giving such great results in such a short time.

To illustrate, here’s a disk image I exported from a retired Xen hypervisor onto my FreeBSD tower. I like this because it has a nice mix of textual and binary data:

$ ls -l disk.qcow2
==> 17381195776

This is about 16 GiB. Let’s use lzop with its default -3 compression level:

$ time lzop -v disk.qcow2
==> compressing disk.qcow2 into disk.qcow2.lzo
==>    0m41.73s real     0m29.72s user     0m09.77s system
$ ls -l disk.qcow2.lzo
==> 12246333887

For our purposes here, we’ll go by the rough “wall clock” time using real, which is about 40 seconds. That’s wickedly fast to get the file down to about 11 GiB! This is also what makes it perfect for piping a dd block copy over ssh, because I know I’ll saturate my network connection long before the CPU on either end.
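
As a rough sketch of that kind of pipeline, assuming lzop is installed on both ends: the function name copy_image, the source device, the host, and the destination path are all placeholders made up for illustration, not details from this post.

```shell
#!/bin/sh
# copy_image SRC HOST DEST
# Stream SRC to DEST on HOST over ssh, compressing with lzop in transit.
# All three arguments are hypothetical placeholders; adjust to your setup.
copy_image() {
    # lzop -c compresses stdin to stdout; lzop -dc decompresses on the far end.
    dd if="$1" bs=1048576 \
        | lzop -c \
        | ssh "$2" "lzop -dc | dd of='$3' bs=1048576"
}

# Example (not run here): copy_image /dev/da0 backup-host /tank/da0.img
```

The block size is spelled out in bytes because dd implementations disagree on size suffixes.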

Decompression is similarly impressive:

$ time lzop -x disk.qcow2.lzo
==> 0m35.54s real     0m25.89s user     0m08.11s system

On the other side we have plzip, a multi-threaded implementation of lzip by Antonio Diaz Diaz. I use this when compression ratio is paramount, such as for long-term archiving or when trying to fit on a specific-sized disk.

Here it is working on the same image, using the default -6 compression level:

$ time plzip disk.qcow2
==> 17m12.81s real   170m04.58s user     1m17.09s system
$ ls -l disk.qcow2.lz
==> 8470705508

It got the disk down to 7.9 GiB, about 3.5 GiB smaller than lzop managed! But it took 17 minutes of real time, and more than 170 cumulative user minutes across my CPU cores.
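
For anyone wanting to check the arithmetic, here’s a small sketch that converts the byte counts from the ls output above into GiB and percentages. The helper names to_gib and pct_of are made up for this example, and it assumes a POSIX shell with awk.

```shell
#!/bin/sh
# Byte counts copied from the ls output in this post.
orig=17381195776
lzo=12246333887
lz=8470705508

# to_gib BYTES: print BYTES in GiB (2^30 bytes) to two decimal places.
to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

# pct_of COMPRESSED ORIGINAL: compressed size as a percentage of the original.
pct_of() {
    awk -v c="$1" -v o="$2" 'BEGIN { printf "%.1f\n", 100 * c / o }'
}

echo "original: $(to_gib "$orig") GiB"
echo "lzop -3:  $(to_gib "$lzo") GiB, $(pct_of "$lzo" "$orig")% of original"
echo "plzip -6: $(to_gib "$lz") GiB, $(pct_of "$lz" "$orig")% of original"
echo "plzip saved $(to_gib $((lzo - lz))) GiB more than lzop"
```

On these numbers, lzop leaves the image at roughly 70% of its original size, and plzip at roughly 49%.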

Decompression is a different story, with the original file being returned in about 3 minutes. This makes it a good candidate for distributing compressed files:

$ time plzip -d disk.qcow2.lz
==> 3m02.95s real    22m47.74s user     1m13.96s system
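
Dividing user time by real time gives a crude estimate of how many cores each run kept busy. This sketch does that for the time output above; the helper names to_seconds and cores_busy are invented here, and the figure ignores system time and I/O waits, so treat it as a ballpark only.

```shell
#!/bin/sh
# to_seconds 17m12.81s: convert the XmY.YYs format printed by time into seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN {
        split(t, p, "m")        # p[1] = minutes, p[2] = seconds plus trailing "s"
        sub(/s$/, "", p[2])
        printf "%.2f\n", p[1] * 60 + p[2]
    }'
}

# cores_busy USER REAL: average cores kept busy, as user time over real time.
cores_busy() {
    awk -v u="$(to_seconds "$1")" -v r="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", u / r }'
}

echo "lzop compress:    $(cores_busy 0m29.72s 0m41.73s)"
echo "plzip compress:   $(cores_busy 170m04.58s 17m12.81s)"
echo "plzip decompress: $(cores_busy 22m47.74s 3m02.95s)"
```

For the runs in this post, that suggests plzip kept around ten cores busy while compressing, whereas lzop stayed under one.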

I love running silly, entirely unscientific tests like this, but they also serve to illustrate a point. People starting in this industry will often choose the “best” solution based on one specific metric, but oftentimes it comes down to what priority you have for a given task. Your choice of compression can have a huge impact on a given solution, and optimising for one metric over another may work well, or could bite you in the posterior.

Strategically deploying the right tool for the job is as much an art as a science, and is one of the things I enjoy the most about my job.


Villa del Parque station in Buenos Aires

Thoughts

It may surprise a few of you to know I have a thing for train stations. This one on the San Martín Line is lovely.

Photo showing the single-floor station from the street.

The Tudor styling reminds me of the old Harajuku station in Tokyo. I’m relieved Clara and I were able to explore it before they had to knock it down.

Thanks to JonySniuk for posting this on Wikimedia Commons.


You should check out Marian Bouček’s blog

Thoughts

Marian Bouček has started a blog, and you should read it and subscribe!

I just finished this post about moving from Apple to FreeBSD, which I’m sure you wouldn’t be surprised spoke to me, especially of late.

I remember years ago one of those self-anointed writing experts saying blogs should be limited to one topic, but my favourites are ones that aren’t. It’s a joy being drawn into a site based on a topic I know about, then discovering a whole other world I didn’t know. For example, Marian rides horses, and has a beautiful post about it.


Starting a blog, and unoriginal ideas

Thoughts

Alexey Guzey laid out a powerful case:

Summary: in this post I explain why you should start a blog (to help others and to help yourself), what to write about, and how to start it. I hope to persuade you that you should start a blog even if you feel that you have nothing to say and even if almost nobody will read it.

The post is chock full of great advice, links, and writing prompts.

But that wasn’t the most useful part for me. In 2020 I talked about how uncomfortable I was sharing a few opinions with unpleasant people:

I’ve grappled with this idea that I can agree with people with whom I otherwise don’t, and even those I consider reprehensible. How does one reconcile this?

Jim Kloss of Whole Wheat Radio replied saying there isn’t an original thought in the universe, and that we’re the sum of what we’ve absorbed and lived. Rebecca Hales noted that even stopped clocks are right twice a day.

Alexey takes this one step further to say that your ideas are more original than you think. He quotes Gleb Posobin:

Your own ideas mostly seem trivial to you because you have the right concept structures in place to support them. You wouldn’t come up with these ideas otherwise. So it’s easier to notice your own ideas in a dialogue: your friend has different concept structures and notices them.

But even if you don’t buy that, or are firmly trapped in imposter syndrome, Alexey asserts that unoriginal ideas are themselves useful:

Does this post contain a single original idea? I don’t think it does. Is it useful? Well, yeah, it is. But this brings me to a more general idea…

Unoriginal writing is useful because it helps in the process of discovery and in the process of supporting underappreciated ideas.

He likens it to a university professor giving a lecture. We don’t assume lecturers have original ideas or content, yet we consider them valuable and necessary. I hadn’t considered that.

Also, you should definitely start a blog. Then let me know, so I can add you to the list! I still recommend WordPress.com because it’s relatively easy to use, and they make it easy to export if and when you need to.


The “fundamental problem” with NFTs

Internet

Reuters on the 11th of February:

The platform which sold an NFT of Jack Dorsey’s first tweet for $2.9 million has halted most transactions because people were selling tokens of content that did not belong to them, its founder said, calling this a “fundamental problem” in the fast-growing digital assets market.

That’s the understatement of the century! But it gets better:

The biggest NFT marketplace [redacted], valued at $13.3 billion after its latest round of venture funding, said last month more than 80% of the NFTs minted for free on its platform were “plagiarized works, fake collections and spam”.

Who’d have thunk it!? Next they’ll be telling me my hats look silly. Wait, shaddup.


UI design is as much about expectations

Internet

A few of you might remember years ago a blog design I wrote to mimic an email inbox. I had a column of category “folders” on the left, a column of recent posts in the middle, and the post itself on the right.

The feedback I got from people was mostly negative, ranging from it being confusing or difficult to use, to the implementation not working that well across multiple viewports. The latter could be solved with a better understanding of responsive design, but it’s the former I’m interested in.

Recently I came across a site implementing a similar UI. I was bamboozled, just like my readers were back then. I assumed I’d stumbled on someone’s private email server; it wasn’t until I clicked around that I got my bearings, and translated that mental model into that of a personal site with posts and bookmarks.

Not to critique that site specifically, but in general such mismatches are the perfect demonstration of two related axioms:

  • the principle of least surprise, and
  • context is king

Save for games and certain art projects, software and websites shouldn’t surprise people. If you want your words read, goods bought, or tasks completed, you want to reduce cognitive overhead. Danny Halarewich’s 2016 article is still the best summary I’ve found on this topic:

If the user experience design does what it’s supposed to do, the user won’t notice any of the work that went into it. The less users have to think about the interface or design, the more they can focus on accomplishing their goal on your website.

The biggest culprit in cognitive overload is confusing UIs. The user should never have to spend long figuring out how to complete the action they want, nor waste brain power deciphering an icon.

Likewise, context explains why we find UIs applied to alternate settings confusing, even if we understand their original implementation. We may grok car steering wheels and bicycles, but merging the two creates a new, foreign interface that nullifies people’s experience and training.

Personal websites like ours get more of a pass, because they’re as much a living demonstration of technical and design ability. But by the same token, they’re an expression of one’s values. I try and keep things simple and predictable because I value that when I see it on other blogs, and it’s jarring when I encounter ones that aren’t.

In the legendary words of Rob Pike, clear is better than clever. I wish that were emphasised more in modern webdev.


Fire preparedness, and the SCDF emergency handbook

Thoughts

For all the damage and risk to life, at least something good has come from the Telok Blangah Rise fire in Singapore earlier this month. I’m seeing more discussion of fire safety again, which is a topic that’s often difficult to get people engaged with.

Channel NewsAsia’s Davina Tham wrote one of the better articles I’ve seen discussing what you can do to prepare your apartment for fire, including not leaving cooking unattended, not keeping flammable liquids near heat sources, and only using well-tested electrical appliances. We should all have fire alarms, dry chemical fire extinguishers, clear exit pathways, and do our own regular fire drills.

From my industrial fire training, the only thing I’d add to this advice is to only use fire extinguishers to clear pathways, or to help rescue others if you’re confident of your own escape. They’re not magic bullets, and burning possessions aren’t worth risking your life if the fire is engulfing a room.

The Singapore Civil Defence Force’s Emergency Handbook has a ton of useful stuff, including preventing and handling fires. I’ve literally had versions of this PDF on every smartphone and tablet I’ve ever owned, even when I’ve lived elsewhere.

Just as a bushfire can reach a detached home, apartment fires can easily spread from elsewhere, so it’s important not to be complacent even if you’ve personally done the right thing.


Using someone’s childhood to conclude anything

Thoughts

I was reading a thread on social media (please write blog posts instead) that attempted to deconstruct a politician’s stance on a certain issue.

The writer reached back to the politician’s childhood, as every historical documentary and analysis does, and discovered the person’s torturous childhood relationship with their father, and the emotional distance they felt from their mother. Their conclusion, as with all hindsight, was that it was a contributing factor to the politician’s xenophobia and Brexit worldview.

This superficial, groundless pop psychology is rife, precisely because you can infer anything you want and have it fit your narrative.

Let’s look at some examples. I’ve seen people’s comfortable, supportive upbringing used to explain why they turned out:

  • compassionate, kind, and honest, because their parents were to them

  • entitled, because they were used to getting what they want

  • can’t handle real-world pressure, because they were shielded

On the flipside, someone’s rough upbringing is used to explain why they’re:

  • xenophobic, vindictive, or cold, because their parents were to them

  • compassionate, kind, and honest, because they have lived experience with how a nasty environment feels

  • emotionally detached, or emotionally aware

Take anyone’s circumstances, and you can explain it with their past. Two people had the same upbringing and did completely different things? Not a problem!

I can’t unsee this now each time I read a news article, or read a hot take on social media. Unless the person is a psychologist with an intimate knowledge of the person’s circumstances (and if so, wouldn’t that be private?), I wouldn’t take what they said as anything meaningful. Hey, like this blog!


A 12 TB SATA Ultrastar hard drive saves the day

Hardware

This is less of a review of an excellent hard drive, and more a meandering story about how we got here. Last September I talked about a defective 12 TB Ironwolf drive that I was shipped after buying it online. Half a year later, and we finally have a happy ending!

The drive of doom

To recap, Clara and I needed more drive space in our homelab server for backups. Western Digital had earned my trust and respect after years of flawless service, but their SMR shenanigans made me look past backup provider reports, and try Seagate again for once.

The end result was worse than the original post let on. After the drive was shipped back to the retailer, who then shipped it to Seagate, we got a report back confirming the drive was faulty. The retailer sent us a new drive, and it arrived… almost two months later.

I attached this replacement drive to our homelab server, and it exhibited almost the exact same symptoms: random clicking sounds, a regular loud beep, and the BIOS refusing to detect it. This time the Supermicro board even made beeping sounds which indicated an error on one of the SATA ports. Swapping out the SATA cables, and plugging it into a different machine made no difference.

I shipped it back again, but by this point it was Christmas and Covid cases were surging again in Australia. I harbour no ill will to any person in this chain during such a crunch, but we still went for weeks with nary a peep from either the retailer or Seagate. When I finally opened a new ticket a month later, they offered me the choice of a replacement unit, but I went with a refund.

By this point almost six months had passed, and the cost of drives had come down to the point where I could get a 12 TB Ultrastar for an equivalent price to the original Ironwolf! This one came from Mwave and arrived within a few days, and it’s been resilvering and scrubbing in our OpenZFS array since. I was super impressed with how well the Mwave team packaged it for shipping, which I’m sure contributed to this success.

  pool: zwork
 state: ONLINE
  scan: resilvered 6.53T in 09:15:07 with 0 errors on Sun Feb 13 03:32:25 2022
config:
  NAME                             STATE     READ WRITE CKSUM
  zwork                            ONLINE       0     0     0
    mirror-0                       ONLINE       0     0     0
      gpt/12TB-WDRedPro-XXXXXXXX   ONLINE       0     0     0
      gpt/12TB-Ultrastar-XXXXXXXX  ONLINE       0     0     0

Quick review, and lessons

As I said at the start, the 12 TB SATA Ultrastar is a fantastic drive. Clara thought it sounded like “bubbling water pipes” when seeking, instead of the high-pitched clicking and whirring sounds you typically expect from a drive. It is a bit loud in our tiny apartment, but it’s not obnoxious. And it works!

I learned some lessons here. People assume that you can rebuild RAID arrays or ZFS pools by simply swapping out defective drives, but that might not always be tenable in a timely fashion in a residential setting. I built a ghastly workaround to maintain redundancy (involving striping some old drives and using USB) once I realised a replacement wasn’t coming any time soon. If you’re on a budget, or dealing with expensive 12+ TB drives, it might be worth working on such contingency plans.

I strongly suspect based on the identical manufacturing dates and symptoms that the Ironwolfs (Ironwolves?) I was delivered were from the same batch, so it’d be unfair to characterise the entire line as being more likely to die. For all I know, the pallet they were shipped on might have been bumped, which is no fault of the manufacturer or retailer. If that’s the case though, I’m sure that means plenty of others were bitten from this batch, so I hope they’re all being recalled or written off, and not relying on every single person reporting it.

The experience did validate my long-running practice of mixing drive manufacturers and SKUs so that a fault in one batch doesn’t affect the other drive in a mirror or RAID. Some people shun this practice, claiming you’ll get differing performance characteristics even if you try and match caches and rotation speeds, but reliability is more important to me.

I’ve got a post pending where I quantify my experience with various drives over the last decade. But like all anecdotes limited to a dozen or so data points, it’d be disingenuous and illogical to assume there’s any meaningful information to glean. The fact my experience broadly tracks with those famous Backblaze reports is probably coincidence.

Having said all that, this and other experiences have me leaning WD if the choice came up, and especially those by the former HGST if I can. From my own albeit limited experience with a dozen or so drives, they’re just better.


Flight or invisibility, with Clara!

Travel

Clara’s company runs a weekly games afternoon for team building, which looks like fun. It was her turn to come up with an activity, so she gave each person a hypothetical to explore. This was one of them:

Would you prefer the ability of flight, or invisibility?

I asked Clara, who’s sitting opposite me right now, to write a response.

While we wait, I’d easily prefer to fly. That surprises me, because as an introvert I’d love to fade into the background most of the time and be left alone. But being able to see the world from that perspective, and travel with ease would be a boon.

Clara’s done now:

I would definitely prefer to turn invisible, to avoid embarrassing situations and having to look in the mirror! Besides, if you could fly imagine ending up in restricted airspace, and the pigeon strikes!

Nooooo! smacks into pigeon. 🐦