Hardware Unboxed’s podcast, and benchmark misconceptions


I’m enjoying the new Hardware Unboxed podcast. You can subscribe using the link below, or you can watch on YouTube:

YouTube channel
Podcast RSS feed

I thought they raised a good point about benchmarking:

I don’t want to make it sound like it’s rocket science, but benchmarking today is significantly more complex and difficult than it used to be. I don’t mean that “it’s very difficult to run a benchmark” … I mean making sure that the test system is up to date and operating as it should be.

Before you even go to test a GPU, you need to make sure the system itself is behaving as it should be, so it doesn’t throw off your comparative data […] Does a new patch introduce a new kind of bug or problem?

There are so many things you have to look into now, from configuring the system, to making sure it’s giving you the numbers you need, to making sure the [software] is configured the way it needs to be.

There’s so much work you need to do just to make sure and validate you can actually start benchmarking.

This was in the context of graphics cards, though it broadly applies to other hardware, networks, and software as well. Heck, even kitchen appliances, camera gear, you name it.

The core issue with benchmarks is expectations. People without a technical background, or those who know just enough to be dangerous, intuitively expect that one device can be meaningfully compared to another, often with a simple number. Meaningfully being the operative word.

This is a computer, and that’s a computer, so which one is faster and therefore “better”? It’s almost never that easy.

As I mentioned on Mastodon, this is already one positive thing I’m seeing come out of the Linus Tech Tips fallout: I think more enthusiasts now appreciate how difficult good benchmarks are. There’s a science to approaching tests, from:

  • Selecting appropriate parameters
  • Sourcing the appropriate equipment and software
  • Designing thorough and reproducible testing methods
  • Controlling for externalities (as best you can)
  • Understanding what’s meaningful or an outlier
  • Figuring out clean states to perform the next tests

And most importantly of all, asking if the test itself is useful to the target audience, and presenting it in a way that accurately describes what the test demonstrated.
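
To put a trivial face on a few of those points, here’s a rough sketch of a repeatable test harness in Python. To be clear, this isn’t anyone’s actual methodology; the command, run counts, and thresholds are placeholders I made up. But it shows the idea of warm-up runs, repeated trials, and checking whether run-to-run variance is telling you the test system itself is misbehaving before you trust a single number.

```python
#!/usr/bin/env python3
# Rough sketch of a repeatable benchmark harness. The command, run counts,
# and thresholds are placeholders; real testing would also need to control
# the environment itself (power plans, background tasks, thermals, drivers).
import statistics
import subprocess
import time

COMMAND = ["./my-benchmark", "--preset", "high"]  # hypothetical workload
WARMUP_RUNS = 2    # discarded; caches, shader compiles, etc. skew these
MEASURED_RUNS = 5  # enough to see run-to-run variance, not just one number


def run_once() -> float:
    """Time a single run of the workload, in seconds."""
    start = time.perf_counter()
    subprocess.run(COMMAND, check=True, capture_output=True)
    return time.perf_counter() - start


def main() -> None:
    for _ in range(WARMUP_RUNS):
        run_once()

    samples = [run_once() for _ in range(MEASURED_RUNS)]
    median = statistics.median(samples)
    spread = max(samples) - min(samples)

    print(f"runs:   {[f'{s:.2f}s' for s in samples]}")
    print(f"median: {median:.2f}s")
    print(f"spread: {spread:.2f}s")

    # If the spread is a large fraction of the median, something on the test
    # system is probably interfering, and the results aren't comparable.
    if spread > 0.1 * median:
        print("warning: high run-to-run variance; check the system state")


if __name__ == "__main__":
    main()
```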

A large part of my $DayJob involves comparing prices and performance between various providers for clients, and between different infrastructure types. One of the first things you realise is how many people place blind trust in a single number, as though it’s complete, accurate, relevant, timely, and actionable. Aka, lies and statistics. Half the battle isn’t even doing the test, it’s explaining why one set of benchmarks is erroneous, or might not be doing what they think, or doesn’t account for something, or has led them to an error in judgement.

Unfortunately, at least in my experience, too many people interpret such cautions as obfuscation, as though you’re trying to fudge or hide the truth by buttressing a result with a wall of caveats. The more truthful you get, the more asterisks you add, which marketing has taught people to distrust.

The best a benchmarker can do is be transparent about their process, and be open to correcting mistakes; whether it be a graphics card or coffee maker. Anything less is where real dishonesty lies.

Get it… dishonesty lies? Shaddup.
