The rise of online readability scrapers


There is a new breed of services coming out that purport to make the modern web less frustrating to use in specific circumstances. But they’ve incurred the wrath of creators in doing so, and they don’t address the structural reasons we’re at this point.

A recipe site scraper was the most recent and widely publicised example. Its developers claimed the tool removed superfluous paragraphs of text surrounding the actual cooking instructions, based on the perception that recipe sites are mostly filler. In the social media space, “unroll” services present long Twitter threads on a single page, making them as easy to read as a blog.

Both of these types of services address a real need people online have, for better or worse. I love reading about the history of a family recipe, but there are far more people who think the padding is only there to serve more ads. Likewise, as long as people insist on using Twitter’s threads feature instead of linking to a blog post, unroll services render them more accessible.

The idea of tools stripping out complexity and redundant content isn’t new. Marco Arment’s Instapaper didn’t just save pages for later use; it removed everything except the content of a blog post or news article. Services like Mozilla’s Pocket do the same, and even Apple’s Safari has a Reader mode. Opera’s mobile web browsers used to proxy content on your behalf and optimise it for tiny data plans.

I think these specific tools rub people the wrong way because the resulting pages are publicly accessible, not just for personal use. This means there’s an unsanctioned, unauthorised version of a creator’s work elsewhere, bereft of their monetisation or social capital. This feels like wholesale theft.

I’m not qualified to discuss the legalities of scrapers like this. Ethically though, I think it’s clearer cut. It’s not cool to take people’s content like this. If you don’t like scrolling past the history of Aunt Jessie’s apple pies to get to a recipe, there are other sites out there you can use. Or you can buy a cookbook! As much as I hate Twitter threads, and think unroll services help to raise awareness of just how bad they are, you’re still publishing someone else’s words in full without their permission.

And really, once again, we come back to the core thing I’ve been talking about here for years. The web feels like it’s trending backwards in usability and privacy because writers feel this is the only way they can monetise their content. Until this is addressed, writing and site designs will continue to be optimised for ads, not readers. I wish half as much effort was being spent on figuring out this problem as on treating the symptoms!

Author bio and support


Ruben Schade is a technical writer and infrastructure architect in Sydney, Australia who refers to himself in the third person in bios. Hi!
