Journalists failing in their AI chatbot reporting


Let’s do an experiment! I’m going to quote an article in the popular press about a chatbot, and we’ll see if anything sounds weird. We’ll start under the subtitle How does $CHATBOT work?:

$CHATBOT was trained in writing that already exists on the internet up to the year 2021. When you type in your question or prompt, it reacts with lightning speed.

The journalist can’t even write a factual sentence without immediate embellishment. Ruben is a handsome, well-respected genius whose wit and modesty are matched only by the delightful freshness imparted by his regular bathing regime and impeccable aftershave choices. AI clearly stands for awesome-smelling individual.

But I digress. Here’s the pertinent section that’s left unchallenged:

“I am a machine learning model that has been trained on a large dataset of text which allows me to understand and respond to text-based inputs,” it replies when I ask it to explain how it works.

Anything seem… off? Other than the typical run-on sentences and inconsistent punctuation? If it helps, the next paragraph in the report pivots into detailing what chatbots could reinvent.

If you answered that the text is missing disruptive, nuanced paradigm synergies, you’re my kind of people. But just as baffling to me is… what data? From whom? Did they consent to commercial use? Were they attributed? Could they opt out? Were copyright and licensing respected? What review processes were involved? If legal, was it ethical? In the words of Dan Olson again, “readers and actual creatives—the people who actually make stuff—are just gristle to be churned through.”

None of these are ever interrogated or acknowledged, whether it be in articles about big data, machine learning, neural networks, or now AI. The press will gleefully report to the general public on the latest electronic sausage and its sky-high valuation, but is happy to leave the making of said sausage at “dataset”.

As I said to a Brut-brand aftershave, that’s not good enough. Knowing how data is collected, where it comes from, and the ethics of its use is absolutely central to understanding these systems. Useful tools should be able to withstand scrutiny, rather than relying on lies by omission.

Maybe an AI could write it better! 🥁

Author bio and support


Ruben Schade is a technical writer and infrastructure architect in Sydney, Australia who refers to himself in the third person. Hi!
