Reddit, LLMs, APIs, TLAs


If I were a YouTube essayist, I’d say this is a story about power, language, and Digg… ing one’s grave. Wait, damn it.

Beleaguered link aggregator site Reddit recently followed in Twitter’s well-received footsteps by imposing steep costs to access their API, which they attribute to large language models training against their corpus of material without their permission, and for free. Hey, you and I know how that feels! It’s couched less in terms of defending individual writers from plagiarism, and more with words like “value” (don’t forget who owns your material when you post to a commercial site), but I’ll take it.

Given how generative AIs and LLMs are now dickishly disseminating dull derivative dystopian detritus at scale across the web, knock-on effects are hardly surprising. Dickishly. It also raises the point people like me have been making for months: AIs depend on training data made by people who aren’t attributed or compensated. Tech pundits don’t care, though their lack of social responsibility is hardly new. I’ve been running an experiment about this, which I hope to post soon.

Journalists failing in their AI reporting
AI developers: be nice, and credit your artists

Here comes the proverbial posterior prognostication: but… while it’s easy to empathise with Reddit’s publicly-stated position, they didn’t do themselves any favours with how spectacularly badly they handled it. Posterior may be the perfect summation.

First, the way they communicated with developers could best be described as a foot rake. Nanananananana… foot rake. This is what garnered the waves of superficial press attention you’ve likely seen, and why so many subreddits have gone dark in protest.

I also question Reddit’s motives. Not wanting a company to profit from material stolen from you wholesale is one thing, but why apply the same costs to mobile apps that aren’t doing this? A blinkered view might lead them to conclude third-party clients are a cost centre, but these tools kept people contributing to their platform, such as it was. (I always got so many trolls emailing me whenever a post of mine appeared there, so I’m only engaging in a bit of schadenfreude. I get a pass on account of my last name.)

The timing is also sus, given the site’s tanked valuation. I’m not saying they’re obfuscating their desire to bring in more ad revenue on first-party services while they attempt to bail water out of their sinking vessel, but how else are they expecting their actions to be interpreted? As mentioned, journalists and tech pundits don’t care about AI attribution, so this behaviour would clearly get all the attention.

I say I’m going to eat healthy, then I go down the street and get some Filipino fried food for lunch. Reddit has attempted the same thing: whatever point they had was nullified completely by their actions. The difference is, Clara and I don’t have a PR team for when someone notices the takeout box.

The web is an ephemeral place, and Reddit’s fall from grace was entirely predictable. I’d even say it was long overdue. I do hope this time the open web gets a shot at displacing it (though I hope for something better than Lemmy). I miss forums. Can we have those again?

Author bio and support


Ruben Schade is a technical writer and infrastructure architect in Sydney, Australia who refers to himself in the third person. Hi!
