Rubenerd

By Ruben Schade in s/Singapore/Sydney/. 🌻


Feedback on duplicate RSS dates

Saturday 26 June 2021 Internet

Last Monday I wrote a post exploring the RSS Validator’s claim that RSS 2.0’s pubDate and Dublin Core’s dc:date elements were considered duplicates. I asserted that they overlapped, but had different semantic meaning and precision.

The feedback from both gentlemen concerned this section:

Which leads us to the justification for removing dc:date, so as not to “confuse news aggregators”. As someone who maintains and builds aggregators, I don’t buy this. I wouldn’t think anything introduced with namespaces would take precedence over a mandatory element like pubDate.

Geoff (not the railfan Geoff from the last post!) chimed in:

I agree with your comments. Compatibility concerns between [feed formats] were always overblown (and lead to that delicious irony of Atom). It’s very easy to write a decision tree based on the presented version of RSS or Atom to weight which date to respect (pubDate in RSS 0.9x and 2.0, RDF like your Dublin Core example in RSS 1.0).
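A minimal sketch of the sort of decision tree Geoff describes, written in Python with the standard library’s ElementTree. The function names `detect_format` and `item_date` are made up for illustration, and the Atom branch (preferring `updated` over `published`) is an assumption of this sketch rather than anything Geoff specified.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
ATOM = "{http://www.w3.org/2005/Atom}"


def detect_format(root):
    """Guess the feed flavour from the root element (hypothetical helper)."""
    if root.tag == RDF + "RDF":
        return "rss1"
    if root.tag == ATOM + "feed":
        return "atom"
    if root.tag == "rss":
        return "rss2"  # feeds with an <rss> root, i.e. RSS 0.9x and 2.0
    return "unknown"


def item_date(item, fmt):
    """Return the raw date string to respect for one item, or None."""
    if fmt == "rss2":
        order = ["pubDate", DC + "date"]  # pubDate wins, dc:date is a fallback
    elif fmt == "rss1":
        order = [DC + "date"]
    elif fmt == "atom":
        order = [ATOM + "updated", ATOM + "published"]  # assumed weighting
    else:
        order = []
    for tag in order:
        text = item.findtext(tag)
        if text and text.strip():
            return text.strip()
    return None


SAMPLE = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel><item>
<title>Hello</title>
<pubDate>Mon, 21 Jun 2021 09:00:00 +1000</pubDate>
<dc:date>2021-06-21T09:00:00+10:00</dc:date>
</item></channel></rss>"""

root = ET.fromstring(SAMPLE)
item = root.find("./channel/item")
print(item_date(item, detect_format(root)))
```

With both elements present in an RSS 2.0 item, the sketch picks pubDate and only looks at dc:date if pubDate is missing or empty.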

The “delicious irony of Atom” referred to an earlier conversation we had about how a new, incompatible format was introduced to address incompatibilities. I actually appreciate what Atom set out to do, but there’s a well-worn xkcd that shows the practical reality.

Hales of Haelstrom (feed here) cautioned that while “I could probably ignore the warning about duplicate date elements”, they can introduce an issue with determining and handling updates:

All of these choices [about updating an article when a feed changes] hinge on being able to identify that the new edited article as being “the same” or not as the old article. You need a unique ID attached to the article for this to work, otherwise it’s a guessing game.

I’ve omitted how I find the “date” in the first place, but it’s a similar list of “Look for thing 1, if that fails looks for thing 2, if that fails …”. In my case I fall back to using the “title” of the article before I use the “date”, but that’s not necessarily the best thing to do if you want to avoid duplicate articles (and so other feed readers/parsers probably won’t do this).
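The “look for thing 1, if that fails look for thing 2” approach to identifying an article could be sketched like this in Python. It’s a guess at the shape of such a chain, not Hales’s actual code: the only ordering taken from the quote is that the title is tried before the date. The `item_key` name and the choice of `guid` and Atom’s `id` as the first candidates are assumptions.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Candidate elements, tried in order. Real identifiers first (assumed),
# then title before date, as in the quote above.
FALLBACKS = [
    "guid",
    ATOM + "id",
    "title",
    ATOM + "title",
    "pubDate",
    DC + "date",
]


def item_key(item):
    """Return (element name, value) for the first usable candidate, or None.

    Keeping the element name in the key means a title can't be
    mistaken for a guid that happens to have the same text.
    """
    for tag in FALLBACKS:
        text = item.findtext(tag)
        if text and text.strip():
            return (tag, text.strip())
    return None
```

An item with no `guid` ends up keyed on its title, which is exactly the case where an edited title would look like a brand new article.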

This ties in with my Perl post last month about moving to Perl’s XML::LibXML. I commented that I prefer using a general XML parser over specific packages for RSS, Atom, or OPML, but didn’t explain why. Aside from only needing to learn how one package works, it lets me handle edge cases like the ones Hales describes. It turns out that even packages written in the same language handle updates, identifiers, dates, and other nomenclature subtly differently. I prefer pulling in data from whichever format is presented, and handling the dates and other data in the same data structure. I like to think Perl’s hashes and syntactic sugar are especially suited to this, but that’s for another post.
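As a rough illustration of the “same data structure” idea, here is a sketch in Python using a general XML parser (ElementTree) instead of a feed-specific package. It isn’t the Perl and XML::LibXML code the post refers to; the `entries` function, the `FIELDS` table, and the choice of keys are all hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
DC = "{http://purl.org/dc/elements/1.1/}"
RSS1 = "{http://purl.org/rss/1.0/}"

# For each common key, the element names to try, in order.
FIELDS = {
    "title": ["title", RSS1 + "title", ATOM + "title"],
    "id": ["guid", ATOM + "id"],
    "date": ["pubDate", DC + "date", ATOM + "updated"],
}

# Elements that represent one article in RSS 0.9x/2.0, RSS 1.0, and Atom.
ENTRY_TAGS = {"item", RSS1 + "item", ATOM + "entry"}


def entries(xml_text):
    """Parse any supported feed into a list of plain dicts with the same keys."""
    root = ET.fromstring(xml_text)
    result = []
    for node in root.iter():
        if node.tag not in ENTRY_TAGS:
            continue
        entry = {}
        for key, tags in FIELDS.items():
            values = (node.findtext(tag) for tag in tags)
            # First non-empty value wins; None if the feed has none of them.
            entry[key] = next((v.strip() for v in values if v and v.strip()), None)
        result.append(entry)
    return result
```

Whatever the source format, the rest of the program only ever sees dicts with `title`, `id`, and `date` keys, so any edge-case handling lives in one place.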

Or as Hales summarised, “welcome to RSS!” It’s funny how I find these sorts of things challenging but ultimately fun and rewarding, as opposed to getting the blessing of a specific API written by a large social network that wants to own the Internet and control our social graphs. Walled gardens are pretty, as long as you follow their [ever changing] rules and terms. The open web is messy.


Author bio and support


Ruben Schade is a technical writer and IaaS engineer in Sydney, Australia who refers to himself in the third person in bios. Wait, not BIOS… my brain should be EFI by now.

The site is powered by Hugo, FreeBSD, and OpenZFS on OrionVM, everyone’s favourite cloud infrastructure provider.

If you found this post helpful or entertaining, you can shout me a coffee or buy some silly merch. Thanks!

