Cleaning up metadata, and building with Dublin Core
This might be a bit Inside Baseball as my American friends say, but many of you have messaged over the years saying my site metadata was a useful template for your own. I may think too much about it, which means now I feel the compulsion to share some changes.
I got rid of most of it.
I’ve erred on the side of too much metadata for at least a decade. I’ve implemented Dublin Core, Schema, Open Graph, Twitter, and microformats, all in the one polyglot template with metatags, RDFa, JSON-LD, and Microdata. It taught me a lot about semantic triples [sic], data structures, and how to wrestle multiple incompatible formats into something resembling useful (and valid!) markup.
But was all this code worth it? Or to put it another way, was I getting enough value to justify constantly fretting about all this data, whether attributes across different schemas were as interchangeable as I thought, and whether I was keeping up with the changes social networks made to their formats?
Around the time I started getting an inkling it was all getting a bit silly, I read this post Wouter wrote on Brain Baking about Twitter’s proprietary card metadata, and Infinite Love’s comment that HTML metadata tags should be more than sufficient. I found myself agreeing with both.
The other shoe dropped when I took stock of just how much space this metadata took up. On a typical 200-400 word post like the one you’re reading, metadata from these overlapping schemas took up more than 80% of the page. Yes I had embedded jokes, I hadn’t merged as many attributes as I probably could have, my JSON wasn’t minified, and it still only represented kilobytes of text in the grand scheme of things. But stripping it back to the essentials made me realise I could bin most of it while still expressing what I needed to.
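If you want to run the same kind of stocktake on your own pages, here is a rough Python sketch. It is not how I measured mine: the `metadata_share` function and its regex patterns are hypothetical, and it only counts `<meta>` tags and JSON-LD script blocks, so inline RDFa and Microdata attributes would be missed.

```python
import re

# Crude patterns for metadata markup. A regex is a stand-in for a real HTML
# parser here, and inline RDFa/Microdata attributes are not counted at all.
METADATA_PATTERNS = [
    re.compile(r"<meta\b[^>]*>", re.IGNORECASE),
    re.compile(
        r'<script\b[^>]*type="application/ld\+json"[^>]*>.*?</script>',
        re.IGNORECASE | re.DOTALL,
    ),
]


def metadata_share(page: str) -> float:
    """Return the fraction of the page's UTF-8 bytes taken up by metadata markup."""
    total = len(page.encode("utf-8"))
    if total == 0:
        return 0.0
    meta_bytes = sum(
        len(match.group(0).encode("utf-8"))
        for pattern in METADATA_PATTERNS
        for match in pattern.finditer(page)
    )
    return meta_bytes / total
```

Feeding it the generated HTML of a post, for example `metadata_share(open("post.html", encoding="utf-8").read())` with `post.html` as a placeholder path, gives a number between 0 and 1 to compare before and after a cleanup.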
But what schema to keep? After mulling whether to just use HTML metadata and call it a day, I decided to stick with the Dublin Core Metadata Initiative. As an archivist and aspirational librarian, their metadata cause has always spoken the most to me, and I still think they offer the most flexible and broadly useful schemas for all sorts of data, not least a personal blog. It also fits in well with existing HTML metadata tags, so there isn’t much more markup to write.
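To give a sense of how little markup that involves, here is a minimal sketch in Python of rendering Dublin Core terms as ordinary HTML `<meta>` tags. It follows the DCMI convention of a `schema.DCTERMS` link plus `DCTERMS.`-prefixed names; the `dublin_core_head` function and the example post values are made up for illustration, and this is not my actual template.

```python
from html import escape


def dublin_core_head(fields: dict[str, str]) -> str:
    """Render Dublin Core terms as HTML <meta> tags (hypothetical helper)."""
    lines = [
        # Declares which namespace the DCTERMS prefix refers to.
        '<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/">',
    ]
    for term, value in fields.items():
        # Escape the value so quotes and ampersands stay valid in an attribute.
        content = escape(value, quote=True)
        lines.append(f'<meta name="DCTERMS.{term}" content="{content}">')
    return "\n".join(lines)


# Placeholder values, not taken from a real post.
example_post = {
    "title": "An example post",
    "creator": "Example Author",
    "created": "2024-01-01",
    "language": "en",
}

print(dublin_core_head(example_post))
```

Each term becomes one line in the page head, sitting alongside whatever standard HTML metadata tags are already there.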
I’m not sure if I’ll take a search engine ranking hit, and I know I’ll lose things like Twitter cards. But all the same metadata that was there before is still present, so any sufficiently motivated actor could use it. The site takes half as much time to generate, loads a bit quicker, and I think the source is eminently more readable. Those are things that matter to me, so I’m happy.