Microdata, Microformats, and RDF

Recently, Ian Hickson announced a new microdata section of the HTML5 spec. Microdata provide a method of annotating HTML content for scripted data extraction.

Microformats and microdata

Microformats allow authors to mark up events, contact information, etc. in a machine-extractable manner, within the constraints of conforming HTML 4 or XHTML 1. They do so by using the language extension mechanisms present in HTML 4: head@profile, @class, meta@name, @rel, and the like.
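To make "machine-extractable" concrete, here is a minimal Python sketch of a consumer pulling hCard-style properties out of class names. The sample markup, the `ClassTextExtractor` class, and the `extract_by_class` helper are all hypothetical, and this is not the microformats parsing rules, only an illustration that the data rides on ordinary @class values.

```python
from html.parser import HTMLParser

# Hypothetical sample markup that uses hCard class names (vcard, fn, org).
SAMPLE = (
    '<div class="vcard">'
    '<span class="fn">Jane Doe</span>, '
    '<span class="org">Example Corp</span>'
    '</div>'
)

# Elements that never get an end tag; they are not pushed on the stack.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of elements whose class list contains a wanted name."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.stack = []   # one list of matched class names per open element
        self.found = {}   # class name -> concatenated text content

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append([c for c in classes if c in self.wanted])

    def handle_endtag(self, tag):
        # Assumes well-nested markup; a real parser would need error recovery.
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text belongs to every open element that carries a wanted class.
        for names in self.stack:
            for name in names:
                self.found[name] = self.found.get(name, "") + data


def extract_by_class(html, wanted):
    parser = ClassTextExtractor(wanted)
    parser.feed(html)
    parser.close()
    return parser.found


if __name__ == "__main__":
    print(extract_by_class(SAMPLE, ["fn", "org"]))
```

Everything the extractor relies on is already valid HTML 4; nothing about the language had to change for this to work.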

As I’ve said elsewhere:

Microformats are a great example of a community coming together and taking advantage of HTML’s existing extensibility points[…] microformats thrive within the constraints of HTML’s existing extensibility points.

Ian touched upon this in the WaSP interview:

Microformats [are] natively supported in HTML5, just like [they were] in HTML 4, because Microformats use the built-in extension mechanisms of HTML.

HTML5’s microdata proposal isn’t some kind of competing way to mark up such data; it’s a change to the underlying language extension mechanisms. In the future, when microformats are defined on top of HTML5, they will be able to take advantage of the microdata attributes (@item, @itemprop, and the like), microdata’s unambiguous data extraction algorithm, and other new bits of HTML5 (e.g. the <time> element).
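As a rough illustration of what extraction from dedicated attributes might look like, here is a small Python sketch that reads @item and @itemprop from a document. It is not the extraction algorithm defined in the spec: the sample markup, the type URL, the `MicrodataSketch` class, and the rule of preferring a `datetime` attribute over text content are all assumptions made for this example, and nested items are not handled.

```python
from html.parser import HTMLParser

# Hypothetical markup using the attribute names mentioned above.
SAMPLE = (
    '<div item="http://example.org/person">'
    '<span itemprop="name">Jane Doe</span> was born on '
    '<time itemprop="birthday" datetime="1980-01-01">January 1st, 1980</time>.'
    '</div>'
)

VOID = {"br", "hr", "img", "input", "link", "meta"}


class MicrodataSketch(HTMLParser):
    """Toy extractor: one flat property map per element carrying @item."""

    def __init__(self):
        super().__init__()
        self.items = []   # extracted items, in document order
        self.stack = []   # one bookkeeping dict per open element

    def _current_item(self):
        # The nearest open ancestor that started an item, if any.
        for entry in reversed(self.stack):
            if entry["item"] is not None:
                return entry["item"]
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        entry = {"item": None, "owner": None, "prop": None}
        owner = self._current_item()
        if "itemprop" in attrs and owner is not None:
            name = attrs["itemprop"]
            if attrs.get("datetime") is not None:
                # Assumption: a machine-readable attribute wins over text.
                owner["properties"][name] = attrs["datetime"]
            else:
                owner["properties"][name] = ""
                entry["owner"], entry["prop"] = owner, name
        if "item" in attrs:
            entry["item"] = {"type": attrs["item"], "properties": {}}
            self.items.append(entry["item"])
        if tag not in VOID:
            self.stack.append(entry)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Append text to every open property that takes its value from text.
        for entry in self.stack:
            if entry["prop"] is not None:
                entry["owner"]["properties"][entry["prop"]] += data


def extract_items(html):
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    print(extract_items(SAMPLE))
```

The point of the sketch is that the extractor never has to guess which class names are data and which are styling hooks; the attributes exist only to carry data.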

I look forward to future revisions of hCard, hCalendar, etc. built on top of HTML5’s microdata. I expect HTML5’s predefined vocabularies—if/when extracted from the main HTML5 spec—will provide the basis for such reformulations.

RDF and microdata

HTML5 contains the definition of an algorithm for extracting RDF triples from any HTML document, including any microdata items present. Microdata allow for the typing of items by URL, and thus allow authors to express many RDF triples natively in HTML.
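To show why typing items by URL matters for RDF, here is a hedged Python sketch that turns one already-extracted item into triples. It is not the conversion algorithm from the spec. The item structure, the `item_to_triples` function, the blank-node label, and the rule that only absolute-URL property names become predicates are assumptions chosen to keep the example small.

```python
from urllib.parse import urlparse

# The standard RDF predicate for stating a resource's type.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical item, shaped as a type URL plus a flat property map.
SAMPLE_ITEM = {
    "type": "http://example.org/person",
    "properties": {
        "http://example.org/name": "Jane Doe",
        "nickname": "JD",  # not a URL, so this sketch skips it
    },
}


def is_absolute_url(value):
    """True when the value has both a scheme and a host."""
    parts = urlparse(value)
    return bool(parts.scheme and parts.netloc)


def item_to_triples(item, subject="_:item0"):
    """Return (subject, predicate, object) tuples for one item.

    The subject is a blank-node label; objects are kept as plain strings.
    """
    triples = []
    item_type = item.get("type")
    if item_type and is_absolute_url(item_type):
        triples.append((subject, RDF_TYPE, item_type))
    for name, value in item.get("properties", {}).items():
        if is_absolute_url(name):
            triples.append((subject, name, value))
    return triples


if __name__ == "__main__":
    for triple in item_to_triples(SAMPLE_ITEM):
        print(triple)
```

Because the type and the property name are already URLs, they can be used as RDF terms directly, with no prefix table to resolve first.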

RDFa and microdata

RDFa is a way to embed RDF into XML vocabularies. Unlike microformats (and, for that matter, unlike eRDF), it was never designed to work within the constraints of HTML 4. Instead, it was designed as a set of new XML attributes for use within XHTML 2 documents, and was then back-ported to XHTML 1. Since it wasn’t designed with the HTML Design Principles in mind, it should come as no surprise that RDFa violates several of them, and so isn’t suitable for inclusion in the Web platform. As Ian put it in his WaSP interview:

We considered RDFa long and hard[…], but at the end of the day, while some people really like it, I don’t think it strikes the right balance between power and ease of authoring. For example, it uses namespaces and prefixes, which by and large confuse authors to no end.

Despite RDFa’s deficiencies, it would still be a good thing if implementors like Google and Yahoo had an unambiguous specification of how to process it in the wild. I very much hope that the RDFa community embraces Philip’s effort along such lines.