Tim Bray asks (links mine):
In Atom, categories have schemes. What scheme should we use for tags?
the means of interpreting 1,
@scheme to indicate “this
atom:category is a tag” seems perfectly
reasonable to me. But maybe we should back up a bit. The more
general question is
how should we represent tags in Atom?
Tim makes the same assumption I’ve been making —
atom:category is the natural and correct
element for tagging in Atom. While this seems obviously true
— I think of tagging as a particular form of
categorization — perhaps some other representation would
work better. Aristotle
Pagaltzis, for instance, proposed the use of
atom:link instead. So what’s to be
What do we want in a tag representation?
Here are some properties that I think a great representation of tags in Atom would have. Note that I doubt any solution could manage to have all of them.
No elements or attributes outside of those present in RFC 4287 — i.e., no extensions. Attribute values requiring registration or standardization suffer somewhat on this point.
Would (or at least could) provide both the human-readable and normalized version of the tag. Flickr (and many other sites) normalize tags like “San Diego” to “sandiego” — for example, see this photo of mine.
Would provide a dereferenceable URI to something about the tag. In the typical blog context, a blog post tagged “cat” should have a link to a list of other posts on the same blog tagged “cat.” It would be especially awesome if this link were available in Atom processors unaware of this tagging technique.
The URL structure of tags in the relevant system would be tag space, and the Atom representation of a tag would provide a dereferenceable URI to the tag space.
Using a tag space for your tags strikes me as nice in several ways. For one, it follows existing practice — flickr and del.icio.us both do so, as do several others. Secondly, tag space URIs are nicely hackable. I often pull up photos of mine by just typing in http://flickr.com/photos/hober/tags/foo, where foo is some tag I vaguely remember placing on the photo. Operator takes advantage of the ubiquity of tag spaces by offering to look up tags that it finds on pages on a variety of services:
Basically, tag spaces are what make tags truly portable across the Web.
It should be possible for an Atom processor to know that this is a tag and not some other thing, without local knowledge of the site in question.
That is, it should be possible to distinguish between an
atom:category used as a tag and an
atom:category used for some other purpose. (The same goes for any other element used to carry a tag.)
It should be possible for an Atom processor to extract the (normalized) tag from the element in which it’s stored without parsing some attribute value or element content into pieces.
All things being equal,
atom:category is the preferred element to use, as tagging is a form of categorization, which makes it a semantically appropriate element for tagging.
Tim’s proposal seems to be primarily motivated by #6, and the way his question is phrased strongly implies the importance of #8 as well.
Here are various possible ways to represent tags in Atom, and how they fare against the above list:
<category scheme="http://tess.oconnor.cx/tags/" term="foo" label="Foo" />
This is how I store tags in my blog backend: as
atom:category elements in which
@term is the normalized tag, while the
@label is the human-readable version (if different than @term).
I treat this specific
@scheme as a tag space: concatenating @scheme and
@term produces a dereferenceable URI to a page listing posts with the tag. On this model, the mapping from
atom:category to the rel-tag microformat seems quite natural:
<category scheme="http://tess.oconnor.cx/tags/" term="foo" label="Foo" />
<a href="http://tess.oconnor.cx/tags/foo" rel="tag">Foo</a>
This technique scores well on points 1, 2, 3, 5, 7, and 8.
In order to produce a dereferenceable URI to posts with this tag (the sort of URI point 4 wants), an Atom processor would have to somehow know that concatenating @scheme and
@term is something it might want to do; there’s no explicit indication of that here. That being said, such practice is already common enough in the Atom world that several people have assumed this to be the standard way to use
atom:category; for examples, see these posts on atom-syntax.
Without specific knowledge of my scheme, an Atom processor has no way of knowing that I’d like it to treat this
atom:category as a tag, so this technique fails 6. A global
@scheme seems to be required for tag-in-atom:category
to pass 6.
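To make the technique 1 mapping concrete, here is a minimal Python sketch that turns an atom:category into a rel-tag anchor by concatenating @scheme and @term. The function name `category_to_rel_tag` is hypothetical, and the sketch assumes that @scheme really is a tag space ending in a slash and that @term needs no percent-encoding; neither is guaranteed by RFC 4287.

```python
import xml.etree.ElementTree as ET
from html import escape


def category_to_rel_tag(category_xml):
    """Render a technique-1 atom:category as a rel-tag anchor (hypothetical helper).

    Assumes @scheme is a tag space, so scheme + term is a dereferenceable URI.
    """
    cat = ET.fromstring(category_xml)
    scheme = cat.get("scheme")
    term = cat.get("term")
    if scheme is None or term is None:
        raise ValueError("technique 1 needs both @scheme and @term")
    # Fall back to the normalized tag when no human-readable label is given.
    label = cat.get("label", term)
    href = scheme + term
    return '<a href="{}" rel="tag">{}</a>'.format(
        escape(href, quote=True), escape(label)
    )
```

Fed the category element shown for this technique, it produces the rel-tag anchor shown alongside it. A processor with no knowledge of my scheme still has no reason to call such a function, which is exactly the problem with point 6.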
<category scheme="urn:tag" term="foo" label="Foo" />
This, Tim’s proposed technique, scores well on points 2, 3, 6, 7, and 8.
While this doesn’t introduce any extension elements or attributes, the
urn:tag namespace would require standardization (which, admittedly, is underway). This makes point 1 somewhat arguable.
This technique just doesn’t have any interesting, dereferenceable URIs (points 4 and 5), so it relies completely on the Atom processor to come up with some, and it doesn’t give the entry’s author the opportunity to signal which tag space (if any) he’d prefer.
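What technique 2 does buy you is a trivial test for point 6. As a rough sketch, assuming the proposed `urn:tag` value is used verbatim as the global @scheme (the function name `tags_in_entry` is my own invention), an Atom processor could pick out the tags on an entry like this:

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
TAG_SCHEME = "urn:tag"  # assumption: the global scheme from Tim's proposal


def tags_in_entry(entry_xml):
    """Return the @term of every atom:category marked as a tag (hypothetical helper)."""
    entry = ET.fromstring(entry_xml)
    return [
        cat.get("term")
        # Only direct children of the entry; categories nested elsewhere are ignored.
        for cat in entry.findall(ATOM + "category")
        if cat.get("scheme") == TAG_SCHEME and cat.get("term")
    ]
```

Categories carrying any other @scheme are simply skipped, so no local knowledge of the publishing site is needed. Note that nothing here yields a URI to dereference.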
<category term="foo" label="Foo" />
This technique is from Henry Story’s comment on Tim’s post:
a category is a tag with a namespace. So don’t put a namespace (@scheme) in if you want a tag…
(emphasis mine).
This is basically technique 1, minus the tag space in
@scheme. This strategy scores well on points 1, 2, 3, 6 (arguable), 7, and 8.
As with the previous technique, this loses on 4 and 5 by not containing any interesting, dereferenceable URIs.
Point 6 is contentious as Atom processors are under no obligation to treat categories without schemes as being tags — AFAICT, an Atom processor would be perfectly conformant to assume categories without schemes to be within some default scheme of theirs. This is especially troublesome in the APP case — I’d expect APP servers to do all sorts of crazy things with such
The rest of Henry’s comment implies that he doesn’t think this technique has much to offer itself over technique 1:
As I see it a category is a tag with a namespace. So don’t put a namespace (scheme) in if you want a tag, but you may as well put the scheme in, since people can always treat your category as a tag (by not querying on the scheme).
<link href="http://tess.oconnor.cx/tags/foo" rel="tag" title="Foo" />
This technique, proposed by Aristotle, has the nice property of being directly analogous to how the rel-tag microformat is marked up in HTML.
It scores well on points 1 (arguable), 2, 4, and 6.
While it doesn’t introduce any extension elements or attributes, it requires IANA registration of the “tag” link relation per §7.1 of RFC 4287. So point 1 is debatable.
It only half-loses on points 3 and 5 — on the one hand, an Atom processor that doesn’t know about the “tag” link relation wouldn’t know what this thing is, so it wouldn’t know how to find the tag space and tag in
@href. On the other hand, I imagine the IANA registration for “tag” could specify the same
@href parsing rules as the rel-tag microformat, thus providing rel-tag-aware Atom processors the ability to extract the tag space URI and the tag from
@href. Atom processors unaware of this link relation could and presumably would display this link to the user, so all is not lost in the fallback case.
It loses on point 7 for the same reasons outlined in the previous paragraph: extracting the tag space and tag requires knowledge of rel-tag’s @href parsing rules.
This loses on point 8:
atom:link doesn’t pack the semantic punch of
atom:category for representing tags. Though given the rel-tag microformat I don’t think this is that big of a deal.
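For a sense of what that @href parsing involves, here is a small Python sketch. It takes the last path segment of the URL as the tag and everything before it as the tag space, which is my simplified reading of rel-tag's rule rather than a full implementation; the name `split_rel_tag_href` is hypothetical.

```python
from typing import Tuple
from urllib.parse import unquote, urlsplit


def split_rel_tag_href(href: str) -> Tuple[str, str]:
    """Split a rel-tag style URL into (tag_space, tag) (hypothetical helper).

    Simplification: the tag is the last non-empty path segment, percent-decoded.
    Query strings and fragments are ignored.
    """
    parts = urlsplit(href)
    path = parts.path.rstrip("/")
    head, _, last = path.rpartition("/")
    if not last:
        raise ValueError("no path segment to use as a tag")
    tag_space = "{}://{}{}/".format(parts.scheme, parts.netloc, head)
    return tag_space, unquote(last)
```

This is the kind of attribute-value parsing that point 7 hopes to avoid: the tag is only recoverable by a processor that knows to take the URL apart.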
<category scheme="urn:tag" label="Foo" term="http://tess.oconnor.cx/tags/foo" />
This scores well on points 2, 3, 4, 6, and 8.
This suffers on point 1 for the same reason as technique 2.
This suffers on points 5 and 7 for the same reason as technique 4.
<category scheme="http://tess.oconnor.cx/tags/" term="http://tess.oconnor.cx/tags/foo" label="Foo" />
This scores well on points 1, 3, 4, 5, and 8.
This isn’t DRY — the tag space is repeated in two attribute values.
This suffers on point 6 for the same reason as technique 1, and on point 7 for the same reason as techniques 4 and 5.
The techniques which fare best on point 6 — which appears
to be the itch Tim’s trying to scratch — are 2, 3,
4, and 5. I’m guessing he’d eliminate technique 4 as
it doesn’t use
atom:category. That leaves
2, 3, and 5 for Tim.
Now, to me, principles 4 and 5 are more important than 6, so
I’m more inclined to support techniques 1, 4, or
These are completely different sets of solutions.
I think it’ll help to see how actual behavior in the wild
matches up with these possible techniques.
Observed behavior in the wild
technique 3, without
Vox uses something like technique 1, though with a per-tag specific, non-tag-space scheme. For example, consider the two tags on this post of mine: “meta” and “placeholder.” This is how Vox’s feeds represent them:
<category scheme="http://hober.vox.com/tags/meta/" term="meta" label="meta" />
<category scheme="http://hober.vox.com/tags/placeholder/" term="placeholder" label="placeholder" />
This seems suboptimal to me.
Blogger doesn’t use tags or categories at all.
WordPress.com appears to be
using an early version of WordPress’ Atom 1.0 support
— as of this writing, its
atom:ids are empty.
Its tags look like this:
<category scheme='http://hober.wordpress.com' term='Uncategorized' />
So its use of
atom:category is similar to technique
1, except that
@scheme isn’t a tag space
— simply adding
/tag/ to the end of
@scheme would fix that, though.
atom:category elements with the
which is not a tag space, and it 404s. Ick.
Let’s see how various atompub members do things:
Tim himself is
using technique 1 (again, without
@label), as is James Snell.
Joe Gregorio, Aristotle, and Rob Sayre don’t provide tags or categories in their feeds.
Granted, the plural of anecdote is not data, but it does look like deployed usage favors technique 1, or something resembling it.
So how do we deal with technique 1’s failure to adhere to principle 6? Maybe we shouldn’t care. Lenny’s comment on Tim’s post struck a chord with me:
Besides, tags hardly ever mean the same thing to two people, so why should they have the same scheme? If some application really thinks that
<category scheme="http://example.org/farmer/tag/" term="apple"/> means the same thing as
<category scheme="http://example.org/geek/tag/" term="apple"/>, it can just drop the scheme.
The frustrating bit of representing tags in Atom boils down to
the difference between the intentional type “tag”
and the representation type
Lenny’s comment reveals a way out: if you want to treat an
atom:category as a tag, just go ahead and do so
TAG-EQUAL-P should only compare
Which one you call depends on your purposes, and insofar as
tagging goes, actual world usage implies that it’s only
@term that’s important.
atom:category elements are useful for
many more things than tagging.
But when representing tags in atom:category, using
@scheme as a tag space and
@term as the
tag seems like the best compromise to me.
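Lenny's "just drop the scheme" idea amounts to having two equality predicates over the same representation. Here is a hedged Python sketch using his farmer and geek examples; the `Category` type and both function names are mine, chosen to echo TAG-EQUAL-P, and are not from any Atom library.

```python
from typing import NamedTuple, Optional


class Category(NamedTuple):
    """Minimal stand-in for an atom:category element (hypothetical type)."""
    term: str
    scheme: Optional[str] = None
    label: Optional[str] = None


def category_equal(a: Category, b: Category) -> bool:
    """Compare as categories: both @scheme and @term must match."""
    return a.scheme == b.scheme and a.term == b.term


def tag_equal(a: Category, b: Category) -> bool:
    """Compare as tags: only @term matters, so the scheme is dropped."""
    return a.term == b.term


farmer = Category("apple", "http://example.org/farmer/tag/")
geek = Category("apple", "http://example.org/geek/tag/")
```

With these definitions, `tag_equal(farmer, geek)` is true while `category_equal(farmer, geek)` is false. The representation is the same in both cases; which predicate you call carries the intent.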
See Pitman’s “The Best of
Intentions: EQUAL Rights—and Wrongs—in
Lisp” for more on the
“bugs and confusions [that] can be traced to improper attempts to recover intentional type information from representation types.” Even if you’re not a Lisper, this is a great article on programming.