Representing tags in Atom
Update (March 2009): WordPress.com has adopted this approach!
In Atom, categories have schemes. What scheme should we use for tags?
Since atom:category’s @scheme identifies the means of interpreting @term, using @scheme to indicate “this atom:category is a tag” seems perfectly reasonable to me. But maybe we should back up a bit. The more general question is: how should we represent tags in Atom?
Tim makes the same assumption I’ve been making: that atom:category is the natural and correct element for tagging in Atom. While this seems obviously true (I think of tagging as a particular form of categorization), perhaps some other representation would work better. Aristotle Pagaltzis, for instance, proposed the use of atom:link instead. So what’s to be done?
What do we want in a tag representation?
Here are some properties that I think a great representation of tags in Atom would have. Note that I doubt any solution could manage to have all of them.
-
No elements or attributes outside of those present in RFC 4287 — i.e., no extensions. Attribute values requiring registration or standardization suffer somewhat on this point.
-
Would (or at least could) provide both the human-readable and normalized version of the tag. Flickr (and many other sites) normalize tags like “San Diego” to “sandiego” — for example, see this photo of mine.
-
Would provide a dereferenceable URI to something about the tag. In the typical blog context, a blog post tagged “cat” should have a link to a list of other posts on the same blog tagged “cat.” It would be especially awesome if this link were available in Atom processors unaware of this tagging technique.
-
The URL structure of tags in the relevant system would be a tag space, and the Atom representation of a tag would provide a dereferenceable URI to the tag space.
Using a tag space for your tags strikes me as nice in several ways. For one, it follows existing practice — flickr and del.icio.us both do so, as do several others. Secondly, tag space URIs are nicely hackable. I often pull up photos of mine by just typing in http://flickr.com/photos/hober/tags/foo, where foo is some tag I vaguely remember placing on the photo. Operator takes advantage of the ubiquity of tag spaces by offering to look up tags that it finds on pages on a variety of services:
Operator’s handling of a blog post’s “bzr” tag.
Basically, tag spaces are what make tags truly portable across the Web.
-
It should be possible for an Atom processor to know that this is a tag and not some other thing, without local knowledge of the site in question.
That is, it should be possible to distinguish between an atom:category used as a tag and an atom:category used for some other purpose. (The same goes for any other element used to carry a tag.)
-
It should be possible for an Atom processor to extract the (normalized) tag from the element in which it’s stored without parsing some attribute value or element content into pieces.
-
All things being equal, atom:category is the preferred element to use, as tagging is a form of categorization; it’s a semantically appropriate element for tagging.
Tim’s proposal seems to be primarily motivated by #6, and the way his question is phrased strongly implies the importance of #8 as well.
Possible representations
Here are various possible ways to represent tags in Atom, and how they fare against the above list:
-
<category scheme="http://tess.oconnor.cx/tags/" term="foo" label="Foo" />
This is how I store tags in my blog backend: as atom:category elements with the @scheme http://tess.oconnor.cx/tags/. The @term is the normalized tag, while the @label is the human-readable version (if different than @term).
I treat this specific @scheme as a tag space: concatenating @scheme with @term produces a dereferenceable URI to a page listing posts with the tag. On this model, the mapping from atom:category to the rel-tag microformat seems quite natural:
<category scheme="http://tess.oconnor.cx/tags/" term="foo" label="Foo" />
becomes
<a href="http://tess.oconnor.cx/tags/foo" rel="tag">Foo</a>
This technique scores well on points 1, 2, 3, 5, 7, and 8.
In order to produce a dereferenceable URI to posts with this tag (the sort of URI point 4 wants), an Atom processor would have to somehow know that concatenating @scheme with @term is something it might want to do; there’s no explicit indication of that here. That being said, such practice is already common enough in the Atom world that several people have assumed this to be the standard way to use atom:category (for examples, see these posts on atom-syntax).
Without specific knowledge of my scheme, an Atom processor has no way of knowing that I’d like it to treat this atom:category as a tag, so this technique fails 6. A global @scheme seems to be required for tag-in-atom:category to pass 6.
-
<category scheme="urn:tag" term="foo" label="Foo" />
This, Tim’s proposed technique, scores well on points 2, 3, 6, 7, and 8.
While this doesn’t introduce any extension elements or attributes, the urn:tag namespace would require standardization (which, admittedly, is underway). This makes point 1 somewhat arguable.
This technique just doesn’t have any interesting, dereferenceable URIs (points 4 and 5), so it completely relies on the Atom processor to come up with some, and it doesn’t give the entry’s author the opportunity to signal which tag space (if any) he’d prefer.
-
<category term="foo" label="Foo" />
This technique is from Henry Story’s comment on Tim’s post:
a category is a tag with a namespace. So don’t put a namespace (@scheme) in if you want a tag…
(emphasis mine).
This is basically technique 1, minus the tag space in @scheme. This strategy scores well on points 1, 2, 3, 6 (arguable), 7, and 8.
As with the previous technique, this loses on 4 and 5 by not containing any interesting, dereferenceable URIs.
Point 6 is contentious as Atom processors are under no obligation to treat categories without schemes as being tags. AFAICT, an Atom processor would be perfectly conformant to assume categories without schemes to be within some default scheme of theirs. This is especially troublesome in the APP case: I’d expect APP servers to do all sorts of crazy things with such atom:category elements.
The rest of Henry’s comment implies that he doesn’t think this technique has much to offer over technique 1:
As I see it a category is a tag with a namespace. So don’t put a namespace (scheme) in if you want a tag, but you may as well put the scheme in, since people can always treat your category as a tag (by not querying on the scheme).
-
<link href="http://tess.oconnor.cx/tags/foo" rel="tag" title="Foo" />
This technique, proposed by Aristotle, has the nice property of being directly analogous to how the rel-tag microformat is marked up in HTML.
It scores well on points 1 (arguable), 2, 4, and 6.
While it doesn’t introduce any extension elements or attributes, it requires IANA registration of the “tag” link relation per §7.1 of RFC 4287. So point 1 is debatable.
It only half-loses on points 3 and 5. On the one hand, an Atom processor that doesn’t know about the “tag” link relation wouldn’t know what this thing is, so it wouldn’t know how to find the tag space and tag in @href. On the other hand, I imagine the IANA registration for “tag” could specify the same @href parsing rules as the rel-tag microformat, thus providing rel-tag-aware Atom processors the ability to extract the tag space URI and the tag from @href. Atom processors unaware of this link relation could and presumably would display this link to the user, so all is not lost in the fallback case.
It loses on point 7 for the same reasons outlined in the previous paragraph: extracting the tag space and tag requires knowledge of rel-tag’s @href parsing rules.
This loses on point 8: atom:link doesn’t pack the semantic punch of atom:category for representing tags. Though given the rel-tag microformat, I don’t think this is that big of a deal.
-
<category scheme="urn:tag" label="Foo" term="http://tess.oconnor.cx/tags/foo" />
This scores well on points 2, 3, 4, 6, and 8.
This suffers on point 1 for the same reason as technique 2.
This suffers on points 5 and 7 for the same reason as technique 4.
-
<category scheme="http://tess.oconnor.cx/tags/" term="http://tess.oconnor.cx/tags/foo" label="Foo" />
This scores well on points 1, 3, 4, 5, and 8.
This isn’t DRY — the tag space is repeated in two attribute values.
This suffers on point 6 for the same reason as technique 1, and on point 7 for the same reason as techniques 4 and 5.
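To make technique 1’s mapping concrete, here is a minimal Python sketch of it, along with the reverse direction that the URI-carrying techniques (4, 5, and 6) would demand of a processor. The function names are my own, invented for illustration. The href splitting is only a simplification of rel-tag’s “the tag is the last path segment” rule: it ignores query strings and fragments.

```python
import xml.etree.ElementTree as ET
from html import escape
from typing import Tuple
from urllib.parse import unquote


def category_to_rel_tag(category_xml: str) -> str:
    """Render a technique-1 category element as a rel-tag HTML link.

    Assumes @scheme is a tag space, so @scheme + @term is dereferenceable.
    """
    cat = ET.fromstring(category_xml)
    scheme = cat.get("scheme", "")
    term = cat.get("term", "")
    # Fall back to the normalized tag when no human-readable label is given.
    label = cat.get("label") or term
    href = scheme + term
    return '<a href="{}" rel="tag">{}</a>'.format(
        escape(href, quote=True), escape(label)
    )


def split_rel_tag_href(href: str) -> Tuple[str, str]:
    """Split a rel-tag style href into (tag space, tag).

    Simplified: the tag is taken to be the last path segment.
    """
    head, _, last = href.rstrip("/").rpartition("/")
    return head + "/", unquote(last)


print(category_to_rel_tag(
    '<category scheme="http://tess.oconnor.cx/tags/" term="foo" label="Foo" />'
))
print(split_rel_tag_href("http://tess.oconnor.cx/tags/foo"))
```

The first function needs no parsing of attribute values at all, which is what point 7 asks for; the second is exactly the sort of parsing that costs techniques 4, 5, and 6 that point.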
The techniques which fare best on point 6 (which appears to be the itch Tim’s trying to scratch) are 2, 3, 4, and 5. I’m guessing he’d eliminate technique 4 as it doesn’t use atom:category. That leaves 2, 3, and 5 for Tim.
Now, to me, principles 4 and 5 are more important than 6, so
I’m more inclined to support techniques 1, 4, or
6. Err.
These are completely different sets of solutions.
I think it’ll help to see how actual behavior in the wild
matches up with these possible techniques.
Observed behavior in the wild
LiveJournal uses technique 3, without @label.
Vox uses something like technique 1, though with a per-tag scheme rather than a tag space. For example, consider the two tags on this post of mine: “meta” and “placeholder.” This is how Vox’s feeds represent them:
<category scheme="http://hober.vox.com/tags/meta/"
term="meta" label="meta" />
<category scheme="http://hober.vox.com/tags/placeholder/"
term="placeholder" label="placeholder" />
This seems suboptimal to me.
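A consumer that wants a tag space out of per-tag schemes like these has to guess. As a rough Python sketch (the helper name is hypothetical, and it assumes a per-tag scheme is simply the tag space followed by the term as its final path segment), the guess might look like this:

```python
def tag_space_from_scheme(scheme: str, term: str) -> str:
    """Guess the tag space implied by a category's @scheme.

    If the scheme's last path segment is the term itself (the per-tag
    style), strip that segment. Otherwise assume the scheme already is
    a tag space and return it unchanged.
    """
    head, _, last = scheme.rstrip("/").rpartition("/")
    if head and last == term:
        return head + "/"
    return scheme


# Per-tag scheme: the term is stripped to recover the tag space.
print(tag_space_from_scheme("http://hober.vox.com/tags/meta/", "meta"))
# Technique 1 scheme: returned unchanged.
print(tag_space_from_scheme("http://tess.oconnor.cx/tags/", "foo"))
```

That a processor needs a heuristic like this at all is the problem: with a plain tag space in @scheme, no guessing is required.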
Blogger doesn’t use tags or categories at all.
WordPress.com appears to be using an early version of WordPress’ Atom 1.0 support; as of this writing, its atom:ids are empty.
Its tags look like this:
<category scheme='http://hober.wordpress.com'
term='Uncategorized' />
So its use of atom:category is similar to technique 1, except that @scheme isn’t a tag space. Simply adding /tag/ to the end of @scheme would fix that, though.
Update (March 2009): as noted in the comments, WordPress.com has adopted this approach, and there's a patch pending for WordPress.org.
Thanks, Andy!
TypePad uses atom:category elements with the @scheme http://www.sixapart.com/ns/types#category, which is not a tag space, and it 404s. Ick.
Let’s see how various atompub members do things:
-
Tim himself is using technique 1 (again, without @label), as is James Snell.
-
Sam Ruby, Joe Gregorio, Aristotle, and Rob Sayre don’t provide tags or categories in their feeds.
-
Joe Gregorio corrected me
in the comments — he occasionally uses the rel-tag
microformat.
In fairness, none of the entries appearing in his feed when I
wrote this were tagged.
Joe didn’t specify how he puts tags in his feed; I imagine he stores them as part of each entry’s atom:content. Which reminds me, I didn’t list that as one of the options above.
Granted, the plural of anecdote is not data, but it does look like deployed usage favors technique 1, or something resembling it.
So how do we deal with technique 1’s failure to adhere to principle 6? Maybe we shouldn’t care. Lenny’s comment on Tim’s post struck a chord with me:
Besides, tags hardly ever mean the same thing to two people, so why should they have the same scheme? If some application really thinks that <category scheme="http://example.org/farmer/tag/" term="apple"/> means the same thing as <category scheme="http://example.org/geek/tag/" term="apple"/>, it can just drop the scheme.
The frustrating bit of representing tags in Atom boils down to the difference between the intentional type “tag” and the representation type atom:category (see note 1).
Lenny’s comment reveals a way out: if you want to treat an atom:category as a tag, just go ahead and do so, and ignore @scheme.
Essentially, TAG-EQUAL-P should only compare @term, whereas CATEGORY-EQUAL-P should compare @scheme and @term.
Which one you call depends on your purposes, and insofar as tagging goes, actual world usage implies that it’s only the @term that’s important.
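Here is a small Python sketch of that distinction. The Category type and the function names are stand-ins of mine for the Lisp-flavored TAG-EQUAL-P and CATEGORY-EQUAL-P above, not anything defined by RFC 4287.

```python
from typing import NamedTuple, Optional


class Category(NamedTuple):
    """The three attributes of an atom:category element."""
    term: str
    scheme: Optional[str] = None
    label: Optional[str] = None


def category_equal(a: Category, b: Category) -> bool:
    """CATEGORY-EQUAL-P: same scheme and same term."""
    return (a.scheme, a.term) == (b.scheme, b.term)


def tag_equal(a: Category, b: Category) -> bool:
    """TAG-EQUAL-P: only the term matters; the scheme is ignored."""
    return a.term == b.term


# Lenny's two apples: the same tag, but different categories.
farmer = Category(term="apple", scheme="http://example.org/farmer/tag/")
geek = Category(term="apple", scheme="http://example.org/geek/tag/")

print(tag_equal(farmer, geek))       # True
print(category_equal(farmer, geek))  # False
```

Neither predicate looks at @label, since that attribute is only the human-readable rendering of the tag.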
Of course, atom:category elements are useful for many more things than tagging.
But when representing tags in atom:category elements, using @scheme as a tag space and @term as the tag seems like the best compromise to me.
Notes
-
See Kent Pitman’s “The Best of Intentions: EQUAL Rights—and Wrongs—in Lisp” for more on the “bugs and confusions [that] can be traced to improper attempts to recover intentional type information from representation types.” Even if you’re not a Lisper, this is a great article on programming.