
Theresa O’Connor / Treasa Ní Chonchúir

Tag spaces

While chatting with Anne the other day, I casually asserted that you obtain a tag from <a rel=tag href> by extracting the final path segment from its href attribute. Consider this HTML fragment:

When he got <a href=https://en.wikipedia.org/wiki/Defenestration rel=tag>defenestrated</a>, he was fortunate to land in a bale of hay that happened to be in a wheelbarrow on the sidewalk below.

A link marked rel=tag

I thought the spec said, somewhere, that "Defenestration" is the tag here (and not, say, "defenestrated"). But that’s not the case. Neither of those is the case. What the spec actually says is that the entire URL is the tag: "https://en.wikipedia.org/wiki/Defenestration". I doubt this matches anybody’s mental model of how rel=tag works.

Later on I remembered that this is an aspect of the rel=tag microformat that didn’t make the cut when Ian incorporated rel=tag directly into HTML. Here’s what the microformat says:

The last path component of the URL is the text of the tag, so

<a href="http://technorati.com/tag/tech" rel="tag">fish</a>

would indicate the tag "tech" rather than "fish".

So I remembered correctly that this was written down somewhere, but I was mistaken about where. Mystery solved. But this keeps bouncing around my head and I think it may be worth mulling over how the rel=tag microformat envisioned this would work, and what’s nice about that.

What the rel=tag microformat defines

The rel=tag microformat defines a tag space as a place that collates or defines tags. The rest of the URL ("http://technorati.com/tag/" in their example, and "https://en.wikipedia.org/wiki/" in mine) is the space in which the tag is defined. The author could put some other tag at the end of that URL and have a reasonable expectation that the resulting page would work. Of course, this means you can use other domains’ URL spaces as tag spaces:

Authors may choose to link to a tag at a particular tag space in order to provide a specific meaning. E.g. a tag for technology could link to "http://en.wikipedia.org/wiki/Technology".

I’ve made good use of this flexibility on my own website, and not just in the wholesale abuse of Wikipedia for this purpose. For instance, consider the permalink at the top of this page:

I wanted to tag this page with the tag "tag-spaces", and I also used "tag-spaces" as the slug of this post. So I can put rel=tag on the same link that’s also the page’s permalink (rel=bookmark). I don’t need a separate link to "/tags/tag-spaces" or whatever somewhere else on the page.

Styling pages by tag

Assuming you use rel=tag the way the microformat envisions, it’s really easy to style pages differently depending on which tags are present on them: simply use a combination of the [att~=val] and [att$=val] selectors.

For example, here’s a CSS variable that contains an image of the six-color Pride flag:

html {
    --pride-flag: linear-gradient(#e40303 16.66%, #FF8D00 0 33.33%, #FFEE00 0 50%, #028121 0 66.66%, #004CFF 0 83.33%, #770088 0);
}
The six-color variant of Gilbert Baker’s Pride flag

Here’s an example page that uses this as a background for the header.

Let’s say we load this stylesheet on a bunch of LGBTQ-related pages, and we’d like to customize it depending on the focus of the page. Here’s how we could change it to the orange-pink lesbian flag on pages tagged "lesbian" (example):

html:has([rel~=tag][href$="/lesbian"]) {
    --pride-flag: linear-gradient(#d52d00 0 14.28%, #ef7627 0 28.57%, #ff9a56 0 42.85%, #fff 0 57.14%, #d162a4 0 71.42%, #b55690 0 85.71%, #a30262 0);
}
Emily Gwen’s “Sunset” lesbian pride flag

Here’s how you’d switch it to the transgender flag on pages talking about trans stuff (example):

html:has([rel~=tag][href$="/trans"]) {
    --pride-flag: linear-gradient(#5bcefa 20%, #f5a9b8 0 40%, white 0 60%, #f5a9b8 0 80%, #5bcefa 0);
}
Monica Helms’ transgender pride flag

Atom categories

The way the rel=tag microformat breaks URLs up into parts rhymes with RFC 4287’s <atom:category> element, which is used to categorize posts in an Atom feed.

The Atom spec defines three attributes for <atom:category>: scheme, term, and label. Only term is required. The scheme attribute takes a URI, while the others accept text.

<category scheme="https://en.wikipedia.org/wiki/" term="Defenestration" label="defenestrated"/>

An <atom:category> element

Strictly speaking, the Atom spec doesn’t actually ascribe any particular semantics to these attributes—it doesn’t say you can concatenate scheme and term and dereference the result.

extractTagParts()

Nevertheless, I’ve long applied these terms from Atom to the components of the rel=tag microformat. The term is the last path segment of the URL, the scheme is the rest of the URL, and the label is the textContent of the element:

When he got <a href=https://en.wikipedia.org/wiki/Defenestration rel=tag>defenestrated</a>, he was fortunate to land in a bale of hay that happened to be in a wheelbarrow on the sidewalk below.

Mapping parts of the rel=tag microformat to <atom:category>

This is the approach I take when I generate my Atom feed from my site’s HTML.
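To make the mapping concrete, here’s a minimal sketch of how the three parts could be serialized as a category element. The names toAtomCategory and escapeXmlAttribute are made up for illustration, and the sketch assumes scheme, term, and label are already plain strings. It isn’t necessarily how any real feed generator does it.

```javascript
// Escape the characters that are special inside a double-quoted XML attribute.
function escapeXmlAttribute(value) {
    return String(value)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Hypothetical helper: serialize { scheme, term, label } as a category element.
// Only term is required by Atom, so scheme and label are emitted only if present.
function toAtomCategory({ scheme, term, label }) {
    const attributes = [];
    if (scheme) attributes.push(`scheme="${escapeXmlAttribute(scheme)}"`);
    attributes.push(`term="${escapeXmlAttribute(term)}"`);
    if (label) attributes.push(`label="${escapeXmlAttribute(label)}"`);
    return `<category ${attributes.join(" ")}/>`;
}

console.log(toAtomCategory({
    scheme: "https://en.wikipedia.org/wiki/",
    term: "Defenestration",
    label: "defenestrated",
}));
```

Leaving out scheme and label when they’re missing keeps the output valid, since term is the only attribute Atom requires.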

  1. Extracting a scheme out of a URL is very easy once you’ve got the term. It’s just everything else:
    Extract scheme from url using term
    url.href.substring(0, url.href.length-term.length)
  2. To get term itself, you need to split the URL’s pathname on U+002F SOLIDUS (/), and grab the last part. (We’re using url.pathname here because we want to avoid any fragments or query parameters that may be present.)
    Extract term from url
    url.pathname.split("/").pop()
  3. The element’s textContent is usually suitable for label. Sometimes I explicitly override it by providing a title attribute. If we can’t find a suitable label, we can at least default to term.
    Extract label from element
    (element.hasAttribute("title") ? element.getAttribute("title") : element.textContent) || term

Putting this all together, we get this simple JavaScript function:

Extract scheme, term, and label from a link
function extractTagParts(element) {
    const url = new URL(element.href);
    Ignore trailing slashes in url
    Remove extraneous information from url

    const term = Extract term from url;
    const label = Extract label from element;
    const scheme = Extract scheme from url using term;
    // Let’s also return the whole, original URL.
    const raw = element.href;

    return { term, label, scheme, raw };
}
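With the named chunks substituted in, the function might look like the following. This is a sketch assembled from the snippets on this page, in the order the function lists them. The link object at the end is a stand-in for a DOM element (an assumption, so that the code also runs outside a browser), and it assumes element.href is an absolute URL, as it is for links in a real document.

```javascript
function extractTagParts(element) {
    const url = new URL(element.href);

    // Ignore trailing slashes in url.
    if (url.pathname.endsWith("/"))
        url.pathname = url.pathname.slice(0, -1);

    // Remove extraneous information from url: query parameters and fragment.
    url.search = "";
    url.hash = "";

    // Extract term from url: the last path segment.
    const term = url.pathname.split("/").pop();

    // Extract label from element: prefer title, then the link text, then term.
    const label = (element.hasAttribute("title")
        ? element.getAttribute("title")
        : element.textContent) || term;

    // Extract scheme from url using term: everything that precedes the term.
    const scheme = url.href.substring(0, url.href.length - term.length);

    // Let's also return the whole, original URL.
    const raw = element.href;

    return { term, label, scheme, raw };
}

// Stand-in for an <a> element, exposing only the members the function uses.
const link = {
    href: "https://en.wikipedia.org/wiki/Defenestration",
    textContent: "defenestrated",
    hasAttribute: () => false,
    getAttribute: () => null,
};

console.log(extractTagParts(link));
```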

Example usage

Here’s some JavaScript that calls extractTagParts() on this page’s permalink and drops the result into an <output> element:

Extract tag parts from this page’s permalink.

And here’s the result:

So, okay, it works. Let’s do something slightly more interesting. Let’s find all the tags on the page and present them as a nice sorted list:

Make a human-readable list of this page’s tags.
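One way to build such a list is sketched below. The helper name describeTags is hypothetical, and to keep the snippet self-contained it derives each term inline rather than calling extractTagParts(). It also assumes English list formatting via Intl.ListFormat.

```javascript
// Hypothetical helper: turn a collection of rel=tag links into a readable,
// sorted list such as "a, b, and c". `links` is any iterable of objects with
// an absolute `href`, for example a NodeList of <a> elements.
function describeTags(links, locale = "en") {
    const terms = new Set(); // a Set drops tags that appear more than once
    for (const link of links) {
        const url = new URL(link.href);
        // Ignore a trailing slash, then take the last path segment as the term.
        const path = url.pathname.endsWith("/")
            ? url.pathname.slice(0, -1)
            : url.pathname;
        const term = path.split("/").pop();
        if (term) terms.add(term);
    }
    const sorted = [...terms].sort((a, b) => a.localeCompare(b, locale));
    return new Intl.ListFormat(locale, { style: "long", type: "conjunction" })
        .format(sorted);
}

// In a browser, this would collect every tag link on the page:
// describeTags(document.querySelectorAll('a[rel~="tag"]'));
console.log(describeTags([
    { href: "https://example.com/tags/tag-spaces" },
    { href: "https://en.wikipedia.org/wiki/Defenestration/" },
]));
```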

The result is “.”

Note that I don’t have to care where in the page the tags are. They don’t have to be in some dedicated metadata section. I can sprinkle rel=tag onto any links within the page whose URLs have the appropriate form, regardless of what website they link to.


Other fiddly details

Strictly speaking, I left out a couple of details in the above text.

Trailing slashes

The rel=tag microformat says to ignore a trailing slash:

Trailing slashes in tag URLs are ignored, that is:

http://technorati.com/tag/Technology/

as a rel-tag URL is treated as:

http://technorati.com/tag/Technology

This is, of course, trivial to do:

Ignore trailing slashes in url
if (url.pathname.endsWith("/"))
    url.pathname = url.pathname.slice(0, -1);

Fragments and query parameters

We play fast and loose with url.href when we extract scheme from url using term. What if the URL has a fragment or some query parameters? Naively substringing will not do what you expect if either is present, because url.href includes them while url.pathname does not. So we zero out both of these URL parts before trying to take a substring of url.href:

Remove extraneous information from url
url.search = '';
url.hash = '';
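To see why this matters, here’s a small sketch that computes the scheme both ways. The helper schemeFor and its clean option are invented for this illustration, and example.com is a placeholder host.

```javascript
// Hypothetical helper: compute the scheme with or without first clearing the
// query and fragment from the URL.
function schemeFor(href, { clean }) {
    const url = new URL(href);
    if (clean) {
        url.search = "";
        url.hash = "";
    }
    const term = url.pathname.split("/").pop();
    // Chop term.length characters off the end of the serialized URL.
    return url.href.substring(0, url.href.length - term.length);
}

const href = "https://example.com/travel#japan";

// Without cleaning, the substring removes "#japan" instead of "travel".
console.log(schemeFor(href, { clean: false })); // "https://example.com/travel"

// With cleaning, the term really is the tail of url.href.
console.log(schemeFor(href, { clean: true })); // "https://example.com/"
```

The naive result only looks plausible here because "#japan" happens to be as long as "travel". With a fragment of a different length it would cut the URL somewhere even less sensible.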

Let’s make sure that works. Here’s a test.

It happens that I’ve traveled to Japan a bunch, and I’ve written about that on my travel page. Now, the first link in this paragraph, that goes straight to the Japan part of my travel page, looks like this:

<a href=/travel#japan title=travel>traveled to Japan</a>
The markup of the “traveled to Japan” link.

When we run extractTagParts() on it, we get this:

Looks like it works! 😊