Tag spaces
While chatting with Anne the other day I casually asserted that you obtain a tag from <a rel=tag> by extracting the final path segment from its href attribute. Consider this HTML fragment:
When he got <a href
rel=tag
I thought the spec said, somewhere, that "Defenestration" is the tag here (and not, say, "defenestrated").
But that’s not the case. Neither of those is the case.
What the spec actually says is that the entire URL is the tag: "https://". I doubt this matches anybody’s mental model of how rel=tag works.
Later on I remembered that this is an aspect of the rel=tag microformat that didn’t make the cut when Ian incorporated rel=tag directly into HTML. Here’s what the microformat says:
The last path component of the URL is the text of the tag, so <a href="http://technorati.com/tag/tech" rel="tag">fish</a> would indicate the tag "tech" rather than "fish".
So I remembered correctly that this was written down somewhere, but I was mistaken about where. Mystery solved. But this keeps bouncing around my head, and I think it may be worth mulling over how the rel=tag microformat envisioned this would work, and what’s nice about that.
What the rel=tag microformat defines
The rel=tag microformat defines a tag space as a place that collates or defines tags. The rest of the URL ("http://" in their example, and "https://" in mine) is the space in which the tag is defined. The author could put some other tag at the end of that URL and have a reasonable expectation that the resulting page would work. Of course, this means you can use other domains’ URL spaces as tag spaces:
Authors may choose to link to a tag at a particular tag space in order to provide a specific meaning. E.g. a tag for technology could link to http://en.wikipedia.org/wiki/Technology
I’ve made good use of this flexibility on my own website, and not just in the wholesale abuse of Wikipedia for this purpose. For instance, consider the permalink at the top of this page:
<a rel="bookmark tag" href=/2026/05/tag-spaces>Tag spaces</a>
I wanted to tag this page with the tag "tag-spaces", and I also used "tag-spaces" as the slug of this post. So I can put rel=tag on the same link that’s also the page’s permalink (rel=bookmark). I don’t need a separate link to "/tags/tag-spaces" or whatever somewhere else on the page.
Styling pages by tag
Assuming you use rel=tag the way the microformat envisions, it’s really easy to style pages differently depending on which tags are present on them: simply use a combination of the [att~=val] and [att$=val] selectors.
For example, here’s a CSS variable that contains an image of the six-color Pride flag:
html {
--pride-flag: linear-gradient(#e40303 16.66%, #FF8D00 0 33.33%, #FFEE00 0 50%, #028121 0 66.66%, #004CFF 0 83.33%, #770088 0);
}
Here’s an example page that uses this as a background for the header.
Let’s say we load this stylesheet on a bunch of LGBTQ-related pages, and we’d like to customize it depending on the focus of the page. Here’s how we could change it to the orange-pink lesbian flag on pages tagged "lesbian" (example):
html:has([rel~=tag][href$="/lesbian"]) {
--pride-flag: linear-gradient(#d52d00 0 14.28%, #ef7627 0 28.57%, #ff9a56 0 42.85%, #fff 0 57.14%, #d162a4 0 71.42%, #b55690 0 85.71%, #a30262 0);
}
Here’s how you’d switch it to the transgender flag on pages talking about trans stuff (example):
html:has([rel~=tag][href$="/trans"]) {
--pride-flag: linear-gradient(#5bcefa 20%, #f5a9b8 0 40%, white 0 60%, #f5a9b8 0 80%, #5bcefa 0);
}
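If you generate rules like these from a list of tags, the selector can be assembled mechanically. The following is only a sketch: tagSelector() and tagRule() are hypothetical helper names, not part of the microformat, and they assume the tag is the final path segment of the href attribute exactly as written in the markup.

```javascript
// Hypothetical helper: build a selector that matches pages carrying a
// rel=tag link whose href attribute ends in "/<term>".
function tagSelector(term) {
  // Escape backslashes and double quotes so the term is safe inside a
  // double-quoted CSS attribute value.
  const quoted = String(term).replace(/["\\]/g, "\\$&");
  return `html:has([rel~=tag][href$="/${quoted}"])`;
}

// Hypothetical helper: build a rule that overrides --pride-flag for one tag.
function tagRule(term, gradient) {
  return `${tagSelector(term)} { --pride-flag: ${gradient}; }`;
}
```

Keep in mind that [att$=val] compares against the attribute text as authored, so an href with a trailing slash, a query string, or a fragment would not match a selector built this way.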
Atom categories
The way the rel=tag microformat breaks URLs up into parts rhymes with RFC 4287’s <atom:category> element, the element used to categorize posts in an Atom feed.
The Atom spec defines three attributes for <atom:category>: scheme, term, and label. Only term is required. The scheme attribute takes a URI, while the others accept text.
<category scheme="https://
<atom:category> element
Strictly speaking, the Atom spec doesn’t actually ascribe any particular semantics to these attributes: it doesn’t say you can concatenate scheme and term and dereference the result.
extractTagParts()
Nevertheless, I’ve long applied these terms from Atom to the components of the rel=tag microformat. The term is the last path segment of the URL, the scheme is the rest of the URL, and the label is the text of the element:
When he got <a href
rel=tag microformat to <atom:category>
This is the approach I take when I generate my Atom feed from my site’s HTML.
- Extracting a scheme out of a URL is very easy once you’ve got the term. It’s just everything else:
  Extract scheme from url using term
    url.href.substring(0, url.href.length - term.length)
- To get term itself, you need to split the URL’s pathname on U+002F SOLIDUS (/), and grab the last part. (We’re using url.pathname here because we want to avoid any fragments or query parameters that may be present.)
  Extract term from url
    url.pathname.split("/").pop()
- The element’s textContent is usually suitable for label. Sometimes I explicitly override it by providing a title attribute. If we can’t find a suitable label, we can at least default to term.
  Extract label from element
    (element.hasAttribute("title") ? element.getAttribute("title") : element.textContent) || term
Putting this all together, we get this simple JavaScript function:
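function extractTagParts(element) {
  const url = new URL(element.href);
  // Ignore trailing slashes in url
  if (url.pathname.endsWith("/"))
    url.pathname = url.pathname.slice(0, -1);
  // Remove extraneous information from url
  url.search = '';
  url.hash = '';
  // Extract term from url
  const term = url.pathname.split("/").pop();
  // Extract label from element
  const label = (element.hasAttribute("title")
    ? element.getAttribute("title")
    : element.textContent) || term;
  // Extract scheme from url using term
  const scheme = url.href.substring(0, url.href.length - term.length);
  // Let’s also return the whole, original URL.
  const raw = element.href;
  return { term, label, scheme, raw };
}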
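To make the mapping onto Atom concrete, here is a minimal sketch of how an object shaped like the one extractTagParts() returns could be serialized as a category element. It is an illustration rather than actual feed-generator code: toAtomCategory() and escapeXmlAttr() are hypothetical names, and the attribute order is an arbitrary choice.

```javascript
// Escape the characters that are unsafe inside a double-quoted XML attribute.
function escapeXmlAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: serialize { term, label, scheme } as an Atom
// category element. Only term is required, so an empty scheme or label
// is simply left out.
function toAtomCategory({ term, label, scheme }) {
  const attrs = [];
  if (scheme) attrs.push(`scheme="${escapeXmlAttr(scheme)}"`);
  attrs.push(`term="${escapeXmlAttr(term)}"`);
  if (label) attrs.push(`label="${escapeXmlAttr(label)}"`);
  return `<category ${attrs.join(" ")}/>`;
}
```

For the microformat’s own example, this would be called as toAtomCategory({ term: "tech", label: "fish", scheme: "http://technorati.com/tag/" }).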
Example usage
Here’s some JavaScript that calls extractTagParts() on
this page’s permalink and drops the result into
an <output> element:
And here’s the result:
So, okay, it works. Let’s do something slightly more interesting. Let’s find all the tags on the page and present them as a nice sorted list:
The result is “.”
Note that I don’t have to care where in the page the tags are. They don’t have to be in some dedicated metadata section. I can sprinkle rel=tag onto any links within the page whose URLs have the appropriate form, regardless of what website they link to.
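The tag-listing step described above can be sketched roughly as follows. This is an assumption-laden illustration, not the code that runs on this page: listTagTerms() is a hypothetical name, and a compact copy of extractTagParts() is repeated only so the snippet runs on its own.

```javascript
// Compact copy of extractTagParts() so the snippet is self-contained.
function extractTagParts(element) {
  const url = new URL(element.href);
  if (url.pathname.endsWith("/")) url.pathname = url.pathname.slice(0, -1);
  url.search = "";
  url.hash = "";
  const term = url.pathname.split("/").pop();
  const label = (element.hasAttribute("title")
    ? element.getAttribute("title")
    : element.textContent) || term;
  const scheme = url.href.substring(0, url.href.length - term.length);
  return { term, label, scheme, raw: element.href };
}

// Hypothetical helper: collect every rel=tag link under `root` (a Document
// or an Element) and return the distinct terms in sorted order.
function listTagTerms(root) {
  const links = root.querySelectorAll("a[rel~=tag][href]");
  const terms = Array.from(links, (link) => extractTagParts(link).term);
  return [...new Set(terms)].sort((a, b) => a.localeCompare(b));
}
```

In a browser you could call listTagTerms(document).join(", ") and write the string into whatever element you like. Because the selector only looks at rel and href, it finds tag links anywhere in the page.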
Other fiddly details
Strictly speaking, I left out a couple of details in the above text.
Trailing slashes
The rel=tag microformat says to ignore a trailing slash:
Trailing slashes in tag URLs are ignored, that is:
http://technorati.com/tag/Technology/ as a rel-tag URL is treated as:
http://technorati.com/tag/Technology
This is, of course, trivial to do:
if (url.pathname.endsWith("/"))
url.pathname = url.pathname.slice(0, -1);
Fragments and query parameters
We play fast and loose with url.href when we extract scheme from url using term. What if the URL has a fragment or some query parameters? Naively substringing will not do what you expect if either is present. So we zero out both of these URL parts before trying to take a substring of url.href:
url.search = '';
url.hash = '';
Let’s make sure that works. Here’s a test.
It happens that I’ve traveled to Japan a bunch, and I’ve written about that on my travel page. Now, the first link in this paragraph, that goes straight to the Japan part of my travel page, looks like this:
<a href=/travel#japan title=travel>traveled to Japan</a>
When we run extractTagParts() on it, we get this:
Looks like it works! 😊
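To see why the cleanup matters, here is a small sketch that compares the scheme you get with and without it. Both function names are hypothetical, and https://example.com is a placeholder origin standing in for the real site.

```javascript
// Naive approach: take the term from the pathname but slice the untouched
// href, which still carries any query string or fragment.
function naiveScheme(href) {
  const url = new URL(href);
  const term = url.pathname.split("/").pop();
  return url.href.substring(0, url.href.length - term.length);
}

// Cleaned approach: drop the trailing slash, query, and fragment first,
// then slice the term off the end of the href.
function cleanedScheme(href) {
  const url = new URL(href);
  if (url.pathname.endsWith("/")) url.pathname = url.pathname.slice(0, -1);
  url.search = "";
  url.hash = "";
  const term = url.pathname.split("/").pop();
  return url.href.substring(0, url.href.length - term.length);
}

// Placeholder origin; the path and fragment mirror the travel link above.
const href = "https://example.com/travel#japan";
const naive = naiveScheme(href);
const cleaned = cleanedScheme(href);
```

With this input the naive version slices the six characters of "#japan" off the end instead of "travel", so it returns a scheme that still contains the term. The cleaned version returns everything up to and including the final slash.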