Theresa O’Connor

Marking up RFC 2119 text in HTML

I’ve updated my proposal; see “Revisiting RFC 2119 markup” after reading this post.

Specifications from the IETF and other organizations often make use of RFC 2119’s language for expressing requirements. To use RFC 2119, authors… should incorporate this phrase near the beginning of their document:

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

(This is called RFC 2119 boilerplate.)

After including the boilerplate, specifications make a bunch of statements—normative, informative, and definitional—utilizing RFC 2119 vocabulary for the normative parts.

Recently, on the microformats-new mailing list, Dr. Orlovsky asked about creating a microformat for marking up RFC 2119 terms. Scott Reynen thinks that creating a microformat for this would probably be overkill, and that simply authoring your spec as POSHly as possible should suffice. I agree.

So let’s try to figure out what nice, semantic markup for RFC 2119 text should look like. Our goal is two-fold:

  1. to mark up the boilerplate text quoted above;
  2. to mark up each instance where we use an RFC 2119 word.

Starting with 1, here’s the bare minimum: I’ve wrapped the boilerplate inside a <p> element, and linked to the RFC:

<p>
  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
  NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and
  "OPTIONAL" in this document are to be interpreted as described in
  <a href="http://www.ietf.org/rfc/rfc2119.txt">RFC 2119</a>.
</p>

That link is begging to be souped up with some link relations. HTML 4 defines several link relations, two of which seem relevant in this case:

Glossary
Refers to a document providing a glossary of terms that pertain to the current document.
Help
Refers to a document offering help (more information, links to other sources information, etc.)

Of the two, glossary most closely captures the semantics we want, so let’s use it. Note, though, that HTML 5 keeps help but has dropped glossary. (That is, the current version of the draft lacks it; it may be added before publication.) Thus, I’ve placed both glossary and help on the link.

<p>
  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
  NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and
  "OPTIONAL" in this document are to be interpreted as described in
  <a href="http://www.ietf.org/rfc/rfc2119.txt"
     rel="help glossary">RFC 2119</a>.
</p>

Next, we should mark up each RFC 2119 term in the boilerplate. We want to say, “look to this paragraph for the meaning of this term.” Brian Suda suggested the use of the <dfn> element, which HTML 4.01 provides for precisely this purpose:

DFN:
Indicates that this is the defining instance of the enclosed term.

HTML 4 gives us very little guidance with regard to <dfn>; the above quote is actually all it has to say on the matter! So how are we to know what the definition for the term is? In Brian’s post, he copied the definitions from RFC 2119 and placed them into title attributes on their <dfn> elements. This is a perfectly reasonable thing to do, given HTML 4, but I think it’s worrisome for two reasons:

  1. I’d like the definitions themselves to only exist in RFC 2119. When we invoke RFC 2119, we turn over the meanings of these terms to it; semantically, the definitions should be external. DRY and all that.

    Also, by duplicating its definitions, we run the (admittedly very small) risk of bit-rot.

  2. While HTML 4 didn’t provide much in the way of useful material on <dfn>, HTML 5 certainly does. HTML 5 defines an algorithm for determining what the term is and what the definition is. Brian’s use of dfn/@title breaks under the HTML 5 algorithm, which says: “If the title attribute of the dfn element is present, then it must only contain the term being defined.”

    I think it’s worth striving to be both forwards- and backwards-compatible, so putting the definition in dfn/@title seems problematic.
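
To make the second concern concrete, here is a minimal Python sketch of how a tool might pick out the term for each <dfn>. It assumes a simplified reading of the HTML 5 draft: the title attribute wins when it is present and non-empty, otherwise the element’s text is the term. The names DfnTerms and dfn_terms are made up for illustration, and the real draft algorithm may differ in its details.

```python
from html.parser import HTMLParser


class DfnTerms(HTMLParser):
    """Collect the term each <dfn> defines (simplified, hypothetical reading)."""

    def __init__(self):
        super().__init__()
        self.terms = []
        self._inside = False  # True while we are inside a <dfn>
        self._title = None    # title attribute of the current <dfn>, if any
        self._text = []       # text chunks seen inside the current <dfn>

    def handle_starttag(self, tag, attrs):
        if tag == "dfn":
            self._inside = True
            self._title = dict(attrs).get("title")
            self._text = []

    def handle_data(self, data):
        if self._inside:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "dfn" and self._inside:
            self._inside = False
            # Assumption: a non-empty title is the term; otherwise use the text.
            self.terms.append(self._title or "".join(self._text))


def dfn_terms(html):
    parser = DfnTerms()
    parser.feed(html)
    parser.close()
    return parser.terms


# A bare <dfn> yields the keyword itself as the term.
print(dfn_terms("<dfn>MUST NOT</dfn>"))
# With a definition placed in title, the "term" becomes the definition text.
print(dfn_terms('<dfn title="(definition copied from the RFC)">MAY</dfn>'))
```

Under this reading, the second call reports the placeholder definition text as the term rather than MAY, which is the breakage described above.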

Here’s where we are now:

<p>
  The key words "<dfn>MUST</dfn>", "<dfn>MUST NOT</dfn>",
  "<dfn>REQUIRED</dfn>", "<dfn>SHALL</dfn>", "<dfn>SHALL NOT</dfn>",
  "<dfn>SHOULD</dfn>", "<dfn>SHOULD NOT</dfn>",
  "<dfn>RECOMMENDED</dfn>", "<dfn>MAY</dfn>", and "<dfn>OPTIONAL</dfn>"
  in this document are to be interpreted as described in <a
  href="http://www.ietf.org/rfc/rfc2119.txt"
  rel="help glossary">RFC 2119</a>.
</p>

Let’s move on to goal 2: how should we mark up the individual instances of RFC 2119 terms that appear elsewhere in the document?

Here’s how HTML 5 associates terms with <dfn> elements:

Any span, abbr, code, var, samp, or i element that has a non-empty title attribute whose value exactly equals the term of a dfn element in the same document, or which has no title attribute but whose textContent exactly equals the term of a dfn element in the document, and that has no interactive elements or dfn elements either as ancestors or descendants, and has no other elements as ancestors that are themselves matching these conditions, should be presented in such a way that the user can jump from the element to the first dfn element giving the defining instance of that term.

Out of those possibilities (the <span>, <abbr>, <code>, <var>, <samp>, or <i> elements), only <span> and <i> are semantically compatible with what we’re trying to do.

I think <strong> would be best for RFC 2119 terms, and I have suggested in an email to the WHATWG list that it be added to this list of elements. I’ve since updated my proposal to use <em>; see “Revisiting RFC 2119 markup.”
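
The matching rule quoted above can be sketched roughly in Python. This is only an illustration under stated assumptions: it checks the element names and the title/text comparison, but it ignores the conditions about interactive elements and about ancestors and descendants, and it takes a <dfn>’s term to be its non-empty title, or else its text. TermUses and linkable_uses are hypothetical names, not part of any specification or library.

```python
from html.parser import HTMLParser

# Elements the quoted rule allows to point at a defining instance.
CANDIDATES = {"span", "abbr", "code", "var", "samp", "i"}


class TermUses(HTMLParser):
    def __init__(self):
        super().__init__()
        self.terms = set()  # terms given by <dfn> elements
        self.uses = []      # (tag, label) for each candidate element
        self._stack = []    # open tracked elements: (tag, title, text parts)

    def handle_starttag(self, tag, attrs):
        if tag == "dfn" or tag in CANDIDATES:
            self._stack.append((tag, dict(attrs).get("title"), []))

    def handle_data(self, data):
        # Text counts toward every tracked element that is currently open.
        for _, _, parts in self._stack:
            parts.append(data)

    def handle_endtag(self, tag):
        if not self._stack or self._stack[-1][0] != tag:
            return
        tag, title, parts = self._stack.pop()
        text = "".join(parts)
        if tag == "dfn":
            self.terms.add(title or text)   # simplified notion of "the term"
        elif title is None:
            self.uses.append((tag, text))   # no title: compare the text
        elif title:
            self.uses.append((tag, title))  # non-empty title: compare the title
        # An empty title attribute satisfies neither branch of the rule.


def linkable_uses(html):
    """Return the candidate elements whose label exactly equals a dfn term."""
    parser = TermUses()
    parser.feed(html)
    parser.close()
    return [(tag, label) for tag, label in parser.uses if label in parser.terms]


sample = (
    "<p><dfn>MUST</dfn> ... <span>MUST</span> "
    '<i title="MUST">has to</i> <strong>MUST</strong></p>'
)
print(linkable_uses(sample))
```

In this sketch the <span> and the <i> are matched, while the <strong> is never considered because it is not in the list of candidate elements.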

If browsers supported HTML 5’s term/definition association algorithm, we’d be done now. However, it sounds like it’d be pretty hard to support the algorithm as specced, so this is likely to change before publication. Let’s help things out a bit with a touch of @class. A nice semantic class name that fits our needs is “defined”. I’ve written up an XMDP for it.

So, summing up, here’s what POSH RFC 2119 use looks like:

<p>
  The key words "<dfn>MUST</dfn>", "<dfn>MUST NOT</dfn>",
  "<dfn>REQUIRED</dfn>", "<dfn>SHALL</dfn>", "<dfn>SHALL NOT</dfn>",
  "<dfn>SHOULD</dfn>", "<dfn>SHOULD NOT</dfn>",
  "<dfn>RECOMMENDED</dfn>", "<dfn>MAY</dfn>", and "<dfn>OPTIONAL</dfn>"
  in this document are to be interpreted as described in <a
  href="http://www.ietf.org/rfc/rfc2119.txt"
  rel="help glossary">RFC 2119</a>.
</p>
<p>
  … The frob <strong class="defined">MUST</strong> be frobnicated
  vigorously until done. …
</p>
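
As a rough sanity check for markup like the above, a spec author could verify that every element carrying class="defined" names a term that some <dfn> in the same document defines. The Python sketch below is one hypothetical way to do that; it only looks at <strong> elements, compares text exactly, and the names DefinedCheck and undefined_uses are invented for this example.

```python
from html.parser import HTMLParser


class DefinedCheck(HTMLParser):
    def __init__(self):
        super().__init__()
        self.terms = set()  # text of each <dfn>
        self.uses = []      # text of each <strong class="defined">
        self._open = None   # "dfn" or "strong" while inside a tracked element
        self._text = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "dfn" or (tag == "strong" and "defined" in classes):
            self._open = tag
            self._text = []

    def handle_data(self, data):
        if self._open:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open:
            text = "".join(self._text)
            if tag == "dfn":
                self.terms.add(text)
            else:
                self.uses.append(text)
            self._open = None


def undefined_uses(html):
    """Return the text of each class="defined" use that has no matching <dfn>."""
    parser = DefinedCheck()
    parser.feed(html)
    parser.close()
    return [use for use in parser.uses if use not in parser.terms]


doc = (
    '<p>"<dfn>MUST</dfn>", "<dfn>MAY</dfn>"</p>'
    '<p>The frob <strong class="defined">MUST</strong> be frobnicated. '
    'It <strong class="defined">OUGHT TO</strong> work.</p>'
)
print(undefined_uses(doc))
```

Here the check flags OUGHT TO, since no <dfn> in the sample defines it.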