Marking up code listings
I included several HTML and XML code samples in my writeup of how to mark up captioned figures, so I thought I should also explain how I mark up code listings.
The straightforward way to mark up source code in semantic HTML is by employing a compound of <pre> and <code>, which ends up looking like this:
<pre><code>
…source code goes here…
</code></pre>
Aside: The existence of this HTML compound is why proposals for a code microformat, such as Anders Conbere’s hCode, get shot down: the answer to “is there a compound of XHTML elements that would work?” (one of the questions to ask before proceeding in the microformats process) is an emphatic “yes.”
Now, this compound is pretty minimal — we know that the text inside is source code, but we don’t know anything interesting about it. Let’s see about enhancing it.
For my first baby-step past the basics, I indicate the code’s language by putting a class onto the <code> element:
<pre><code class="python">
…python source code goes here…
</code></pre>
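As a rough sketch of why the class is useful to scripts, the JavaScript below picks language-labeled listings out of an HTML string. The function name findLabeledListings and the sample page are made up for illustration, and the regular expression only copes with markup shaped exactly like the compound above; in a browser, document.querySelectorAll("pre > code[class]") would be the sturdier route.

```javascript
// Hypothetical helper, not taken from any existing highlighter library.
// Finds <pre><code class="..."> listings in an HTML string and reports
// the language label and the raw contents of each one.
function findLabeledListings(html) {
  const pattern = /<pre><code class="([^"]+)">([\s\S]*?)<\/code><\/pre>/g;
  const listings = [];
  for (const match of html.matchAll(pattern)) {
    listings.push({ language: match[1], source: match[2] });
  }
  return listings;
}

// Sample input (an assumption): one labeled listing and one unlabeled one.
const page =
  "<p>Intro</p>\n" +
  '<pre><code class="python">\nprint("hi")\n</code></pre>\n' +
  "<pre><code>\nunlabeled\n</code></pre>";

// Only the labeled listing is reported; the unlabeled one is skipped.
console.log(findLabeledListings(page));
```

Listings without a class are left alone, which is the point of the label: a script can only choose a grammar for blocks that say what language they hold.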
There are several JavaScript-based automatic syntax highlighters, such as Dan Webb’s CodeHighlighter, which operate directly on such language-labeled <pre><code> blocks. A really simple addition to the basic HTML compound can get you quite far. But say you want to handle syntax highlighting yourself: what should you do?
Syntax highlighting
Take this simple snippet of JavaScript:
var foo = 4;
We can introduce basic syntax highlighting of variables by using the <var> element:
<pre><code class="javascript">
var <var>foo</var> = 4;
</code></pre>
This is about as far as we can get without introducing our own semantics via custom classes.
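To make the <var> step concrete, here is a minimal sketch that escapes a snippet and wraps the name following each var keyword. Both function names are hypothetical, and the pattern is deliberately naive: it handles only the first name in a declaration and would also fire on the word var inside a string or comment.

```javascript
// Escape the characters that would otherwise be read as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: wrap the identifier after each `var` keyword
// in a <var> element. Naive by design; see the caveats above.
function markDeclaredVariables(source) {
  return escapeHtml(source).replace(
    /\b(var\s+)([A-Za-z_$][\w$]*)/g,
    "$1<var>$2</var>"
  );
}

const marked = markDeclaredVariables("var foo = 4;");
console.log('<pre><code class="javascript">\n' + marked + "\n</code></pre>");
```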
So how should we choose what to highlight? What should we name the classes we create? As Jon Williams noted, our technique should take into account the wide variety of languages we might want to mark up — programming languages are but one such variety.
This is where Emacs comes in.
Emacs already knows how to syntax highlight pretty much every language I’d like to post snippets of, and its mechanism for doing so, font lock, maps disparate language features onto reasonably semantically-named font lock faces: builtin, comment, constant, function, keyword, string, type, and variable, to name the key ones. These are pretty good names to crib, and we can observe what names Emacs attaches to different parts of code too.
To illustrate, here’s an example pulled from my ~/.cshrc:
set complete=enhance
set ssh_hosts = `grep '^Host[ ][^*]' ~/.ssh/config | cut -c 6-`
complete ssh 'p/1/$ssh_hosts/'
# Aliases for pulling up a screen on each host
foreach host ($ssh_hosts)
alias $host "ssh -t $host screen -DR"
end
Here’s how you might mark that up:
<pre><code class="csh"><span class="builtin">set</span> <var>complete</var>=enhance
<span class="builtin">set</span> <var>ssh_hosts</var> = <span class="string">`grep '^Host[ ][^*]' ~/.ssh/config | cut -c 6-`</span>
complete ssh <span class="string">'p/1/$ssh_hosts/'</span>
<span class="comment"># Aliases for pulling up a screen on each host</span>
<span class="keyword">foreach</span> host ($<var>ssh_hosts</var>)
<span class="builtin">alias</span> $<var>host</var> <span class="string">"ssh -t $host screen -DR"</span>
<span class="keyword">end</span></code></pre>
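The mapping from face names to markup is mechanical, so it can be sketched in a few lines of JavaScript. This is an illustration only, not the Emacs Lisp described in this post: the { text, face } token format and the function names are assumptions, and the token list is written out by hand where Emacs would supply the faces.

```javascript
// Escape the characters that would otherwise be read as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical token format: { text, face }, where face is a font lock
// style name such as "builtin", or undefined for unhighlighted text.
function renderToken(token) {
  const text = escapeHtml(token.text);
  if (!token.face) return text;
  // Variables get the native <var> element; every other face gets a class.
  if (token.face === "variable") return "<var>" + text + "</var>";
  return '<span class="' + token.face + '">' + text + "</span>";
}

function renderListing(language, tokens) {
  return '<pre><code class="' + language + '">' +
    tokens.map(renderToken).join("") +
    "</code></pre>";
}

// Hand-written tokens for the first line of the csh example.
const firstLine = [
  { text: "set", face: "builtin" },
  { text: " " },
  { text: "complete", face: "variable" },
  { text: "=enhance" },
];
console.log(renderListing("csh", firstLine));
```

Keeping <var> for variables and classes for everything else means the native HTML semantics are used wherever they exist, and custom names fill in only where HTML has nothing to offer.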
Not only is Emacs a decent source of guidance on how to mark code up, it can also do most of the markup-writing heavy lifting for us. There are several tools of varying quality for automatically converting font-locked Emacs buffers into equivalent HTML [1, 2, and 3], but I rolled my own in about 50 lines of Emacs Lisp that Works For Me. I just select a region in some buffer and hit a keystroke: a marked-up version of the region gets dropped right into the clipboard for easy pasting.
The colors I use are from my Emacs color theme, color-theme-hober2.el.
CSS rules derived from color-theme-hober2.el.
pre code .keyword { color: #4682b4; }
pre code .type { color: #3cb371; }
pre code .function { color: #5f9ea0; }
pre code var { color: #ff6a6a; }
pre code .string { color: #fffacd; }
pre code .comment { color: #9932cc; }
pre code .preprocessor { color: #f0e68c; }
pre code .constant { color: #db7093; }
pre code .builtin { color: #f4a460; }
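Rules of this shape can also be generated from a table of faces and colors. The sketch below is one possible way to do that, not how the rules above were actually produced; the palette object and the cssRules function are hypothetical, and only three of the colors are repeated to keep it short.

```javascript
// Hypothetical palette: face names mapped to colors from the rules above.
const palette = {
  keyword: "#4682b4",
  variable: "#ff6a6a",
  string: "#fffacd",
};

// Build one CSS rule per face. Variables are marked up with <var>,
// so they get an element selector; other faces get a class selector.
function cssRules(faces) {
  return Object.entries(faces)
    .map(([face, color]) => {
      const selector =
        face === "variable" ? "pre code var" : "pre code ." + face;
      return selector + " { color: " + color + "; }";
    })
    .join("\n");
}

console.log(cssRules(palette));
```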
I’m hoping to write up a companion post over at Emacsen.org in which I detail the actual mechanics of the code, but I’m sufficiently busy these days that I doubt I’ll get to it in the near future.