T.V. Raman said the Web is more than a Web Browser, and that Web technology means more than just HTML,1 and I couldn’t agree more. Yet he goes on to characterize the HTML5 effort like so:

The HTML5 community would define themselves as encompassing all Web technologies, i.e., if it’s not HTML5 and implemented in a browser, it’s not the Web.

I wanted to write a bit about this disconnect.

What is the Web platform anyway? Honestly, it’s a weird thing: part designed, part congealed; part documented, part reverse-engineered. It consists of the technology broadly used to process the public content of the Web. Many tools that live outside of browsers are built on top of this platform, but the important part of calling a technology a piece of the Web platform is that it works with the public content of the Web.

The Web platform includes many technologies we're all familiar with by now: URLs, HTTP, REST, HTML, CSS, JavaScript, the DOM, and so on. It also includes lots of pieces users never see, like website APIs you can call into with XHR (or with your favorite programming language's HTTP library), and data formats like XML and JSON, which aren't user-facing.

Figure: Henri Sivonen’s second pass at a diagram of the web stack. The Web platform’s core is JavaScript, the DOM, and CSS, built on top of HTTP, URLs, and the Internet.

Tools intended to operate on public Web content usually need to handle that content the way that browsers do, because browsers are the tools that authors target their content for. (This is the Support Existing Content design principle.) If a browser handles unquoted attribute values, say, it behooves you to do the same, since your code will break on lots of web content if you do otherwise. Here’s another example: a web crawler like Google’s doesn’t run in a browser, but needs to process public Web content in as browser-compatible a way as possible, so that Google’s search results most accurately reflect what you and I will see when we click through to the pages returned.
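To make the unquoted-attribute point concrete, here is a minimal sketch in Python that feeds the same markup to a forgiving parser and to a strict one. It assumes Python's standard-library `html.parser` as a stand-in for browser-style leniency (it is not a full browser-grade HTML parser) and `xml.etree.ElementTree` as the strict XML parser. The sample markup and the helper names are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class AttributeCollector(HTMLParser):
    """Records the attributes of every start tag it encounters."""

    def __init__(self):
        super().__init__()
        self.seen = []  # list of (tag, {attribute: value}) pairs

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, dict(attrs)))


def lenient_attributes(markup):
    """Parse markup the forgiving way and return the tags and attributes found."""
    collector = AttributeCollector()
    collector.feed(markup)
    collector.close()
    return collector.seen


def strict_parse_succeeds(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical snippet with unquoted attribute values, as found on many pages.
SAMPLE = "<a href=/about class=nav>About</a>"

print(lenient_attributes(SAMPLE))     # the link and its attributes are recovered
print(strict_parse_succeeds(SAMPLE))  # the strict parser rejects the same input
```

A crawler built on the strict parser alone would lose this link entirely, which is the practical cost of not handling content the way browsers do.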

How do you make sure that XML technologies can co-exist on the Web alongside HTML without necessarily having HTML’s sloppiness leaking into all Web languages?

When he asks this, Raman seems to believe that the messiness of the Web is somehow contained in and limited to HTML and/or the major browsers, and that the rest of the platform doesn’t have to know about or handle the sorts of messiness that browsers have to. That somehow the messiness is imposed on the rest of the Web stack by the browsers.

But this is just wrong. The messiness is in the content on the Web; it’s simply one of the core facts of the Web. The browsers are just on the front lines of dealing with it. Any tool purporting to be useful on the Web, and any specification purporting to describe the reality of the Web, must recognize this.

Comments

  1. Hi, I'm not sure of the scope of speech here... we have The Internet (the network of all networks), and then we have the World Wide Web as one particular application atop The Internet.

    Some of the specs cited above are Internet specs, while some are specifically WWW specs. Sound true to you...?

    jd/adobe

    John Dowdell, 21 May 2009

  2. John: Sure, of course. Notice how 'The Internet' sits below everything in Henri's diagram.

    Edward O'Connor, 21 May 2009

  3. Yes, "The Internet" is in a diagram, but it's not represented in the speech. (Lots of people conflate the two, rather than seeing that the net enfolds the web.)

    I'm just trying to check the actual meaning behind all the words.... ;-)

    John Dowdell, 21 May 2009
