What to expect from HTML5

Edward O’Connor

History

  • Archaic HTML — TimBL @ CERN
  • HTML < 3.2 — the IETF’s HTML WG
  • HTML 3.2, 4.0, and 4.01 — the W3C’s HTML WG
  • XML — W3C’s XML WG
  • XHTML 1.0, 1.1 — W3C’s HTML WG

Mismatch

  • HTML was defined as an SGML application
    • Virtually all real-world markup is invalid
    • Browsers parse HTML in a way very unlike SGML
  • XHTML is defined as an XML application
    • Virtually all real-world XHTML is served as text/html
    • (not to mention well-formedness errors)
    • Browsers treat XHTML as HTML

Mismatch

  • Existing standards do not reflect reality.
  • Reality is billions of documents.
  • 50, 100, 500 years from now, will we still be able to read our cultural output?

HTML5

HTML5 is a specification of HTML as it appears in the real world.

http://whatwg.org/html5
http://dev.w3.org/cvsweb/html5/spec/

  • With just this spec, you could build your own browser that works with today’s web and today’s content.

HTML5 is not defined in terms of SGML; it specifies its own parsing rules

Tag Soup

  • Sounds an awful lot like putting a big, fat W3C stamp of approval on all the crappy tag soup out there…
  • But browser requirements are not author requirements: browsers must cope with tag soup, while authors are still expected to write conforming markup

Browser requirements

  • Browsers have to support legacy content
    • <table> for layout
    • <b>ed and <br>eakfast
    • etc.
  • Browsers have to interoperate
    • what <b>does <i>this</b> look</i> like?
  • Defining interoperable error handling means no more reverse-engineering IE — Yay!
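For the misnested example on this slide (wrapped in a <p> for context), the HTML5 parsing algorithm, through its “adoption agency” steps, prescribes one tree that every conforming parser must build. Below is a sketch of that tree written back out as well-formed markup. It assumes the algorithm as specified today, which may differ in detail from the draft this talk describes:

```html
<!-- Input, as authored: </b> arrives while <i> is still open -->
<p>what <b>does <i>this</b> look</i> like?</p>

<!-- Tree the parser builds, re-serialized as well-formed markup -->
<p>what <b>does <i>this</i></b><i> look</i> like?</p>
```

The </b> end tag also closes the still-open <i>. When the parser reaches “ look”, it reopens a fresh <i>, so “this look” stays italic throughout, while only “does this” is bold.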

Doctype

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

No SGML means: no public identifier, no DTD

<!DOCTYPE html>

Triggers standards mode in Opera, IE, Firefox, and Safari.
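A minimal document built on the short doctype might look like the sketch below. The title and text are placeholders. The script only reports the rendering mode through document.compatMode, which reads "CSS1Compat" in standards mode and "BackCompat" in quirks mode:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Placeholder title</title>
</head>
<body>
  <p>Rendering mode: <span id="mode"></span></p>
  <script>
    // "CSS1Compat" means standards mode; "BackCompat" means quirks mode.
    document.getElementById("mode").textContent = document.compatMode;
  </script>
</body>
</html>
```

If you delete the first line, the same page reports BackCompat, because a missing doctype sends browsers into quirks mode.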

Dropped elements

Authors aren’t allowed to use

<acronym> <basefont> <big> <center> <dir> <font> <frame> <frameset> <isindex> <noframes> <noscript> <s> <strike> <tt> <u>

But browsers are still required to handle them.
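Most of the dropped elements are presentational, so an author updating old markup typically swaps them for CSS or for an element that carries meaning. The sketch below shows a few such substitutions. The class name "sale-banner" is hypothetical, and which replacement fits depends on why the original markup used the element (for instance, <tt> might become <code>, <kbd>, or <samp>):

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Replacing dropped elements</title>
  <style>
    /* Hypothetical class replacing <center><font size="5" color="red"> */
    .sale-banner { text-align: center; font-size: x-large; color: red; }
  </style>
</head>
<body>
  <!-- Was: <center><font size="5" color="red">Sale!</font></center> -->
  <p class="sale-banner">Sale!</p>

  <!-- Was: <acronym title="...">HTML</acronym> -->
  <p>Pages are written in <abbr title="HyperText Markup Language">HTML</abbr>.</p>

  <!-- Was: <tt>make install</tt> -->
  <p>Run <code>make install</code> next.</p>
</body>
</html>
```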

New elements

Document structure

  • <article> handy for blog archive pages
  • <section>
  • <header>
  • <footer>
  • <style scoped>

Many of the new elements came from studying real-world use of @class — how are authors compensating today for missing elements?
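The following sketch shows how these elements might replace the @class conventions mentioned above. It is a fragment of a hypothetical blog archive page in which each post is an <article> with its own <header> and <footer>. The titles, text, and the class names in the comment are illustrative only and do not come from any real survey:

```html
<!-- Before: <div class="post">, <div class="post-header">, <div class="post-footer"> -->
<section>
  <h1>Archive</h1>

  <article>
    <header>
      <h2>First post</h2>
    </header>
    <p>Post body goes here.</p>
    <footer>
      <p>Filed under: markup</p>
    </footer>
  </article>

  <article>
    <header>
      <h2>Second post</h2>
    </header>
    <p>Post body goes here.</p>
    <footer>
      <p>Filed under: markup</p>
    </footer>
  </article>
</section>
```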

Document structure (cont’d)

  • <aside>
  • <dialog>
  • <figure>
  • <nav>
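The remaining structural elements might combine as in the following page skeleton. The link targets, image file, and caption are hypothetical. The skeleton uses <figcaption>, the caption element in the current spec, because the way <figure> is captioned has changed between drafts. It leaves out <dialog>, whose meaning has also changed since it was first proposed:

```html
<nav>
  <ul>
    <li><a href="/">Home</a></li>
    <li><a href="/archive">Archive</a></li>
  </ul>
</nav>

<article>
  <h1>Post title</h1>
  <p>Main content of the post.</p>

  <figure>
    <img src="chart.png" alt="Bar chart of placeholder data">
    <figcaption>A hypothetical chart with its caption.</figcaption>
  </figure>

  <aside>
    <p>A tangential note, related to the post but outside its main flow.</p>
  </aside>
</article>
```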

2D Graphics API

  • <canvas>
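The 2D graphics API is exposed through the <canvas> element. Script asks the element for a 2D drawing context and then issues drawing calls against that context. The following is a minimal sketch; the id, size, and colors are arbitrary choices for illustration:

```html
<canvas id="demo" width="200" height="100">
  Fallback text for browsers without canvas support.
</canvas>
<script>
  // Ask the canvas for its 2D drawing context.
  var ctx = document.getElementById("demo").getContext("2d");

  // Filled square: x, y, width, height, in canvas pixels.
  ctx.fillStyle = "navy";
  ctx.fillRect(10, 10, 80, 80);

  // Filled circle: a full arc (0 to 2π radians) of radius 40 around (150, 50).
  ctx.fillStyle = "orange";
  ctx.beginPath();
  ctx.arc(150, 50, 40, 0, 2 * Math.PI);
  ctx.fill();
</script>
```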

Media

  • <embed> — browsers would have to support this anyway
  • <audio> and <video>
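With the controls attribute, <video> and <audio> get playback controls from the browser itself rather than from a plug-in. The sketch below assumes hypothetical file names. Listing several <source> children lets the browser pick the first format it can play, and the content inside the element is shown only by browsers that do not support the element at all:

```html
<video controls width="480" poster="talk-poster.jpg">
  <!-- The browser uses the first source whose type it can play. -->
  <source src="talk.webm" type="video/webm">
  <source src="talk.mp4" type="video/mp4">
  <!-- Fallback content for browsers without video support. -->
  <p>Your browser does not support video. <a href="talk.mp4">Download the video</a> instead.</p>
</video>

<audio controls src="interview.ogg">
  <p><a href="interview.ogg">Download the audio</a>.</p>
</audio>
```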

Web Forms 2.0

  • <input type={range,email,url,time,date,etc.}>
  • <output>
  • <input autofocus>
  • <input autocomplete>
  • etc.
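The following sketch shows a form that uses several of these features; the action URL and field names are hypothetical. Browsers that do not recognize a new input type treat it as a plain text field, so a form like this still works in older browsers, just without the richer controls:

```html
<form action="/signup" method="post"
      oninput="level.value = volume.value">
  <!-- autofocus: put the cursor in this field when the page loads -->
  <label>Email <input type="email" name="email" autofocus></label>
  <label>Homepage <input type="url" name="homepage"></label>
  <label>Start date <input type="date" name="start"></label>

  <!-- autocomplete="off": ask the browser not to offer saved values -->
  <label>One-time code <input type="text" name="code" autocomplete="off"></label>

  <!-- <output> displays a value computed from other controls;
       its for attribute lists the ids of the controls it depends on -->
  <label>Volume <input type="range" id="volume" name="volume" min="0" max="11" value="5"></label>
  <output name="level" for="volume">5</output>

  <button type="submit">Sign up</button>
</form>
```

The oninput handler on the form runs whenever the user edits one of its controls and copies the slider's current value into the <output>.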

You can help!

The W3C and the WHATWG are working on the same spec, and both working groups are open to the public!