Interview with Ian Hickson, HTML editor
Hot on the heels of our interview with Robin Berjon, editor of HTML5 at W3C, over the weekend I interviewed Ian Hickson, editor of HTML “The Living Standard” at WHATWG and, arguably, the most influential individual working on the Web today. Hickson, known as “Hixie”, works for Google and previously worked for Opera (my employer) and Netscape.
Some of the questions were suggested by web developers over Twitter, where indicated.
If you could wipe out any web technologies from existence, which would you choose? (from @webr3)
The Web’s technology stack is one of the only platforms (maybe the only one? I’m hard pressed to come up with another example) that is completely vendor-neutral and not centrally developed. Anyone can invent a new feature and, if the market agrees, can get that feature to be a de facto part of the platform.
XMLHttpRequest is a classic example. A browser cannot be a serious competitor on the Web without supporting it now, yet it was initially just made up and shipped by one vendor without any consultation with anyone. Even the features that are designed in public with wide review end up implemented and used before they’re “done”, and once they’re used they can’t be changed, which leads to APIs that make no sense, like pushState(), which has a required argument that is ignored.
The hardest part of this is that we can’t tell if something we’ve designed is any good without testing it in real world sites, and by the time it’s been properly tested, it’s too late to change it.
pushState() happened like this (where we learnt that the second of the three arguments should be dropped only after there were already pages depending on using the first and third arguments), but there are even bigger examples: the CSS cascade and specificity mechanism, for example, which was a great idea but that doesn’t quite work at the end of the day, or even XML and specifically XML namespaces, which are widely recognised as a disaster but which whole layers of the Web platform depend on.
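To make the "required argument that is ignored" concrete: in browsers the call has the shape history.pushState(state, unused, url). The sketch below is a toy in-memory model of that shape, not the browser's implementation; the names ToyHistory and ToyHistoryEntry are invented for illustration, and the model makes the URL mandatory for simplicity.

```typescript
// Toy in-memory model of a session history. NOT the browser's implementation;
// ToyHistory and ToyHistoryEntry are hypothetical names for illustration.
interface ToyHistoryEntry {
  state: unknown;
  url: string;
}

class ToyHistory {
  private entries: ToyHistoryEntry[] = [];
  private index = -1;

  // Mirrors the three-argument shape of pushState(): callers must supply
  // the second argument, but it is never read.
  pushState(state: unknown, _unused: string, url: string): void {
    // Pushing a new entry discards any "forward" entries.
    this.entries = this.entries.slice(0, this.index + 1);
    this.entries.push({ state, url });
    this.index = this.entries.length - 1;
  }

  // Step back one entry, if there is one.
  back(): void {
    if (this.index > 0) this.index -= 1;
  }

  get current(): ToyHistoryEntry | undefined {
    return this.entries[this.index];
  }

  get length(): number {
    return this.entries.length;
  }
}

// Usage: whatever is passed in the middle position makes no difference.
const toyHistory = new ToyHistory();
toyHistory.pushState({ page: 1 }, "", "/page/1");
toyHistory.pushState({ page: 2 }, "this text is ignored", "/page/2");
```

Because pages already pass three arguments, the middle slot cannot simply be removed without breaking them, which is the bind described above.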
In the end, the most horrible parts of the Web are also the most successful, because it’s the most successful parts that get extended by the most people, and so that have the least coordination in terms of their long-term evolution, which is how things end up being horrible.
HTML5 vs. the “living standard”: how do I know which elements will be around ten years from now, so I can safely use them in client work? (from @szafranek)
You know what will be around ten years from now by looking at what is implemented in two browsers today. If it’s implemented in two browsers today, I can almost guarantee it’ll still be around in ten years. If it’s not, all bets are off.
This has nothing to do with “living standard” vs. versioned specs, though. HTML4 has all kinds of features that aren’t in HTML anymore; for example, <object declare> and
Do you feel there’s enough innovation within the user agent market, since most are virtually equivalent in every way? (From @webr3)
I don’t really agree with the premise of the question. Browsers differ in many important ways. One of the reasons they might seem to be “virtually equivalent”, though, is that they often adopt features from each other. Another is that one of the key features they’ve all been trying for is “get out of the way and let the Web shine through”, which means you see less and less of the browser itself.
In the area of browser development that I’m most familiar with — namely the Web technology stack — there’s a lot of innovation going on. I get a lot of e-mails from browser vendors asking me to spec their new invention so that it has a spec and can be implemented by everyone else also. In fact, it’s gotten to the point where I have to track these requests to make sure I only spec those with multiple vendors on board. (I don’t want to end up with a spec where each section is only implemented by one browser. That would be a waste of everyone’s time.)
The browser space right now has more active competition than it ever has had in the past, especially on desktops. It’s a great time for users.
Why are you the one who always says “no”? Casing the <main> elements here (from @helloanselm). Why so grumpy? (from @steve_fenton)
The short answer is, it’s my job!
There’s really two questions here. First, why do we in general say “no” so often? And second, why is it that it’s so often me that says it?
The first one is a combination of prioritisation and design.
First, let’s look at prioritisation. There are very limited resources available to add features to the Web. Every feature has a cost:
- Implementation: someone has to write code for it in each browser
- Testing: someone has to write the tests to check the feature is working
- QA: someone has to regularly run the tests to make sure the feature doesn’t regress
- Code maintenance: when browser vendors refactor code, they have to refactor more code if there are more features
- Tutorials: people who write tutorials have to include the feature, or handle feedback asking for them to do so
- Cognitive load: authors learning the platform have more documentation to wade through, even if they don’t care about the feature
- Page maintenance: authors have to know how to maintain the feature if other people have used it in pages they now maintain
- Spec writing: someone has to write the spec for the feature and ensure it’s maintained
- Bug fixing: when bugs are found in the spec or implementations, they have to be fixed
- Code size: each feature increases the size of browsers (both on-disk binaries and in-memory resident size)
(This list comes from our FAQ.)
We get an absolute torrent of feature requests, from authors, users, browser vendors, etc. So we have to prioritise:
- Is the feature going to be implemented by more than one browser? If no browser has signed up, or if only one has signed up and no other browsers have shown any interest, then I add it to the page of notes I mentioned earlier, and move on.
- Is the feature a game changer, or is it merely incremental? If the feature does no more than save a line of code every now and then, then its value probably doesn’t warrant its cost.
- Is the feature compatible with the Web philosophy? If the request is for a way to force the user to read something the way the author wants, it’s likely to be a non-starter. If it’s something that enables the user to control the page better, that’s more likely to be a winner.
So that’s the first part of the answer: we say “no” because we don’t have time to add everything.
<main> is a great example of this. In my opinion, the value it adds is so minimal that it just isn’t worth the cost.
Next, let’s look at design. The way we usually approach ideas is that we describe the problem, then we come up with solutions to the problem and compare them by evaluating them against the problem. This is where most concrete proposals run into a “no”, because something else got the “yes” instead.
<picture> is an example of this. We described the problem (that took a few months of back and forth), and then once we had a problem description, I looked at the various proposals and synthesised a solution based on those that addressed the problems adequately while trying to avoid some common design pitfalls based on lessons we’d learnt from previous ideas. So for example with <picture>, we learnt with <source> that having multiple elements for selecting a resource is a huge design pitfall, so when designing a solution to the problem here, I avoided that (hence the srcset="" design based on CSS’s
So that’s the second part of the answer. We say “no” to some specific proposals because we’ve come up with other (better) solutions instead.
The second implicit question here was why do _I_ say “no” so much, as opposed to other people. This is basically politics. It’s hard to say “no”, in part because of human nature, but also in part because when you say “no” in this kind of context you tend to invite argument. Most browser vendors don’t have time to have these arguments. I do. It’s my job, and I’ve gotten pretty good at saying “no” over the years. So a lot of the veteran browser engineers will just redirect people to me and expect me to say “no” for them. This lets them get on with their job of writing a browser, redirecting all the ire and arguments to me, whose job it is to field them. Or they’ll see an e-mail to the WHATWG list, think “oh my, that’s crazy”, but just stay silent because “Hixie will deal with it”.
In reality, I can’t really say either “yes” or “no”. What I say doesn’t mean anything: I don’t write any browsers. The browser vendors say “no” to me all the time by not implementing what I’ve specced. Just last week I revamped part of the spec (on context menus) because none of the browser vendors implemented it. One of the specs I spent years of my life on — XBL2 — has gone precisely nowhere because the browser vendors just don’t want to implement it. (My main mistake with XBL was trying to solve the whole problem at once, rather than doing it in incremental steps that implementors could more easily deal with.)
Similarly, there’s stuff that was never specced, indeed stuff that I’ve said “no” to myself, which the browser vendors nonetheless implemented. Again with the context menu example: Gecko implemented it by having an element called <menuitem>, which I argued forcefully was a bad design. They disagreed with my arguments and implemented it anyway, and so (eventually) I updated the spec to have <menuitem>. (This isn’t a criticism of Mozilla. It’s how these things are done. They did nothing wrong here.)
Do you feel there’s enough feedback from HTML authors, rather than vendors, getting through to specification editors/authors? (from @webr3)
I don’t know about “enough”. Is there ever “enough” feedback?
We do get a lot of feedback from Web developers, but it’s true that most of it has to be collected by seeking it out on forums, sites like Reddit, Twitter, Google+, people’s blogs, etc. I have some Google Alerts that lead me to a lot of places where people are complaining about HTML in one way or another, which I use as feedback. It’s almost certainly not enough. I wish we had more authors participating directly by either filing bugs or participating in the WHATWG list. People probably don’t realise how much feedback we get from developers, because they don’t see me reading the blogs and so on. Also, a lot of the feedback browser vendors send us is actually of the form “Web developers tell us that…” (though they often don’t say that explicitly).
The problem, to be honest, is that most authors don’t know what they want. Many years ago, Google actually paid for a usability study for one of the features I was speccing (microdata) and it was absolutely fascinating to see how people’s actual performance was so divergent from their impressions. We’d ask them questions like, “Which would be simpler, this or that?”, and then we’d actually test them by having them use the options, and there was just no relationship between what people said they wanted and what people were actually able to use.
Often when people send feedback (not just authors; pretty much anyone who hasn’t been in the process for a long time starts this way), they send feedback along the lines of “I want to add feature X” or “I want feature X to be extended in manner Y”. But when we drill down and ask them, “What problem are you trying to solve?”, or, “What’s your use case?” (same question but phrased differently), we often find that either (a) they actually don’t have a real problem and just thought that it would be a good idea, or (b) their solution wouldn’t actually solve their problem. Often we’re able to come up with much simpler solutions (or point to already existing solutions), which is quite satisfying.
Two and a half years ago, I asked you, “Would you like to be the HTML 6 editor?”. You replied, “I might want a change of pace when we’re done with HTML 5.” How come you decided not to take that holiday and continue editing HTML?
We’re not done yet. I still have 238 e-mails and 163 bugs remaining.
Do you have an exit strategy, or would you like to edit HTML for the foreseeable future?
Well, I could stop any time, really. So long as I find it interesting and implementors respect my work, though, why stop? We have a severe lack of spec writers who are able to write good specs.
For styling form components, we were told to wait for XBL, then to wait for Web Components. Anne van Kesteren said, “After a decade of waiting for this, I think it might be time to start calling this vaporware.” Thoughts?
If anyone has a better proposal, I’m all ears.
You work for Google, which has a widely deployed browser, a mobile operating system, and massive web properties. Do your employers require you to align specifications to their business objectives or otherwise influence your specification decisions?
No, quite the opposite. When I started, I was given very explicit instructions to the effect that I should put the Web’s long-term interests ahead of any of Google’s short-term interests.
Having said that, of course, one of the reasons I wanted to work for Google is the unique perspective one can get from working here and from having access to the data Google has. There’s no question that that has influenced my decisions.
Do you have other duties apart from specification?
I basically work on the HTML spec full-time, but yes, I have some other duties internally.
Is the Web Platform becoming too complex (Web Components, shadow DOM, etc.)?
Too complex for what?
For the amateur with something to say, rather than professional coder? Or new entrants — e.g., Geri Coady’s Pastry Box article?
clear set, or the inline box model.
This isn’t surprising, though. If you define the “Windows” platform as everything from the NT kernel API to the Direct3D API and everything in between, I’m pretty sure you’d quickly come to the conclusion that nobody understood that whole platform either.
I don’t think this is a problem.
The WHATWG began because the W3C told you, “HTML was dead. If you want to do something like HTML5, you should go elsewhere.” Now that the W3C has come to its senses, is it time for the WHATWG to hang up its spurs and for its participants to work inside W3C to continue the development of the web platform?
We tried (2007–2012). It didn’t work out. In fact, we ended up spinning more specs out of the W3C! The WHATWG has about 12 specs spread amongst eight or so editors now.
In what sense didn’t it work out, and why?
We have different priorities, different visions, different approaches. I’m probably not the right person to give an objective answer, though.
You seem to have most of your spats with people who work in the field of accessibility. Why is that?
I don’t know. I wish I did.
You told me, “People with disabilities are just as important to me in my work on HTML5 as is anyone else”, but I guess they would say you don’t pay enough attention to the needs of disabled people (summary, re-writing rules for alt text). Presumably therefore you feel that disabled people are better served by removing these features. What do you base that assumption on?
The premise of the question is that there’s one accessibility community that speaks with one voice and has one opinion, and that they entirely represent the exact homogeneous desires of all disabled people. But none of that is true. There are multiple Web standards accessibility communities. People within those communities often don’t agree with each other. There are lots of disabled people with lots of different opinions and different needs. The accessibility communities don’t always represent all the disabled people’s needs. And so forth.
Taking the specific features you list:
- longdesc="" is a solution in search of a problem, which virtually no author uses, and which those who do use almost entirely use incorrectly, and which therefore users have long ignored. Discouraging its continued use means authors are more likely to spend time on things that are more likely to improve accessibility. Mark Pilgrim, author of “Dive Into Accessibility”, wrote a post about this.
- Table summaries: check out the HTML spec for <table>. The first paragraph says that <table> is for representing tabular data (incidentally discouraging the use of <table> for layout tables, which is good for accessibility). The second paragraph says what tabular data is (a grid). The third paragraph provides a link to the section that defines what the table model is. The fourth paragraph encourages authors to “provide information describing how to interpret complex tables” and links to an entire section describing how to give table summaries. That section has six different suggestions for how to do it.
- Rules for alternative text for images: the section on <img> first says that <img> represents an image, then briefly says how you give the URL for the image (it goes into more detail on that again later), and then the third sentence points at the section on alt="", which consists of literally a dozen different subsections on how best to give alternative text for images in a wide variety of cases. When this was first written, it was probably by far the most detailed description of how to write alternative text ever published.
So we didn’t remove two of those three features. We expanded them dramatically. The one feature we removed was really a misfeature, and we have the data to back that up.
What’s been your biggest mistake? What are you most proud of?
I’m probably most proud of the HTML parser specification. Years ago it seemed like such a preposterously absurd task to take on that I said I would never do it. And it was huge (and we’re not quite done yet; there’s at least one major outstanding bug). But this, more than anything, has been the crowning achievement of the HTML spec. We went from a world where four browser engines had four completely different approaches with radically different behaviours in every edge case, to a world where the browsers agree so closely on their behaviour that we have an almost unheard of level of interoperability. There have been academic papers written about it. It’s completely changed how we write parser specifications, and I expect the knock-on effects on other specs will continue for years. It used to be that people said that specs should only define how to parse conforming content and should do so in a declarative fashion. But now saying that sounds hopelessly naive and old-fashioned.
My biggest mistake…there are so many to choose from!
pushState() is my favourite mistake, for the sheer silliness of ending up with an API that has a useless argument and being forced to keep it because the feature was so desired that people used it on major sites before we were ready to call it done, preventing us from changing lest we break it.
postMessage()’s security model is so poorly designed that it’s had academic papers written about how dumb I was, so that’s a pretty big mistake. (It’s possible to use postMessage() safely. It’s just that the easiest thing to do is not the safe way, so people get it wrong all the time.) The appcache API is another big mistake. It’s the best example of not understanding the problem before designing a solution, and I’m still trying to fix that mess.
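The interview doesn't spell out what the safe way is. The commonly recommended pattern is for the receiver to check the sender's origin against an exact allowlist before trusting the data, and for the sender to name a specific target origin rather than "*". Here is a minimal sketch of the receiving half, written as a pure function so it runs outside a browser; IncomingMessage, MessageDecision, TRUSTED_ORIGINS, acceptMessage and the example origins are all invented for illustration.

```typescript
// The two fields read from a message event. A real MessageEvent carries
// more; this reduced shape is an assumption made to keep the sketch small.
interface IncomingMessage {
  origin: string;
  data: unknown;
}

type MessageDecision =
  | { accepted: true; data: unknown }
  | { accepted: false };

// Hypothetical allowlist of origins this page is willing to hear from.
const TRUSTED_ORIGINS: readonly string[] = ["https://widgets.example.com"];

// Hands back the payload only when the sender's origin is on the allowlist.
function acceptMessage(
  msg: IncomingMessage,
  trusted: readonly string[] = TRUSTED_ORIGINS
): MessageDecision {
  // Exact string match only: prefix or substring checks would also accept
  // look-alikes such as "https://widgets.example.com.evil.test".
  if (trusted.indexOf(msg.origin) === -1) {
    return { accepted: false };
  }
  return { accepted: true, data: msg.data };
}

// Usage: one message from a trusted origin, one from a look-alike.
const fromWidget = acceptMessage({
  origin: "https://widgets.example.com",
  data: { ready: true },
});
const fromLookalike = acceptMessage({
  origin: "https://widgets.example.com.evil.test",
  data: { ready: true },
});
```

In a page this check would sit at the top of a "message" event listener. The easy thing, reading event.data without looking at event.origin, is exactly the unsafe default the answer describes.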
At the less-technical and more-political level, I think taking WebSockets to the IETF was a huge mistake. It ended up delaying the spec for literally a year without any resulting improvements, and they made a number of changes that IMHO reduce the protocol’s security. (Again, it’s possible to use it safely, it’s just not as easy as it was before the IETF got involved. Ironically, they broke the very things we learnt to do right after getting them wrong with postMessage().) We’re still waiting for things like compression and multiplexing, which we would probably have had a long time ago if we hadn’t gone to the IETF.
Also, when I designed the Web Storage API I made a horrible mistake by making the API synchronous and requiring consistency during a script’s execution, which essentially became the one place in the whole Web platform where, in theory, browsers are required to block cross-process. Since browsers don’t implement that, we instead have an API where there’s no consistency guarantee, which is rather scary.
Outside of the HTML spec, one of my biggest mistakes was not realising, back when I was first in the CSS working group, that specifications were not immutable. CSS2 had these vague rules about margin collapsing and about the inline box model which David Baron and I took as essentially immutable constraints around which we were to strictly define the margin collapsing model and the inline box model. We succeeded, in that now the spec has mostly well-defined models for both those things, but boy are they insanely complicated. What we should have done instead is just break the constraints and come up with something simpler, ideally something that more closely matched what browsers implemented at the time.
One of the side-effects of that kind of thinking (not directly my mistake, though I didn’t argue against it as far as I recall) is that we ended up with “Quirks Mode” and DOCTYPE switching. What we should have done is just made the specs match browsers and not bothered with all these modes.
I also made mistakes with the Acid tests, wherein I tested things that the specs required but which in retrospect it would have been better to ignore, which ended up encouraging browsers to implement things that we later realised it would have been better all around if we had skipped. Some of those we were able to mitigate, others not so much. Acid3 had some stuff around SVG fonts that we should have never had. Acid2 had some crazy stuff around SGML comments that we were thankfully later able to totally abandon.
I made a number of mistakes in the development of XBL2 (as mentioned earlier), but since the result of those mistakes was that it’s getting ignored, the only real long-term cost has been the opportunity cost and delay in getting a solution in that space.
I’m sure I’m forgetting some huge mistake I’ve made that everyone is yelling at me for. (I asked for suggestions for this question on IRC, and after we were done enumerating a bunch of mistakes, Anne asked, “That’s a long list. Why do we trust this guy again?” D’oh.)
Urban geek legend has it that you invented Web Sockets because you wanted to control your model trains through the browser and couldn’t. True or false?
Obviously there’s a lot of demand for something like Web Sockets, so it’s not like it was only because I want to control my trains that we worked on that, but it was definitely a key motivating factor for me. Without Web Sockets I was forced to use hanging GETs and all the other techniques people will be familiar with, which introduces precious milliseconds of latency that can be so destructive when you have two locomotives speeding towards each other. I’ve since used Web Sockets in pretty much every non-trivial development project I’ve done.
You specified microdata, which even you dislike, because RDFa was ugly and hard to write…
It’s not so much that I dislike microdata so much as I don’t think the problem microdata (and RDF, etc.) solves is an interesting problem that should be solved. Of course, enough people think it is a problem that should be solved that I approached it like all the other problems. It’s a good example of how the HTML spec is not just a reflection of my wishes.
…The WHATWG shot across RDFa’s bow worked. RDFa Lite now does everything microdata does, while also being compatible with Facebook’s Open Graph Protocol…
RDFa Lite doesn’t do everything microdata does. There’s a number of things that microdata does (or doesn’t do) that are absolutely key:
- not have anything to do with RDF
- not have any support for any prefixing mechanism
- have integration with the drag-and-drop API
There are probably others, too, but I haven’t studied RDFa Lite carefully.
Why is “not have anything to do with RDF” key?
It’s where microdata and microformats get most of their simplicity compared to RDF-based technologies.
…To avoid fragmentation, are you going to withdraw microdata as a spec, add an RDFa API, and use RDFa Lite instead? If not, why not?
Well, at this point microdata is in wide use, so there’s no way to remove it even if we wanted to. But since RDFa still doesn’t solve the same problems, it hasn’t come up.
RDFa (or rather, RDF) and microdata have fundamentally different data models. RDF is a triple (quad, really) database. Microdata represents tree structures. Microdata has more in common with JSON than RDF.
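The difference between the two data models can be sketched with two deliberately simplified, hypothetical shapes: TreeItem for a microdata-style tree and Triple for an RDF-style statement (neither is taken from a spec). Flattening a tree into triples means inventing identifiers for the nested items, which the tree form never needed.

```typescript
// Simplified, hypothetical shapes for illustration only.
// A microdata-style item: named properties whose values are strings or
// further nested items, so the whole thing is a tree (much like JSON).
interface TreeItem {
  type: string;
  properties: { [name: string]: Array<string | TreeItem> };
}

// An RDF-style statement: a flat (subject, predicate, object) triple.
type Triple = [string, string, string];

// Flattens a tree into triples, minting a made-up identifier ("_:b0",
// "_:b1", ...) for every item so that statements can refer to it.
function toTriples(root: TreeItem): Triple[] {
  const triples: Triple[] = [];
  let counter = 0;

  function visit(item: TreeItem): string {
    const id = "_:b" + counter++;
    triples.push([id, "type", item.type]);
    for (const name of Object.keys(item.properties)) {
      const values = item.properties[name] ?? [];
      for (const value of values) {
        // Nested items are flattened first; the parent then points at
        // the nested item's identifier.
        const object = typeof value === "string" ? value : visit(value);
        triples.push([id, name, object]);
      }
    }
    return id;
  }

  visit(root);
  return triples;
}

// Usage: one nested item becomes five flat statements.
const person: TreeItem = {
  type: "Person",
  properties: {
    name: ["Hixie"],
    worksFor: [{ type: "Organization", properties: { name: ["Google"] } }],
  },
};
const statements = toTriples(person);
```

Going the other way, from an arbitrary set of triples back to a tree, is not always possible, since triples can form cycles or share nodes.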
What are your thoughts on angle brackets…
They’re pointy? I don’t know. Are angle brackets something people have thoughts on?
…and how does HTML stand up to JSON? (from @pal_nes)
Different solutions to different problems.
JSON’s missing one major feature that I wish it would get, which is defined error handling (even if it’s only XML-style “you must abandon the parse if there is an error”). Right now, handling of broken JSON seems to vary from parser to parser.
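As a sketch of what "abandon the parse" behaviour looks like from the caller's side: JavaScript's built-in JSON.parse throws a SyntaxError on text that isn't valid JSON, so a thin wrapper can turn that into an explicit all-or-nothing result. ParseResult and parseOrAbandon are invented names, and this only shows one environment's behaviour, not a rule of the JSON format itself.

```typescript
// Either the whole document parsed, or nothing is returned at all.
type ParseResult =
  | { ok: true; value: unknown }
  | { ok: false; error: string };

// Wraps JSON.parse so that any syntax error abandons the parse outright,
// with no attempt to recover a partial value.
function parseOrAbandon(text: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: message };
  }
}

// Usage: a trailing comma is not valid JSON, so the second parse is abandoned.
const wellFormed = parseOrAbandon('{"a": 1}');
const broken = parseOrAbandon('{"a": 1,}');
```

A more lenient parser might accept the trailing comma, which is the parser-to-parser variation the answer complains about.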
Also, often when people define generic syntaxes, they go a bit crazy. When you’re defining a dedicated format (or a vocabulary for a generic format, same thing really), you usually don’t have time to go crazy and define wild features, because your motivation is to solve your original problem and designing the language is a necessary evil, not the end goal.
Is it a problem if people have no markup in the <body> and generate the DOM with script?
It’s obviously a problem for users who have scripting disabled, but other than that, it’s just an authoring choice.
There are significant advantages to having a static description of markup. You can validate it, for example, to catch semantic errors (such as accidentally putting an <input> in a <select> instead of using <option>). HTML these days has several features to make it easier to do this while writing a Web app, e.g., hidden="", DOM cloning, etc.
<ol> for the list is hard-coded in the HTML, but the <li>s are generated dynamically.
Talking of script, what do you think of DART?
I wrote about the difficulty of replacing the whole Web last year. Replacing any part of the Web (as opposed to extending it) is similar in difficulty, though on a proportionally smaller scale.
How do you think the long-term development of devices and interaction models will impact the development of web standards? (from @wcagtest)
I don’t think it’ll affect the development at all. There’s no difference between new interaction models and any other new feature. The Web and Web standards themselves will obviously be impacted, but I have no idea how. I’m fascinated to find out how the Web will evolve in the face of projects like Google’s Glass or devices like the Pebble.
I am sad that the Web didn’t handle small screens — and later, touch UI — that well. I would’ve hoped that the Web’s media-independent nature would have made that work better. For small screens, honestly, I mostly blame Web designers for assuming big screens and not thinking flexibly. We’re seeing that change now. For touch, though, I wish we had done a better job of mapping the UI to the Web’s generic events. A touch gets mapped to a “click” event easily enough, but drag-and-drop never got mapped, pinch gestures didn’t get mapped to wheel events, etc. Mainly I think this is because the first truly successful touch browser set the standard, and it was developed mostly in secret with a small team many of whom, as I understand it, weren’t Web veterans.
Will native apps triumph over the Web on mobile devices?
Native platforms and the Web have very different characteristics. The Web is by design radically vendor-neutral, and (to a lesser extent in practice) device-neutral. This has huge benefits: nobody can single-handedly kill the Web, for example. If you write a Web page or application today, and then tomorrow your desktop operating system vendor or your mobile phone handset vendor goes bankrupt, you can just buy another device and your page still works. If you target a proprietary platform — e.g., Amiga or OS/2 — that then loses the support of its vendor, the result is that your application is no longer usable.
The cost of having a system immune from the whims of a single vendor is that by and large, innovation doesn’t happen in multi-vendor discussions. If you have a proprietary platform, it’s easy to add features to it: you just do it. No need to argue with anyone. On the Web, a feature can only be added if every major implementor agrees it’s worth adding, and that usually only happens once it’s been proven in a native platform. So native platforms have the edge when things are rapidly innovating.
This is why the mobile world today has so much focus on native apps. Every new generation brings radical new features, and the Web will always be behind on those. So the cutting edge is native.
You can see this on the desktop. Innovation on desktop operating systems has slowed down dramatically, and as a result the Web has been able to mature there. The result is that on desktop, Web apps are doing great (so great that it’s viable to create a desktop OS that does nothing but bring up a Web browser, in fact). Mobile is where desktop was a decade or two ago, in terms of innovation.
What’s the biggest danger to a free, open Web at the moment?
What’s the very next thing on your to-do list?
The top thing on my bugs list is “Spec for document.open() doesn’t match reality when the parser is script-created”. That sounds like a hard one. Let’s see…top thing on my e-mail pile right now is…some even harder stuff to do with the HTML parser and feedback from Boris Zbarsky about some very subtle aspects of the security implications of certain DOM APIs. Yikes. Maybe I’ll do lunch instead…