Fast CSS: How Browsers Lay Out Web Pages (SXSW)

Hello there! I’m at SXSW (South By Southwest) 2012 this weekend, going to the interactive conference. Anyway, I tend to take copious notes when I go to talks. I thought they might be useful to a wider audience, so here you go. You can see other SXSW 2012 posts that I’ve made as well.

Introduction

I want to talk about the main pipeline from markup to graphics in web browsers: how they handle HTML and CSS, through this rendering path, to the screen. I will not talk about specific techniques or interaction with the network (resource loading). There’s better information about that out there.
I’ll talk about how you can experiment for yourself to see what’s fast or slow, but I won’t talk about practical application.
I can tell you about Gecko and WebKit, but we know nothing about Internet Explorer and Opera because they are not open source. We expect them to be reasonably similar, but that is a guess.
The core I’ll talk about is the pipeline from markup to graphics – HTML, SVG, CSS – from being a file you get off the network, to being a tree represented in memory, to being another data structure that represents layout, to being something drawn on the screen.

High Level Overview of rendering a static page

HTML and other markup languages are just a serialization of a tree structure. The job of the parser is to do that conversion. Different browsers invent different terminology, so we have three names for everything. Sometimes this is called the DOM tree, sometimes the content tree.
The nodes in this tree are elements, documents, and so on. Elements can be HTML or SVG, and are further distinguished by what kind of element they are.
After we have the DOM tree, we take another input, CSS, to construct the second tree structure (the rendering tree, called the frame tree in Gecko and the box tree in CSS). It is similar to the DOM tree, but differs in a lot of ways.
This second tree (the rendering tree) describes what gets positioned, as a series of rectangles. It is similar to the content tree, but might not map one-to-one: maybe something isn’t rendered because it’s invisible, and so on. The categorization of the nodes in the content tree is based on the markup language. The categorization of things in the rendering tree is based on the CSS rules, or form controls, or other display types (e.g. SVG). I’m lumping selector matching and computing style into the process of building the rendering tree.
What we have to do next is compute the positions based on what is in the rendering tree. The process is called layout or reflow. Once we’ve done that, we can paint them to a 2-D graphics API.
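The steps above can be sketched as a toy model. Everything in it (the node shape, `buildRenderTree`, `layout`, `paint`) is hypothetical and far simpler than what Gecko or WebKit actually do; it only illustrates that the rendering tree is derived from the content tree plus style, that the two trees need not map one-to-one, and that layout assigns rectangles before anything is painted.

```javascript
// Toy content tree: plain objects, not real DOM nodes (an assumption for illustration).
const contentTree = {
  tag: 'body', style: {}, children: [
    { tag: 'h1', style: { height: 40 }, children: [] },
    { tag: 'script', style: { display: 'none' }, children: [] },
    { tag: 'p', style: { height: 20 }, children: [] },
  ],
};

// Step 1: build the rendering tree. Nodes with display:none get no box,
// so the rendering tree has fewer nodes than the content tree.
function buildRenderTree(node) {
  if (node.style.display === 'none') return null;
  return {
    tag: node.tag,
    style: node.style,
    children: node.children.map(buildRenderTree).filter(Boolean),
  };
}

// Step 2: layout. Stack child boxes vertically and record a rectangle for each box.
function layout(box, x, y, width) {
  let childY = y;
  for (const child of box.children) {
    layout(child, x, childY, width);
    childY += child.rect.height;
  }
  const contentHeight = childY - y;
  box.rect = { x, y, width, height: box.style.height || contentHeight };
  return box;
}

// Step 3: "paint" by listing the rectangles a 2-D graphics API would be asked to draw.
function paint(box, out = []) {
  out.push(`${box.tag} ${box.rect.x},${box.rect.y} ${box.rect.width}x${box.rect.height}`);
  box.children.forEach((child) => paint(child, out));
  return out;
}

const renderTree = layout(buildRenderTree(contentTree), 0, 0, 800);
const displayList = paint(renderTree);
```

In this sketch the `script` node never reaches the rendering tree, so `displayList` has three entries for a content tree of four nodes.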

Optimization

A lot of pages today aren’t static. They are dynamic – they change the DOM, they change styles, etc. I want to focus on how the browser responds to those dynamic changes. If the page changes the DOM, doing all these things over again in response is completely unoptimized, and won’t work. We can’t keep up with the user typing, for example, if every keypress requires a full run of the process again. You want the page to respond quickly.
(Diagram of the rendering pipeline: the arrows represent places where the full process is skipped, for various reasons explained below.)
I’m going to talk about the types of optimizations the browser makes. I’ll split them in three broad groups:

  • Skip entire steps of the process in response to some change – if we know we don’t need to do a step, skipping it is the best outcome.
  • Skip part of a step – we might lay out part of a page, or reconstruct the rendering tree for just a part of the page instead of the whole page. If the DOM is changed in one limited place, the browser won’t throw the whole rendering tree away; it reconstructs just the part that needs to change.
  • Coalesce changes – in other words, the page can do a whole series of things, each of which alone would require redoing a lot of work. The browser won’t redo the work until it really needs to.

Caveats here: optimizations vary across browsers. Browsers are more uniform in how they handle content than in how they optimize. Just as browsers differ from each other, a single browser can change how it optimizes things over time. We always try to make things faster, but that won’t always be the result: sometimes we’ll optimize for a particular set of cases, and that slows others down.
I’m going to separate out two of the pieces from the previous diagram: computing style (selector matching in CSS) and constructing frames (building the rendering / frame tree).
If we set a non-presentational attribute, we have to figure out if the presentation has to change at all. If there is no way the change will affect the presentation, we skip everything. If there is a selector in the stylesheet that selects on that attribute, we have to re-run selector matching. Maybe the set of selectors that match hasn’t changed – we only look at whether this single element matches, which is not the same as doing it for the whole document. Maybe this div isn’t inside an element with id list, so nothing changed, and we stop there.
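A minimal sketch of that first check, assuming the stylesheet has been reduced to a list of selector strings. The names `attributesUsedBySelectors` and `onAttributeChange` are hypothetical, and real engines keep much more refined bookkeeping than this; the point is only that a change to an attribute no selector mentions can be skipped outright.

```javascript
// Hypothetical stylesheet, reduced to its selectors.
const selectors = ['#list div[data-state="open"]', 'p.note', 'a[href]'];

// Collect every attribute name mentioned in an attribute selector.
// (Class and id selectors are ignored in this toy version.)
function attributesUsedBySelectors(list) {
  const used = new Set();
  for (const selector of list) {
    for (const match of selector.matchAll(/\[\s*([\w-]+)/g)) {
      used.add(match[1]);
    }
  }
  return used;
}

const watched = attributesUsedBySelectors(selectors);

// Decide how much work an attribute change needs.
function onAttributeChange(name) {
  // No selector mentions the attribute, so CSS cannot react to it: skip everything.
  if (!watched.has(name)) return 'skip';
  // Otherwise re-run selector matching for just this element, not the whole document.
  return 'rematch-element';
}
```

With the selectors above, changing `title` would be skipped, while changing `data-state` would trigger a re-match for the one element that changed.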
Properties that require us to reconstruct the rendering tree:

  • display
  • position
  • float (from/to ‘none’)
  • transform (from/to ‘none’)
  • column-*
  • counter-*
  • quotes

The last three aren’t used all that often, so we don’t worry about optimizing them that carefully. We use a slow code path there, because there hasn’t been a need to optimize it.
Properties we don’t need to reconstruct nodes in the rendering tree for, but we need to recompute positions and sizes again (reflow / lay out):

  • width
  • height
  • font-*
  • margin-*
  • padding-*
  • border-*-width
  • letter-spacing
  • word-spacing
  • line-height

Properties where we don’t have to re-do layout:

  • color
  • background-*
  • border-*-color
  • z-index

Or maybe nothing changed with the new selector match: the mouse moved onto a paragraph that turns yellow on hover, but it was already yellow, so we do nothing.
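The three lists above amount to a lookup from property name to the cheapest kind of work a change requires. Here is a rough sketch of that lookup; `RULES` and `workFor` are hypothetical names, the table only contains the properties listed in these notes, and as the caveats say, real browsers differ and change over time.

```javascript
// Patterns taken from the lists above; a trailing "-" means "any property with this prefix".
const RULES = [
  [/^(display|position|quotes)$/, 'reconstruct'],
  [/^(column|counter)-/, 'reconstruct'],
  [/^(width|height|letter-spacing|word-spacing|line-height)$/, 'reflow'],
  [/^(font|margin|padding)-/, 'reflow'],
  [/^border-.*-width$/, 'reflow'],
  [/^(color|z-index)$/, 'repaint'],
  [/^background-/, 'repaint'],
  [/^border-.*-color$/, 'repaint'],
];

function workFor(property, oldValue, newValue) {
  // Already yellow, still yellow: nothing to do.
  if (oldValue === newValue) return 'none';
  if (property === 'float') {
    // The notes only list float changes from or to 'none'.
    return oldValue === 'none' || newValue === 'none' ? 'reconstruct' : 'unlisted';
  }
  const rule = RULES.find(([pattern]) => pattern.test(property));
  return rule ? rule[1] : 'unlisted';
}
```

For example, `workFor('margin-left', '0', '4px')` gives `'reflow'` and `workFor('border-top-color', 'red', 'blue')` gives `'repaint'`; anything not in the lists comes back as `'unlisted'` rather than a guess.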
Properties that have custom optimizations:

  • transform – we could handle changes to this by doing re-layout. Authors really care about the performance for this property, so it has its own custom code path. It has its own handling because we want it to be fast.
  • cursor – none of these paths would handle a change in cursor. We have to look and see if the mouse pointer is actually in the element. It’s 5-10 lines of custom code, because the other code paths don’t handle this.

Coalescing

Authors might change the same element twice. For example, if position is changed and then overflow right afterwards, we want to buffer the changes and apply them once rather than redo the rendering tree twice. Changing an element’s parent also affects the child, so we merge them into one repaint.
So when do we make the changes we’ve buffered up?

  • It’s time to redraw. Monitor refresh rates are 60-70 hertz, so there is no point in the browser redrawing faster than the monitor refreshes; that would waste CPU on something the user will never see. The browser tries to maintain a refresh rate similar to typical monitor refresh rates.
  • Script asks for something that requires processing those changes. The web programming model is based on the appearance that everything happens instantly. We have to act like it happens instantly, so if the author asks for a position or a size or style (via a script), we have to go flush those changes to give them the right answer.

Browsers can differ in how they flush this buffer. We have a set of things that flush the construction of the rendering tree. There is nothing that requires that – it could be that there are things that flush style only or flush layout only. In Gecko we have things that flush everything.
Calling getComputedStyle, or anything else that tells you where an element is or how big it is, will flush the layout buffer.
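The buffering and the two flush triggers can be modelled in a few lines. This is a toy simulation, not browser code: `ToyEngine` and its methods are made-up names, and a real engine tracks style, frame construction and layout separately rather than in one buffer.

```javascript
// Toy model of a browser's pending-change buffer. All names are hypothetical.
class ToyEngine {
  constructor() {
    this.pending = new Map();  // element id -> style changes not yet processed
    this.computed = new Map(); // element id -> styles as of the last flush
    this.flushCount = 0;       // how many times the expensive work ran
  }

  // Writes are cheap: they only record what changed.
  setStyle(id, property, value) {
    const styles = this.pending.get(id) || {};
    styles[property] = value;
    this.pending.set(id, styles);
  }

  // The expensive work happens once per flush, however many writes were buffered.
  flush() {
    if (this.pending.size === 0) return;
    for (const [id, styles] of this.pending) {
      this.computed.set(id, { ...(this.computed.get(id) || {}), ...styles });
    }
    this.pending.clear();
    this.flushCount += 1;
  }

  // Trigger 1: it is time to redraw.
  onRefreshTick() {
    this.flush();
  }

  // Trigger 2: script asks for a value that depends on the buffered changes.
  getComputed(id, property) {
    this.flush();
    return (this.computed.get(id) || {})[property];
  }
}

const engine = new ToyEngine();
engine.setStyle('box', 'position', 'absolute');
engine.setStyle('box', 'overflow', 'hidden');
engine.onRefreshTick(); // both changes are processed in a single flush
```

After the two writes and one refresh tick, `engine.flushCount` is 1. A later read through `getComputed` costs nothing extra unless new writes were buffered in between.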
Don’t defeat all the optimizations that we do. It’s very easy to write a loop that changes something, then asks for something that forces us to flush those buffers. We’ll have to run everything on each pass through the loop as a result, and this will be very expensive.
DON’T DO THIS
(pseudo code)
for (i = 0; i < n; i++) { set a style on an object; get its offsetHeight; change the style again }
This can make pages orders of magnitude slower. We hope to have better debugging tools to detect this sort of problem in the future. You can miss this because things like this are hidden in frameworks. There are some jQuery plugins that do this, for example.
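A common way to avoid the pattern is to do all the reads first and all the writes afterwards. The sketch below is not from the talk. It uses a fake element (`makeFakeElement`, a hypothetical stand-in) that counts forced flushes, so it runs outside a browser. On a real page the elements would be DOM elements and the flush count would show up as time instead.

```javascript
// Stand-in for a DOM element so the sketch runs anywhere. `doc` plays the role of
// the document: any style write marks it dirty, and reading offsetHeight while it
// is dirty counts as one forced flush.
function makeFakeElement(doc, initialHeight) {
  let height = initialHeight;
  return {
    style: {
      set height(value) { height = parseInt(value, 10); doc.dirty = true; },
    },
    get offsetHeight() {
      if (doc.dirty) { doc.flushes += 1; doc.dirty = false; }
      return height;
    },
  };
}

// Slow: reads and writes alternate, so almost every read forces a flush.
function doubleInterleaved(elements) {
  for (const el of elements) {
    el.style.height = `${el.offsetHeight * 2}px`;
  }
}

// Faster: read everything first, then write everything.
function doubleBatched(elements) {
  const heights = elements.map((el) => el.offsetHeight);
  elements.forEach((el, i) => { el.style.height = `${heights[i] * 2}px`; });
}

const slowDoc = { dirty: false, flushes: 0 };
const fastDoc = { dirty: false, flushes: 0 };
const slowElements = Array.from({ length: 100 }, () => makeFakeElement(slowDoc, 10));
const fastElements = Array.from({ length: 100 }, () => makeFakeElement(fastDoc, 10));

doubleInterleaved(slowElements); // slowDoc.flushes is now 99
doubleBatched(fastElements);     // fastDoc.flushes is still 0
```

Both functions leave every element at the same final height; only the number of forced flushes differs, and the browser is free to process the batched writes once at the next redraw.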

Skipping parts of these operations

We can skip a part of the process. We can reconstruct part of the rendering tree, for example. The performance characteristics are different for each one. Reconstructing the rendering tree is the simplest of the three. Reconstructing a node means reconstructing all of its descendants in Gecko. If you force a reconstruct on the body element, you’re reconstructing the entire tree. There is a tiny cost resulting from the depth of the node in the tree: a node that is 200 levels deep has a measurable cost. Otherwise the cost is proportional to the number of things you’re constructing.
You might want to measure how expensive this is.
(script to test it / skipping part of the frame construction / rendering tree construction)
It does exactly what I told you not to do: it flushes the buffer constantly, and then I time how long it takes.
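The script from the slides is not reproduced in these notes, so the following is only a guess at its general shape: change a style, force a flush by reading a size, repeat, and divide the elapsed time by the number of rounds. `timeStyleToggle` is a hypothetical helper name, and the element id in the usage comment is made up.

```javascript
// Use the high-resolution clock where it exists, otherwise fall back to Date.now().
const now = () => (typeof performance !== 'undefined' ? performance.now() : Date.now());

// Alternate `property` between two values, forcing a flush after every write,
// and return the average time per round in milliseconds.
function timeStyleToggle(element, property, valueA, valueB, iterations) {
  const start = now();
  for (let i = 0; i < iterations; i++) {
    element.style[property] = i % 2 === 0 ? valueA : valueB;
    void element.offsetHeight; // reading a size forces the buffered work to run
  }
  return (now() - start) / iterations;
}

// In a browser console, something like:
//   timeStyleToggle(document.getElementById('some-id'), 'display', 'none', '', 1000);
```

Toggling `display` exercises rendering tree reconstruction for that subtree. The same helper with a property such as `width` would exercise layout instead, which makes it easy to compare different surroundings for the same content.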
Skipping part of the layout process: something moving around or changing in the layout can affect the rest of the page. The process of doing layout always starts at the very top of the tree and propagates down to what needs to be changed. In some cases, this can be simple. A bunch of divs will lay out their contents; maybe they’ll be taller and push down some siblings. A change inside of a table might cause the widths of the columns to rebalance and cause the entire table to relayout. So the surroundings affect the cost of relayout. Experiment with surroundings.
Recomputing intrinsic widths is slightly separate; basically there are two different phases to doing layout, and the first doesn’t happen all the time. A lot of CSS concepts rely on intrinsic widths. The larger width is the width of the text laid out on one line only; the smaller is the width of the longest unbreakable piece of text. Not every element has those intrinsic widths on its own, so we build them up. Tables also have intrinsic widths. Recomputing those is separate from redoing layout, because some properties that require recomputing layout require recomputing intrinsic widths, and some don’t.
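To make the two intrinsic widths concrete, here is a small worked sketch for a run of text. It assumes a monospace font where every character has the same width and that lines can only break at spaces; both are simplifications, and `intrinsicWidths` is a hypothetical name.

```javascript
// Compute the two intrinsic widths of a piece of text, in pixels.
function intrinsicWidths(text, charWidth) {
  const words = text.split(/\s+/).filter(Boolean);
  const longestWord = words.reduce((max, word) => Math.max(max, word.length), 0);
  const singleLineLength = words.join(' ').length;
  return {
    min: longestWord * charWidth,      // narrowest width that breaks no word
    max: singleLineLength * charWidth, // width needed to fit everything on one line
  };
}

const widths = intrinsicWidths('Fast CSS layout', 8);
// widths.min is 48 (the 6 characters of "layout"), widths.max is 120 (15 characters)
```

Anything that changes the text or the font changes these numbers, which is why such changes need the intrinsic-width pass as well as ordinary layout.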
(The flush-layout function shown in the presentation.) This can measure how expensive a structure is, in terms of layout. You can experiment with the structure the content is inside, and with the content itself.
Painting is a bit different. We’ve been talking about subtrees and nodes in a tree. Repainting isn’t involved with nodes; we’re repainting rectangles or a set of rectangles. There’s no mechanism to flush painting. There are ways to spot very bad cases: in some browsers, you might see slow repaints if you switch tabs back and forth or uncover a window – that will vary depending on browser and OS graphics handling, and only happens in very bad cases. There are some debugging tools that let you see what’s getting repainted. In nightly builds of Firefox, we have a hidden preference in about:config that makes the browser flash every time it starts repainting, so you’ll know when it’s happening. We’re working right now to optimize repainting in Firefox. Soon we’re hoping to have major improvements in what gets repainted.

Other resources:

  • A useful API: window.requestAnimationFrame – if you write pages that update things based on a timeout, you’re guessing what the right timeout is. If you hook into the browser’s refresh cycle instead, you can animate as smoothly as possible.
  • If you want to hear about faster page loading, http://stevesouders.com has great information about making sites load quickly.
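As a rough illustration of the requestAnimationFrame point above: the position is computed from elapsed time, and the browser decides when each frame runs. `positionAt` and `slideRight` are hypothetical names, and `element` is assumed to be an absolutely positioned DOM element. Only `positionAt` can run outside a browser.

```javascript
// Pure function: how far along should the element be after `elapsedMs`?
// It moves `distancePx` over `durationMs`, then stops.
function positionAt(elapsedMs, durationMs, distancePx) {
  const progress = Math.min(elapsedMs / durationMs, 1);
  return progress * distancePx;
}

// Drive the animation from the browser's refresh cycle instead of a guessed timeout.
function slideRight(element, durationMs, distancePx) {
  let startTime = null;
  function step(timestamp) {
    if (startTime === null) startTime = timestamp;
    const elapsed = timestamp - startTime;
    element.style.left = `${positionAt(elapsed, durationMs, distancePx)}px`;
    // Ask for another frame only while the animation is still running.
    if (elapsed < durationMs) window.requestAnimationFrame(step);
  }
  window.requestAnimationFrame(step);
}
```

Because the position depends on the timestamp rather than on how many callbacks have fired, the animation takes the same total time even if some frames are dropped.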

Questions:

Does the browser refresh rate reflect canvas refresh rates, 2D or GL?

You can paint to a canvas whenever you want, but browser refresh rate affects when it gets redrawn to screen. At least for 2D. Basically, you can still update the canvas whenever but we’ll only refresh at certain points.

As a developer really trying to optimize experiences: can you provide context for benefits of optimizing for these things vs. optimizing for latency in networking / http requests?

Depends on what you’re trying to do. Building documents, or building applications? In the world of building apps, I think it’s worth optimizing for both. Which perf characteristics are you having the worst problems with? Depending on what you need to optimize, one or the other might be better.

If you can compare a couple of browsers – chrome, safari, IE – do you know what optimizations they are using? Also, desktop vs. mobile – how do the optimization tactics differ?

In terms of comparing browsers, I’m not sure how well-qualified I am to answer that. I know a lot about Gecko, a little about WebKit, not much about anything else. It’s really hard for me to say.
Mobile vs. desktop – I think a lot of the optimizations are going to be the same. Mobile browsers are built on the same engines desktop browsers are. Optimizations might be more important on mobile, where you may need to delay the refresh rate, but there are probably not huge differences.

I need to do an engineering preparser of our styles because of a component model that only serves styles associated with a component that happens to be on the page… I’m forced to use very hyper-specific selector chains… as many as 6-8 deep in specificity. Can you talk about how selectors are parsed, maybe optimizations – how expensive is that, in terms of drawing to the browser?

You’re working with a component system that requires you to serve stylesheets… the underlying question is about the performance of selectors that are very long, with a lot of combinators. I think the fundamental thing about selector matching is that selectors are matched from right to left. We do it when we’re constructing a node in the rendering tree, as part of the process of finding the selectors that match a specific node: here is a node, find the selectors that match it. The most important thing about selector performance is that a selector is faster the sooner it fails. The longer we have to search for something that matches, the more time we spend on that selector. If the rightmost part, after the last space or last child combinator, is as specific as possible, then when that fails we never have to look for siblings, etc., so it’s faster. That advice is changing a little bit: there have been some recent optimizations to selector matching, and WebKit landed optimizations that filter out some sets of selectors before that point.
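Right-to-left matching can be sketched with a toy matcher. This is an illustration under heavy assumptions, not how any engine implements it: nodes are plain objects of the form `{ tag, id, classes, parent }`, only tag, `#id` and `.class` parts are understood, and only descendant combinators (spaces) are supported. The function names are hypothetical.

```javascript
// Does one node satisfy one simple selector part (tag, #id or .class)?
function matchesSimple(node, part) {
  if (part.startsWith('#')) return node.id === part.slice(1);
  if (part.startsWith('.')) return node.classes.includes(part.slice(1));
  return node.tag === part;
}

// Match a selector made of simple parts separated by descendant combinators.
function matches(node, selector) {
  const parts = selector.trim().split(/\s+/);
  // Rightmost part first: most nodes fail here, which is the cheap case.
  if (!matchesSimple(node, parts[parts.length - 1])) return false;
  // Then walk up the ancestors, consuming the remaining parts right to left.
  let ancestor = node.parent;
  for (let i = parts.length - 2; i >= 0; i--) {
    while (ancestor && !matchesSimple(ancestor, parts[i])) ancestor = ancestor.parent;
    if (!ancestor) return false; // ran out of ancestors: the slow way to fail
    ancestor = ancestor.parent;
  }
  return true;
}

// A small hypothetical tree: body > div#list > ul.items > li.item
const bodyNode = { tag: 'body', id: '', classes: [], parent: null };
const listNode = { tag: 'div', id: 'list', classes: [], parent: bodyNode };
const ulNode = { tag: 'ul', id: '', classes: ['items'], parent: listNode };
const liNode = { tag: 'li', id: '', classes: ['item'], parent: ulNode };
```

Here `matches(ulNode, '#list .item')` fails after a single comparison, while `matches(liNode, '#menu .item')` has to walk every ancestor before it can give up. That difference is the "faster the sooner it fails" point from the answer.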

Testing a piece of code for performance issues is useful, thank you. Going from printf debugging to using an actual profiler – do you know of any tools that, instead of testing a specific piece of code, could do general performance profiling across a poorly performing site?

I don’t know of anything right now. I’m hoping that tools that are good for profiling these aspects of layout will exist in the near future. There may be some I don’t know about right now, but I can’t recommend anything.

Do you have any lists of guidelines or checklists of things to look for?

One of the top guidelines is to try to avoid breaking the coalescing optimizations. I don’t have any good lists off the top of my head. There are a number of pages that exist about selector matching performance optimization.

Accomplishing the same layout several different ways – are there performance benefits, or does it not matter?

The simple answer is yes, but I don’t think that’s useful to say. I think in general features often perform better when they are used for what they were designed for. It’s common to do layouts using floats, but they weren’t really designed for that. They were designed for pulling things out of the flow rather than for the entire layout of the page. In some cases their perf characteristics aren’t that great. Sometimes they lead to excessive repaints; we are hoping to fix that soon. I’d hesitate to advise for or against any specific thing based on perf alone, because once you’ve chosen one concept, there are ways to improve any problems you might run into. Tables have their own set of performance problems, but part of the point of these layout systems is that the browser is supposed to do the work for you. If you do all the work yourself in a script to avoid taxing the browser, you may end up doing it slower than the browser can.
