Can You Afford It?: Real-world Web Performance Budgets
TL;DR: performance budgets are an essential but under-appreciated part of product success and team health. Most partners we work with are not aware of the real-world operating environment and make inappropriate technology choices as a result. We set a budget in time of <= 5 seconds first-load Time-to-Interactive and <= 2s for subsequent loads. We constrain ourselves to a real-world baseline device + network configuration to measure progress. The default global baseline is a ~$200 Android device on a 400Kbps link with a 400ms round-trip-time (“RTT”). This translates into a budget of ~130-170KB of critical-path resources, depending on composition — the more JS you include, the smaller the bundle must be.
We need a new term for the business-opportunity wastage that modern front-end development has created.
Maybe "ambush by JS"?
— Alex Russell (@slightlylate) October 4, 2017
Business leaders who green-light the development of Progressive Web Apps frequently cite the ability to reach new users with near-zero friction as a primary motivator. At the same time, teams are reaching for tools which make achieving this goal impossible. Nobody is trying to do a poor job, and yet the results of a “completed” PWA project often require weeks or months of painstaking rework to deliver minimally acceptable performance.
This rework delays launch which, in turn, delays gathering data about the viability of a PWA strategy. Teams we aren’t able to work with directly sometimes do not catch these problems until it’s too late, launching experiences which are simply unusable for all but the wealthiest.
Setting A Baseline
Teams that avoid unpleasant surprises tend to share a few traits:
- Executive sponsors are enthusiastic. They use “do what it takes” language to describe the efforts to get and stay fast
- Performance budgets are set early in the life of the project
- Budgets are scaled to a benchmark network & device
- Tools and CI systems help them monitor progress & prevent regressions
These properties build on each other: it’s difficult to get the space you need to plan to do things well without decision makers who value user experience and long-term business value. Teams with this support are free to set performance budgets, do “bakeoffs” between competing approaches, and invest in performance infrastructure. They’re also more able to go against the “industry standard” grain when popular tools prove to be inappropriate.
Performance budgets keep everyone on the same page. They help to create a culture of shared enthusiasm for improving the lived user experience. Teams with budgets also find it easier to track and graph progress. This helps support executive sponsors who then have meaningful metrics to point to in justifying the investments being made.
Budgets set an objective frame for determining which changes to the codebase represent progress and which are regressions from the user perspective. Without them it’s impossible to avoid slipping into the trap of pretending you can afford more than you can. Very rarely have we seen a team succeed that doesn’t set budgets, gather RUM metrics, and carry representative customer devices.
Partner meetings are illuminating. We get a strong sense for how bad site performance is going to be based on the percentage of engineering leads, PMs, and decision makers carrying high-end phones which they primarily use in urban areas.
Doing better by users involves 2 phases:
- Challenging assumptions & growing understanding of real-world conditions
- Automating testing against an objective baseline
Never before have front-end teams enjoyed access to such good performance tools and diagnostic techniques, yet poor results are the norm. What’s going on here?
JS Is Your Most Expensive Asset
We’re often asked “what’s the big deal about 200KB of JS, some of our images are that size?” A good question! Answering it requires an understanding of how browsers process resources (which differs by type) and the concept of the critical path. For a timely introduction, I recommend Kevin Schaaf’s recent talk.
Consider a page like:
<!DOCTYPE html>
<html>
  <head>
    <link rel="stylesheet" href="/styles.css">
    <script src="/app.js" async></script>
  </head>
  <body>
    <my-app>
      <picture slot="hero-image">
        <source srcset="firstname.lastname@example.org, email@example.com 2x" media="(min-width: 990px)">
        <source srcset="firstname.lastname@example.org, email@example.com 2x" media="(min-width: 750px)">
      </picture>
    </my-app>
  </body>
</html>
The browser encounters this document in response to a GET request to https://example.com/. The server sends it as a stream of bytes and when the browser encounters each of the sub-resources referenced in the document, it requests them.
Here are some operations that can happen on other threads, allowing the browser to stay responsive:
- Parsing HTML
- Parsing CSS
- Some JS garbage collection tasks
- Parsing and rasterizing images
- GPU-accelerated CSS transformations and animations
- Main-document scrolling (assuming no active touch listeners)
These operations, however, must happen on the main thread:
- Construction of DOM
- Processing input (including scrolling w/ active touch listeners)
Script execution delays interactivity in a few ways:
- If the script executes for more than 50ms, time-to-interactive is delayed by the entire amount of time it takes to download, compile, and execute the JS
- Any DOM or UI created in JS is not available for use until the script runs
Images, on the other hand, do not block the main thread, do not block interaction when parsed or rasterized, and do not prevent other parts of the UI from getting or staying interactive. Therefore, while a 150KB image won’t appreciably increase TTI, 150KB of JS will delay interactivity by the time required to:
- Request the code, including DNS, TCP, HTTP, and decompression overhead
- Parse and compile the top-level functions of the JS
- Execute the script
These steps are largely serialized.
Deciding what benchmark to use for a performance budget is crucial. Some teams and businesses know their audience intimately and can make informed estimates about the devices and networks current and prospective users are on. Most, however, do not have such a baseline easily to-hand. Where to start?
Two numbers set the stage:
The median user is on a slow network. Just how slow is a matter of some debate.
Our metrics at Google show a conflicted picture (which I’m working to get to clarity on). Some systems show median RTTs near ~100ms for 3G users. Others show the median user unable to transmit and receive an individual packet in less than 400ms in some major markets.
I suggest we should be conservative. Contended, over-subscribed cells can make “fast” networks brutally slow, transport variance can make TCP much less efficient, and the bursty nature of web traffic works against us.
Googlers enjoy access to a simulated “degraded 3G” network to help validate the behaviour of their apps under these conditions. It simulates a link with a 400ms RTT and 400-600Kbps of throughput (plus latency variability and simulated packet loss). Given the conflicted data we see across our other systems, this seems about right as a baseline.
Simulated packet loss and variable latency, however, can make benchmarking extremely difficult and slow. The effect of a lost packet during DNS lookup can be a difference of seconds, making it frustrating to compare before/after for changes at development time. Our baseline, then, should probably substitute lower throughput and higher latency for simulated packet loss. What we lose in real-world fidelity, we gain in repeatability and the ability to compare across changes and across products. There’s much, much more to say about the effects of DNS, TLS, network topology, and other factors. For those who want to go deeper on this, I highly recommend Ilya Grigorik’s “High Performance Browser Networking”. The coverage of RRC alone makes it worth your time.
Back to our baseline, we now have a sense for what our simulated network conditions should be: 400ms RTT, 400Kbps bandwidth. What about the device itself?
At last year’s Chrome Dev Summit I discussed some of the thermal and power-limiting factors that create a huge disparity between desktop and mobile device performance. Add onto that the yawning chasm between low-end and high-end device performance thanks to chip design factors like cache sizes, and it can be difficult to know where to set a device baseline. Thankfully, this is somewhat easier than network speeds: more than half of American mobile users are on Android devices. As you look abroad, worldwide smartphone shipments are (and for the past 5 years have been) overwhelmingly Android-based. The average selling price for those devices is falling in most geographies, driven by the ubiquity of Android and relentless price drops within that ecosystem. This, in turn, drives the single most important trend in setting the global web performance budget hardware baseline: the next billion users will largely come online when they can afford to. This will drive declines in smartphone average-selling-price (“ASP”) in emerging markets for the foreseeable future. This, in turn, means that all improvements to transistor-count-per-dollar will translate into lower selling prices, not faster devices (on average).
The true median device from 2016 sold at about $200 unlocked. This year’s median device is even cheaper, but its performance is roughly equivalent. Expect continued performance stasis at the median for the next few years. This is part of the reason I suggested the Moto G4 last year and recommend it or the Moto G5 Plus this year.
Putting it all together, our global baseline for performance benchmarking is a:
- ~$200 (new, unlocked) Android phone
- On a slow 3G network, emulated at:
- 400ms RTT
- 400Kbps transfer
For most technologists, building applications for this environment might as well be farming on Mars. Luckily, this configuration is available on webpagetest.org/easy, meaning we can re-create these conditions here on earth, any time we like.
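For teams that also want these conditions in local, scripted runs, the baseline can be written down as network-emulation parameters. The sketch below assumes the Chrome DevTools Protocol's `Network.emulateNetworkConditions` command, which takes latency in milliseconds and throughput in bytes per second. `BASELINE` and `toEmulationParams` are illustrative names, and the upload rate is an assumption, since only a single transfer figure is given above.

```javascript
// Global baseline from the list above: 400ms RTT, 400Kbps transfer.
const BASELINE = { rttMs: 400, linkKbps: 400 };

// Convert the baseline into the parameter shape used by the DevTools
// Protocol command Network.emulateNetworkConditions.
function toEmulationParams({ rttMs, linkKbps }) {
  const bytesPerSecond = (linkKbps * 1000) / 8; // kilobits/s -> bytes/s
  return {
    offline: false,
    latency: rttMs,                     // added latency, in milliseconds
    downloadThroughput: bytesPerSecond, // bytes per second
    uploadThroughput: bytesPerSecond,   // assumption: same as download
  };
}

console.log(toEmulationParams(BASELINE));
```

Request-level throttling of this kind is generally coarser than the packet-level shaping WebPageTest applies, so it is better treated as a quick development-time check than as a substitute for testing on a real device.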
The Affordability Calculation
The last thing we need for our perf budget is time. How long is too long?
I like Monica’s definition:
The Monica Perf Test™: if you wouldn't make eye contact with a stranger for the time it takes your web app to first paint, it's too slow.✌️💫
— Monica Dinculescu (@notwaldorf) September 20, 2016
…but that’s more qualitative than quantitative. Numerically, we’d prefer every page load occur in under a second (see RAIL). That’s not possible on real-world networks, so we’ve set the following Time-to-Interactive (TTI) metric goal with partners:
- TTI under 5 seconds for first load
- TTI under 2 seconds for subsequent loads
We now have everything we need to create a ballpark perf budget for a product in 2017.
Working backwards from time, network conditions, and the primary stages of the critical path, we get a few interesting results. We can start with our first-load budget of 5 seconds and begin to calculate how much transfer we can afford.
First we subtract 1.6 seconds from our budget for DNS lookup and TLS handshaking, leaving us 3.4s to work with.
Then, we calculate how much data we can send over this link in 3.4 seconds: 400 Kbps = 50KB/s. 50KB/s * 3.4 = 170KB.
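The same arithmetic can be sketched as a small function, which makes it easy to re-run for a different link speed or time budget. `transferBudgetKB` is a hypothetical helper. Like the shorthand above, it treats 1KB as 1000 bytes and ignores slow-start and other transport effects.

```javascript
// Kilobytes that fit in the time left after connection setup.
// Uses 1KB = 1000 bytes, matching the 400Kbps = 50KB/s shorthand above.
function transferBudgetKB({ budgetMs, setupMs, linkKbps }) {
  const transferMs = budgetMs - setupMs;       // time left for bytes on the wire
  const bitsAvailable = transferMs * linkKbps; // ms * kilobits/s = bits
  return bitsAvailable / 8 / 1000;             // bits -> bytes -> KB
}

const firstLoadKB = transferBudgetKB({ budgetMs: 5000, setupMs: 1600, linkKbps: 400 });
console.log(firstLoadKB); // 170
```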
NOTE: This discussion is sure to infuriate competent network engineers. Previous versions of this article discussed slow-start, bdp, tcp window scaling, and the like. They were commensurately difficult to follow. Simplifying has relatively little impact on the overall story, so those details are elided.
Modern web applications are largely composed of JS, meaning we also need to subtract the amount of time the JS needs to parse and evaluate. The gzip compression factor for JS code is between 5x and 7x. 170KB of JS then becomes ~850KB-1.2MB of JS which, based on earlier estimates, may take a second to run (presuming it doesn’t do any expensive DOM work, which of course it will). Playing with these numbers a little bit, we can get back below 3.4s of download and eval by limiting ourselves to 130KB of JS transferred on the wire.
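One way to play with these numbers is to assume that evaluation time grows linearly with the size of the script, at the rate implied above (roughly one second per 170KB of gzipped JS). That linear model is an assumption made for illustration, not a measurement, and `maxJsTransferKB` is a hypothetical helper.

```javascript
// Largest JS payload (gzipped KB) whose download plus evaluation fits the window.
// Assumption: evaluation time scales linearly with transfer size.
function maxJsTransferKB({ windowSeconds, linkKBps, evalSecondsPerKB }) {
  const downloadSecondsPerKB = 1 / linkKBps;
  return windowSeconds / (downloadSecondsPerKB + evalSecondsPerKB);
}

const limitKB = maxJsTransferKB({
  windowSeconds: 3.4,        // 5s budget minus 1.6s of connection setup
  linkKBps: 50,              // 400Kbps
  evalSecondsPerKB: 1 / 170, // ~1s to run 170KB of gzipped JS (assumed rate)
});
console.log(Math.floor(limitKB)); // 131
```

Under these assumptions the limit lands close to the 130KB figure. A slower device or heavier DOM work would raise the per-KB evaluation cost and push the limit lower.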
One last wrench in the works: if any of our critical-path resources come from a different origin (e.g., a CDN), we need to subtract connection setup time for that origin (~1.6s) from the budget, further limiting how much of our 5s we actually get to spend on network transfer and client-side work.
Putting it all together, under ideal conditions, our rough budget for critical-path resources (CSS, JS, HTML, and data) is:
- 170KB for sites without much JS
- 130KB for sites built with JS frameworks
This gives us the ability to consider the single most pressing question in front-end development today: “can you afford it?”
For example, if your JS framework takes ~40KB of transfer on a JS-heavy site (which gets a budget of 130KB thanks to JS eval time), you’re left with only 90KB of “headroom”. Your entire app must fit into that space. A 100KB framework loaded from a CDN is already 20KB over budget.
Think back: your framework of choice might be 40K, but what about that data system? The router you added? Suddenly 130KB doesn’t seem like a lot when you also need to include data, templates, and styles.
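A quick tally makes the question concrete. In the sketch below only the 40KB framework figure comes from the example above. The other sizes are hypothetical placeholders to be replaced with measured, gzipped sizes from your own build, and `headroomKB` is an illustrative helper.

```javascript
// Sum what the critical path ships and compare it with the budget.
function headroomKB(budgetKB, partsKB) {
  const spentKB = Object.values(partsKB).reduce((sum, kb) => sum + kb, 0);
  return budgetKB - spentKB;
}

// Only the framework size is taken from the text; the rest are hypothetical.
const criticalPathKB = {
  framework: 40,
  router: 10,
  dataLayer: 25,
  templatesAndStyles: 30,
};

const remainingKB = headroomKB(130, criticalPathKB);
console.log(
  remainingKB >= 0
    ? `${remainingKB}KB left for app code`
    : `${-remainingKB}KB over budget`
);
```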
Living on a budget means constantly asking yourself “can I really afford this?”
In an ideal world, all page loads happen in under a second, but for many reasons that’s often not feasible. Therefore we’re going to give ourselves a bit of a breather and budget 2 seconds for second (third, fourth, etc.) load.
Why not 5? Because we shouldn’t need to ever go to the network to get our app’s UI booted once we’ve visited it the first time. Service Workers and “offline first” architectures enable us to put interactive pixels on screen without ever touching the network. This is the key to achieving reliable performance.
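A minimal cache-first Service Worker gives a feel for the idea. This is a sketch under stated assumptions, not a production setup: the cache name and URL list are hypothetical (the URLs mirror the example page earlier in this post), and it does nothing about cache versioning or code upgrades.

```javascript
// sw.js: cache-first sketch. Cache name and URL list are assumptions.
const SHELL_CACHE = "app-shell-v1";
const SHELL_URLS = ["/", "/styles.css", "/app.js"];

// Answer from the cache when possible; fall back to the network otherwise.
async function cacheFirst(request, cacheStore, fetchFn) {
  const cached = await cacheStore.match(request);
  return cached || fetchFn(request);
}

// Register handlers only when running inside a Service Worker.
if (
  typeof ServiceWorkerGlobalScope !== "undefined" &&
  self instanceof ServiceWorkerGlobalScope
) {
  self.addEventListener("install", (event) => {
    // Store the app shell during install so later loads can skip the network.
    event.waitUntil(
      caches.open(SHELL_CACHE).then((cache) => cache.addAll(SHELL_URLS))
    );
  });

  self.addEventListener("fetch", (event) => {
    event.respondWith(cacheFirst(event.request, caches, (req) => fetch(req)));
  });
}
```

Keeping `cacheFirst` separate from the event wiring means the lookup logic can be exercised with a fake cache outside the browser.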
Two seconds is forever in modern CPU terms, but we still need to spend it wisely. Factors we need to account for include:
- Process creation time (Android is relatively slow vs. other OSes)
- Time required to read bytes from disk (it’s not zero, even on flash-based storage!)
- Time to execute and run our code
Every app I’ve seen that hits a 5s initial load and implements offline-first correctly stays under this 2s budget, and sub 1s is possible! But getting to offline-first is a huge challenge for many teams. Architecting to save last-seen user data locally, cache app resources in a reliable and coherent way, and juggle application code upgrades using the Service Worker lifecycle can be a major undertaking.
I’m looking forward to tools continuing to evolve in this area. The most comprehensive bootstrap I know of today is the Polymer App Toolbox, so if you’re not sure where to start, start there.
130-170KB…Surely You’re Kidding!?!
Many teams we talk to wonder if it’s even possible to deliver something useful in as little as 130KB. It is! The PRPL pattern shows the way through aggressive code-splitting based on route awareness, Service Worker caching of granular (subsequent-page) resources, and clever use of modern protocol enhancements like HTTP/2 Push.
Taken together, these tools enable us to deliver functional, modern experiences in under 100KB for the critical path.
Sadly, it’s still sort of difficult to tell from a specific trace which parts of the page load are critical-path resources for TTI and which aren’t, but I’m optimistic that tools will evolve quickly to help us understand this key metric.
Regardless, we know it’s possible, even without giving up on frameworks entirely. Both Wego and Ele.me are built with modern tools (Polymer and Vue, respectively) and help users complete real transactions today. Most apps are less complex than they are. Life on a budget isn’t starvation.
Tools for Teams on a Budget
Getting under-budget _is_ hard, but the benefits to the business and to users are immense. Less often discussed are the benefits to engineering teams and their leaders. No tech-lead or PM wants to be on the wrong side of an executive who walks into their area with a phone asking “so why is this so slow when I’m on vacation?”
This isn’t theoretical.
I’ve seen teams that have just finished re-building on a modern tech stack cringe for an hour as we walk them through the experience of using their “better”, “faster” experiences under real-world conditions.
Everyone loses face when the product fails to meet expectations. Months of unplanned performance fire-fighting delay the addition of new features and have a draining effect on team morale. When performance becomes a crisis, mid-level managers get caught between being the “shit umbrella” their teams count on and crushing self-doubt. Worse, they may begin to doubt their team. The other side of a performance crisis is a long road; how can the organisation trust the team to deliver a quality product? Can they trust the TLs to recommend new technology or large re-investments? Recriminations follow. This is a terrible experience, especially for developers who are too often on the receiving end of incredible pressure to “fix it”, ASAP, and “it” may be a core technology the product is built on.
In the worst cases, the product may be unfixable on a short enough timeframe to help the business. A lot of progress is Darwinian and for startups and small teams, betting on the wrong stack without the benefit of a long runway can be fatal. Worse, this can go un-diagnosed for a long, long time. If the whole team carries the latest iOS devices on fast, urban networks and the product’s economics are premised on growing a broad-based audience, the failure of that audience to arrive barely makes a sound.
Performance isn’t the (entire) product, of course. Lots of slow or market-limited products do incredibly well. Having a unique service that people want (and will go out of their way for) can override all of these other concerns. Some folks even succeed in App Stores where friction-to-acquire an experience is intense. But products in competitive marketplaces need every advantage.
Some specific tools and techniques can help teams that adopt a performance budget:
- webpagetest.org/easy: this is our go-to tool for one-off analysis.
- WPT scripting: for teams that don’t want to set up a custom WPT instance and have public URLs for their WIP apps, integrating with WPT scripting can be a great way to get regular “checks”
- WPT private instances: teams that want to integrate WPT directly into their CI or commit-queue systems should investigate setting up a private WPT server and hardware
- Scripted Lighthouse: not ready for a full WPT instance? Scripting Lighthouse can help your CI automate analysis of your site and catch regressions
- grunt-perfbudget: an even easier way to automate WPT testing in your CI. Use it!
- Speedcurve and Calibre: these hosted services automate tracking performance over time, delivering an outstanding real-world gut-check
- Webpack Performance Budgets: for teams using webpack in their build steps, enabling this configuration can provide great development-time warning for resources that exceed budgets.
- bundlesize and pr-bot let you set per-script budgets which can be automatically enforced as part of your pull-request process. Recommended!
Success in combating bloat often means turning warnings into hard errors. Teams with CI or commit-queue systems should strongly consider disallowing commits that break the (performance) bank.
For teams starting fresh, my strong recommendation is to start with a stack that embeds strong opinions about app structure, code splitting, and build targets. The best of those today are:
Whatever tools your team chooses, a budget is essential. Without one, even the most advanced, “lightweight” frameworks can easily create bloated, unusable apps. Starting from the global baseline and only increasing the budget based on hard numbers is the best way I know of to ensure your project lands well for everyone.
In the interest of time and space, discussion of future-friendly architectures will have to wait for another post. The curious can dig into Service Workers, Navigation Preload, and Streams. Their powers combined are going to fundamentally transform the optimal page-load for 2018 and beyond.
Lastly, thanks to everyone who reviewed early drafts of this post, including (but not limited to): Vinamrata Singal, Paul Kinlan, Peter O’Shaughnessy, Addy Osmani, and Gray Norton. Hopefully their valiant attempts to direct this article away from error were not overcome by my talent in adding it.