January 12, 2012

The bugfix that could make the internet 5% faster

I've been working with Google Analytics for the last 3 years. When I started, it was already a huge player in the market, but I've seen enormous growth over these years. Google Analytics is the most used web analytics solution in the world: it's currently used on 44.67% of the top million websites on the internet. ga.js is probably the most popular JavaScript snippet in the history of the internet.

Google Analytics Usage on top websites:

[Chart: Google Analytics usage on top websites. Source: builtwith.com]

Imagine the responsibility of the Google engineering team that maintains the ga.js JavaScript file. While dealing with the many recent changes and new features in Google Analytics, they still have to make sure their code runs as fast as possible on every browser out there. They must support IE 5.5 and low-end mobile devices, otherwise those browsers wouldn't show up in Google Analytics reports. And they must do all of this without the code hurting the website's performance.

I must say they do a great job maintaining that code. The asynchronous syntax, while confusing at first, is a very clever way to push the loading and execution of the code to the back of the queue, so browsers don't delay page loading just to register a GA pageview. It's clear that the GA team takes great care over how fast and seamless their code is.
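The trick behind the asynchronous syntax is a command queue: `_gaq` starts out as a plain array that the page pushes commands into, and once ga.js has loaded it replays the queued commands and replaces the array with an object whose `push` runs commands immediately. The TypeScript sketch below simulates that pattern outside the browser. `_setAccount`, `_trackPageview` and `_trackEvent` are real ga.js command names and `UA-XXXXX-X` is the usual placeholder account ID, but `installTracker`, `execute` and the other names are hypothetical illustrations, not ga.js internals.

```typescript
// A ga.js-style command: a method name followed by its arguments.
type Command = [string, ...unknown[]];

// Anything with a push() method can receive commands: a plain array before
// the library loads, the live tracker afterwards.
interface CommandQueue {
  push(...commands: Command[]): void;
}

// 1. Page code runs immediately. It only appends to an array, so it never
//    waits for the tracking script to download.
const pending: Command[] = [];
pending.push(["_setAccount", "UA-XXXXX-X"]);
pending.push(["_trackPageview"]);

// Record of what the (simulated) tracker actually executed.
const executed: string[] = [];

// Hypothetical stand-in for the library's command dispatcher.
function execute(command: Command): void {
  const [method, ...args] = command;
  executed.push(`${method}(${args.map(String).join(", ")})`);
}

// 2. Library code, run whenever the async script finally arrives: replay the
//    queued commands in order, then return an object whose push() executes
//    new commands right away.
function installTracker(queued: Command[]): CommandQueue {
  queued.forEach(execute);
  return {
    push(...commands: Command[]): void {
      commands.forEach(execute);
    },
  };
}

// In the real snippet a single global (_gaq) plays both roles;
// two variables keep this sketch explicit.
const tracker = installTracker(pending);
tracker.push(["_trackEvent", "video", "play"]);
console.log(executed);
```

The page never blocks on the network: commands pushed before the script arrives are simply held in order and replayed later, which is why the snippet can sit near the top of the page without delaying rendering.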

The one thing that still bothers me a lot regarding performance is the Google Analytics cookies. Here is what they look like:
"__utma=96182344.347392035.1326382423.1326382423.1326382423.1; __utmb=96182344.1.10.1326382423; __utmc=96182344; __utmz=96182344.1326382423.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)"

This is a minimum GA cookie. It can get longer if you use Custom Variables or Google Website Optimizer, but let's stick with the minimum for now.
These cookies are used internally by GA to keep state, and they are read and written by the code in the ga.js JavaScript file. Unlike most other cookies you might see out there, these cookies never need to reach your web servers. Still, the browser sends them along with every single HTTP request to your domain.
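To put a number on that per-request cost, the following TypeScript sketch splits the minimum cookie quoted above into its four cookies and measures each one. The helper names (`parseCookieHeader`, `cookieSizes`) are illustrative, and the parser is deliberately minimal: it assumes the `name=value; name=value` layout of a `Cookie` header and plain ASCII content.

```typescript
// The minimum GA cookie header quoted in the post.
const gaCookieHeader =
  "__utma=96182344.347392035.1326382423.1326382423.1326382423.1; " +
  "__utmb=96182344.1.10.1326382423; " +
  "__utmc=96182344; " +
  "__utmz=96182344.1326382423.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)";

// Split a Cookie header ("a=1; b=2") into name/value pairs.
// Only the first "=" separates name from value: __utmz's value contains more "=".
function parseCookieHeader(header: string): Map<string, string> {
  const cookies = new Map<string, string>();
  for (const pair of header.split("; ")) {
    const eq = pair.indexOf("=");
    if (eq > 0) {
      cookies.set(pair.slice(0, eq), pair.slice(eq + 1));
    }
  }
  return cookies;
}

// Size of each "name=value" pair. Every character here is ASCII,
// so string length equals bytes on the wire.
function cookieSizes(header: string): Map<string, number> {
  const sizes = new Map<string, number>();
  parseCookieHeader(header).forEach((value, cookieName) => {
    sizes.set(cookieName, cookieName.length + 1 + value.length);
  });
  return sizes;
}

const sizes = cookieSizes(gaCookieHeader);
sizes.forEach((bytes, cookieName) => console.log(`${cookieName}: ${bytes} bytes`));
console.log(`total: ${gaCookieHeader.length} bytes`);
```

For the string above this reports 60, 31, 15 and 76 bytes for `__utma`, `__utmb`, `__utmc` and `__utmz`, and 188 bytes in total once the `; ` separators are counted. The `Cookie: ` header name and the line ending add another 10 bytes to every request.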

According to Google's SPDY whitepaper, the average HTTP request is 700-800 bytes long. The minimum GA cookie quoted above is 188 bytes, so GA cookies make up about 25% of such a request. GA is present on roughly half of the top websites; if roughly half of all HTTP requests likewise go to sites running GA, then useless GA cookies traveling around the internet account for about 12% of all HTTP request bytes.
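The arithmetic behind those percentages fits in a few lines of TypeScript. The sketch takes the 188-byte length of the minimum cookie string quoted above and the 700-800 byte range from the SPDY whitepaper. The traffic share of 0.5 is an assumption standing in for "roughly half", since the 44.67% figure counts websites, not requests, and the sketch also assumes every request to a GA site carries the cookie.

```typescript
// Inputs from the post: the minimum GA cookie string is 188 bytes, and
// Google's SPDY whitepaper puts the average HTTP request at 700-800 bytes.
const GA_COOKIE_BYTES = 188;

// Fraction of a single request taken up by the GA cookie.
function cookieShareOfRequest(cookieBytes: number, requestBytes: number): number {
  return cookieBytes / requestBytes;
}

// Fraction of all request bytes on the internet, ASSUMING that
// `gaTrafficShare` of all requests go to sites running GA and that
// each of those requests carries the full cookie.
function cookieShareOfAllRequests(
  cookieBytes: number,
  requestBytes: number,
  gaTrafficShare: number,
): number {
  return cookieShareOfRequest(cookieBytes, requestBytes) * gaTrafficShare;
}

// Format a fraction as a percentage with one decimal place.
const pct = (x: number): string => `${(x * 100).toFixed(1)}%`;

for (const requestBytes of [700, 750, 800]) {
  const perRequest = cookieShareOfRequest(GA_COOKIE_BYTES, requestBytes);
  const overall = cookieShareOfAllRequests(GA_COOKIE_BYTES, requestBytes, 0.5);
  console.log(`${requestBytes}-byte request: cookie ${pct(perRequest)}, internet-wide ${pct(overall)}`);
}
```

At the 750-byte midpoint this gives about 25% per request and about 12.5% internet-wide, in line with the figures above. Plugging in 0.4467 instead of 0.5 for the traffic share brings the internet-wide figure down to roughly 11%.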

A while ago I filed a bug about this issue on GA-Issues. The idea is to use HTML5 localStorage instead of cookies to hold this state on browsers that support it, so it stays in the browser and is never sent with HTTP requests. Still, it has attracted no attention so far. This fix could easily make the average HTTP request around 5% faster. We're talking about the average speed of the whole internet.
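Here is a minimal TypeScript sketch of that idea, assuming the tracker only needs `getItem`/`setItem`-style access to its state. `TrackerStateStore`, `MemoryStorage` and `detectLocalStorage` are hypothetical names, not part of ga.js. State goes to localStorage when it is available and falls back to a cookie-backed store otherwise; an in-memory store stands in for both so the sketch runs outside a browser.

```typescript
// The subset of the Web Storage API this sketch needs. In a browser,
// window.localStorage already satisfies this interface.
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical store for GA-style state (__utma, __utmb, ...). It prefers
// localStorage, whose contents are never attached to HTTP requests, and
// falls back to cookie-backed storage on browsers without localStorage.
class TrackerStateStore {
  private readonly local: KeyValueStorage | null;
  private readonly cookieFallback: KeyValueStorage;

  constructor(local: KeyValueStorage | null, cookieFallback: KeyValueStorage) {
    this.local = local;
    this.cookieFallback = cookieFallback;
  }

  private backend(): KeyValueStorage {
    return this.local ?? this.cookieFallback;
  }

  get(key: string): string | null {
    return this.backend().getItem(key);
  }

  set(key: string, value: string): void {
    this.backend().setItem(key, value);
  }
}

// Returns localStorage if it exists and is writable, else null.
// Access can throw when storage is disabled, hence the try/catch.
function detectLocalStorage(): KeyValueStorage | null {
  try {
    const storage = (globalThis as unknown as { localStorage?: KeyValueStorage }).localStorage;
    if (!storage) return null;
    const probe = "__storage_probe__";
    storage.setItem(probe, probe);
    storage.removeItem(probe);
    return storage;
  } catch {
    return null;
  }
}

// In-memory stand-in so the sketch runs outside a browser.
class MemoryStorage implements KeyValueStorage {
  private readonly data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
  removeItem(key: string): void {
    this.data.delete(key);
  }
}

// Demo: with localStorage available, nothing lands in the cookie jar.
const local = new MemoryStorage();
const cookieJar = new MemoryStorage(); // stands in for document.cookie
const store = new TrackerStateStore(local, cookieJar);
store.set("__utma", "96182344.347392035.1326382423.1326382423.1326382423.1");
console.log(local.getItem("__utma") !== null); // true
console.log(cookieJar.getItem("__utma")); // null
```

In a browser the store would be built as `new TrackerStateStore(detectLocalStorage(), cookieStore)`, where `cookieStore` is a hypothetical wrapper around `document.cookie`. One design wrinkle: localStorage is scoped to a single origin (scheme, host and port), while GA cookies can be shared across subdomains, so a real fix would need to account for that.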

The real picture is not that bad, since this only affects HTTP requests, not HTTP responses, and responses are where most of the data is. Still, it's funny to see something that big go unnoticed.