As a general rule, whatever percentage you think your test coverage is, it isn’t. However much of the known surface area you’re covering, there’s going to be an exciting swath of things you didn’t realize you needed to test. Analytics fell into that bucket for us.
We use Google Analytics in our webapp to get a feel for how people use the product. Most recently, it helped us decide which functionality to prioritize for the mobile site. I generally look at our analytics every week or two to help the developers out, so when Simon asked me how popular the mobile site was, I was pretty sure the answer was not “It decreased use of our webapp by 98%.”
I won’t name names, but the culprit rhymes with “itwasian”.
Our UI consists almost entirely of HAML and Backbone.js, often at the same time, so naturally we refactored the default Google Analytics code into HAML:
```haml
:javascript
  var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
  document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
:javascript
  //
```
You’ll notice that this generates two JavaScript blocks, which we helpfully merged into one.
That’s what broke everything (and by “everything,” I’m excluding some mobile and rare browsers that still executed the code as intended). For 98% of our visitors, merging those two script blocks meant that ga.js had not run by the time _gat was referenced. A script tag inserted with document.write isn’t parsed and executed until the script block that wrote it has finished running, so when the tracking code in that same block reached for _gat, it didn’t exist yet, and that was the end of our analytics on the page.
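To make the timing concrete, here is a toy model in plain JavaScript, not real browser code. It assumes only the simplified rule described above: anything a script writes with document.write runs after that script block finishes. The names runPage, gaJs, loader and track are hypothetical stand-ins for the page, ga.js, the loader line and the tracking call.

```javascript
// Toy model (not real browser code): a <script> written with document.write
// inside a script block is queued and only runs after that block finishes.
function runPage(scriptBlocks) {
  const page = { _gat: undefined, events: [] };
  for (const block of scriptBlocks) {
    const written = [];
    // Stand-in for document.write: queue the written <script> instead of running it.
    const documentWrite = (script) => written.push(script);
    try {
      block(page, documentWrite);
    } catch (err) {
      // Record the error; the page keeps going with the next script.
      page.events.push(`error: ${err.message}`);
    }
    // Only now does the parser reach the written <script> tag and execute it.
    for (const script of written) script(page, documentWrite);
  }
  return page.events;
}

// Hypothetical stand-in for ga.js: defines the _gat global when it executes.
function gaJs(page) {
  page._gat = {};
  page.events.push("ga.js loaded");
}

// Hypothetical stand-in for the loader line that writes <script src=".../ga.js">.
function loader(page, documentWrite) {
  documentWrite(gaJs);
}

// Hypothetical stand-in for the tracking code, which needs _gat to exist already.
function track(page) {
  if (page._gat === undefined) throw new Error("_gat is not defined");
  page.events.push("pageview tracked");
}

// Two separate blocks, as in the default code.
const separate = runPage([loader, track]);
// One merged block, as in our refactor.
const merged = runPage([(page, write) => { loader(page, write); track(page); }]);

console.log(separate); // [ 'ga.js loaded', 'pageview tracked' ]
console.log(merged);   // [ 'error: _gat is not defined', 'ga.js loaded' ]
```

In the merged case ga.js still loads, just one step too late to matter.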
The simple fix, of course, would have been to put the second script block back. Instead, we moved to Google’s newer asynchronous tracking code, which doesn’t need two blocks: it only requires _gaq to exist as a JavaScript array that queues commands, and ga.js picks them up later, whenever the browser gets around to loading it.
```haml
:javascript
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-8759953-1']);
  _gaq.push(['_setDomainName', 'pagerduty.com']);
  _gaq.push(['_trackPageview']);

  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();
```
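The trick behind _gaq is a command queue: until ga.js arrives, _gaq is an ordinary array, so push() just stores commands. The sketch below imitates that pattern with a hypothetical loadTracker function and made-up handlers. It is not ga.js’s actual code, only an illustration of why the page doesn’t care when the library finishes loading.

```javascript
// Sketch of a command queue in the style of _gaq (not ga.js's real implementation).
// Before the library loads, the queue is a plain array: push() only stores commands.
let commandQueue = [];
commandQueue.push(["_setAccount", "UA-XXXXXXX-1"]); // placeholder account ID
commandQueue.push(["_trackPageview"]);

// Hypothetical bootstrap run when the library finishes loading: replay everything
// queued so far, then hand back an object whose push() executes commands immediately.
function loadTracker(pending, handlers) {
  const run = ([name, ...args]) => {
    const handler = handlers[name];
    if (handler) handler(...args); // unknown commands are ignored in this sketch
  };
  pending.forEach(run);
  return { push: (...commands) => commands.forEach(run) };
}

// Hypothetical handlers that record what would be sent.
const sent = [];
const handlers = {
  _setAccount: (id) => sent.push(`account ${id}`),
  _trackPageview: () => sent.push("pageview"),
};

commandQueue = loadTracker(commandQueue, handlers);
// After loading, the same push() call site runs commands right away.
commandQueue.push(["_trackPageview"]);

console.log(sent); // [ 'account UA-XXXXXXX-1', 'pageview', 'pageview' ]
```

Because calling code only ever uses push(), it works the same whether it runs before or after the library arrives, which is why a single script block is enough.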
To catch this sooner in the future, I’ve set up some Intelligence Events in Google Analytics, for both our application and our website, that detect abnormally low or high visitor counts.
There’s at least a day’s delay before the alert email goes out, which is a shame (the most recent one arrived while I was writing this), but it’s another layer in our web of alerting. To set one up, log into your Google Analytics account and look for “Intelligence Events”.
I intend to set up some more advanced heuristics later, but for now I just want an alert that tells me whether analytics is working at all.
For testing, that day of lag had one upside: it sent me yesterday’s alert, from when our analytics were still broken (though I’m really stretching to call that an advantage).
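Google doesn’t publish exactly how Intelligence Events decides what counts as abnormal, so treat the following as an illustrative stand-in rather than their model: flag a day whose visit count sits more than k standard deviations from the mean of the preceding days. The function checkVisits and the sample numbers are hypothetical. A 98% drop like ours lands far outside any reasonable band.

```javascript
// Illustrative anomaly check (not Google Analytics' model): compare today's visits
// with the mean and sample standard deviation of previous days.
function checkVisits(history, today, k = 3) {
  if (history.length < 2) return "not enough data";
  const mean = history.reduce((sum, v) => sum + v, 0) / history.length;
  const variance =
    history.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (history.length - 1);
  const sd = Math.sqrt(variance);
  if (today < mean - k * sd) return "abnormally low";
  if (today > mean + k * sd) return "abnormally high";
  return "normal";
}

const lastWeek = [1000, 1040, 980, 1010, 1020, 990, 1005]; // made-up daily visits
console.log(checkVisits(lastWeek, 1010)); // normal
console.log(checkVisits(lastWeek, 20));   // abnormally low (a ~98% drop)
console.log(checkVisits(lastWeek, 2000)); // abnormally high
```

A fixed k is crude (weekends and launches will trip it), which is one reason to layer smarter heuristics on top later.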
This isn’t the kind of alert I want to be woken up for in the middle of the night, but I still use PagerDuty to manage my analytics alerts as incidents, partly for the dogfooding, and I also use it to track our media mentions, Twitter mentions, and so on.
For this, I’ve set up a service that never auto-resolves incidents or expires acknowledgements, and it tracks everything emailed to “analyze-me@pdt-dave.pagerduty.com”.
So now I’m filling up our FogBugz with new things to test.
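One cheap item for that list is a check that the rendered page still contains the tracking snippet. The function below is a hypothetical sketch that searches a page’s HTML for the exact strings from our async snippet. It only confirms the snippet is present, so it would not by itself have caught the merged-block bug, which only shows up when a browser actually runs the page.

```javascript
// Hypothetical regression check: does the rendered HTML still contain the pieces
// of the async snippet we rely on? Returns a list of problems (empty means OK).
function checkAnalyticsSnippet(html, accountId) {
  const problems = [];
  if (!html.includes(`_gaq.push(['_setAccount', '${accountId}'])`)) {
    problems.push("account not set");
  }
  if (!html.includes("_gaq.push(['_trackPageview'])")) {
    problems.push("pageview not tracked");
  }
  if (!html.includes(".google-analytics.com/ga.js")) {
    problems.push("ga.js never requested");
  }
  return problems;
}

// Minimal sample page containing the relevant lines of the snippet.
const goodPage =
  "<script>var _gaq = _gaq || []; _gaq.push(['_setAccount', 'UA-8759953-1']);" +
  " _gaq.push(['_trackPageview']);" +
  " ga.src = 'https://ssl' + '.google-analytics.com/ga.js';</script>";

console.log(checkAnalyticsSnippet(goodPage, "UA-8759953-1")); // []
// An empty page reports all three problems.
console.log(checkAnalyticsSnippet("<html></html>", "UA-8759953-1"));
```

Exact string matching is brittle: a harmless reformat of the snippet will fail it, so it flags any change to that code for a second look.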
I don’t have a good procedure for determining what we’re forgetting to test, but I do have a couple of principles:
We’re still a young company, but we’re fiercely dedicated to uptime. When you’re dealing with bugs, there are known unknowns and unknown unknowns, and one unknown unknown for me when I started here was how much I’d enjoy shooting Nerf guns across the office whenever the average page load time increases.
(With any luck, this will be the first post in a series on what happens when you make a mathematician do your marketing.)