Voices wield power. Staying silent is not an option. We must speak up and honor those who do. October is National Domestic Violence Awareness Month,...by PagerDuty
October 29, 2018
As a general rule, whatever percentage you think your test coverage is, it isn’t. Whatever amount of the known surface area you’re covering, there’s going to be an exciting swath of things you didn’t realize that you need to test. Analytics fell into that bucket for us.
We use Google Analytics in our webapp to get a feel for how users use the product, most recently to determine which functionality was prioritized for the mobile site. So generally I look at our analytics every week or two to help developers out, and when Simon asked me to see how popular the mobile site was, I was pretty sure the answer was not “It decreased the use of our webapp by 98%.”:
I won’t name names, but the culprit rhymes with “itwasian”.
That’s what broke everything – and by everything, I’m excluding some mobile and rare browsers that still executed the code as intended. For 98% of our visitors, the fact that we merged those two script blocks means that the DOM does not get control after the document.write and the loading of ga.js doesn’t happen before _gat is referenced. _gat doesn’t exist and that’s the end of our analytics on this page.
To ensure I catch this sooner in the future, I’ve set up some intelligence events on our application and the website inside of Google Analytics to detect if we have abnormally low or high amounts of visitors.
There’s at least a day’s delay before the email gets sent out, which is a shame (I received the last one while writing this) but it’s another layer in our web of alerting. Log into your Google Analytics account, and you’ll see “Intelligence Events”:
That day of lag had the advantage for testing that it sent me yesterday’s alert, when our analytics were still broken (but I’m really stretching to call that an advantage).
This isn’t the kind of alert I want to be woken up in the middle of the night for, but I still use PagerDuty as an incident management system for my analytics alerts, partially for the dogfooding, but also to track our media mentions, twitter mentions etc.
For this I’ve set up a service that doesn’t auto-resolve or expire acknowledgements, to track everything that emails “email@example.com”
So now I’m filling up our Fogbugz with new things to test:
I don’t have a good procedure for determining what we’re forgetting to test, but I do have a couple of principles:
We’re still a young company, but we’re fiercely dedicated to uptime and when you’re dealing with bugs, there are known unknowns and unknown unknowns – and when I started here I never would’ve known how much I’d enjoy shooting Nerf guns across the office whenever the average page load time increase.
(With any luck, this will be the first post in a series on what happens when you make a mathematician do your marketing.)