I had the opportunity to travel to London to attend AWS Enterprise Summit (July 6) and AWS Summit (July 7) last week. As an Amazon (pre-AWS) and Microsoft Azure alumnus, it was fascinating to step back and see how far this whole “cloud” story has really come. Here are a few key takeaways I had from a great show.
#1) Moving to cloud (properly) means adopting a DevOps culture
I really liked that Stephen Orban (Head of Global Enterprise Strategy, who opened the first day’s Enterprise Summit with the keynote) jumped on this point right away: your Infrastructure team MUST become a Cloud Center of Excellence and start to distribute top-to-bottom operational responsibility to the Product/Services teams, the Internal IT team, and the Desktop Support team.
I often quote Dr. Werner Vogels (CTO of Amazon.com, who delivered one keynote each day) in my own talks, and hearing him speak made me reflect on the fact that it’s now been over 10 years since he said the famous (infamous?) phrase in an interview with Jim Gray: “you build it, you run it.” But I always preface that with what he said just before it (my emphasis added): “Giving developers operational responsibilities has greatly enhanced the QUALITY of the services, both from a customer and a technology point of view.” I truly believe that’s why Amazon.com and AWS continue to be as successful as they are, and it’s the heart of what DevOps culture represents.
#2) Software companies (that’s all of you, by the way) are migrating wholesale to the cloud
It goes beyond the oft-quoted-at-the-conference cheeky phrase “Friends don’t let friends build datacenters.” Companies like GE (who recognize themselves as a “technology company” or a “software and analytics company”) aren’t just continuing to run Mode 1 systems the way they always have; they’re actively closing datacenters (30 of their 34 worldwide) and beginning to reap the benefits. I specifically liked all of the blueprints, reference apps, and projects that Stephen and Werner pointed to at various points in their keynotes. Pretty cool to see this attitude shift in the industry.
This has some really interesting implications for incident response. In one keynote, the CTO of FanDuel pointed out that while their operations team was only ~10 people, they relied on AWS Enterprise Support to help keep things running. Stay tuned for how PagerDuty might help to handle this type of case in the future (hint: using the Response Mobilizer).
#3) It’s time to go beyond infrastructure
It’s easy to say and much harder to realize, but going “serverless” was definitely a frequent topic at the conference. Per Travelex, serverless is “like playing with lasers and magnets – just magic.” Don’t get me wrong, we at PagerDuty have more than just dabbled in it: we built our new Custom Event Transformer to host JS snippets using AWS Lambda. But don’t be fooled into thinking that going serverless means investing less in overall operability, or that it somehow enables the fabled #NoOps.
Overall, getting your engineers to spend more time on your actual business (e.g. services that drive revenue) was a recurring theme. So once you get beyond having half of your staff invested in your infrastructure, the next nut to crack becomes how to build the right stuff, not how to build the stuff right. With the extensive portfolio of AWS, I’ll be curious to see if Amazon looks to invest in closing that development loop with some Product Manager tools in the future. (My colleague has a great related blog post and talk about being a Product Manager in a DevOps World. You should read it!)
#4) Your services may need to evolve: from SOA to Microservices
Werner shared a great set of learnings about Amazon’s first foray into Service-Oriented Architecture (SOA) back in the early 2000s. In particular, he mentioned organizing too heavily based on data sets (Customers, Catalog, etc.) and not by function or reliability/scalability characteristics. With the shift to Microservices, he also shared a fantastic summary of “Some Signs You Are Not at Microservice Level Yet”:
Photo courtesy of Denise Yu: https://twitter.com/deniseyu21/status/750993932591521793.
This evolution of Services also has a profound impact on your incident response workflow. Services are critical to how you orient and organize your response, which is why we continue to invest in putting the right Service concept in PagerDuty. Just don’t forget Gall’s Law when you do invest in your Services:
“A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.”
Putting a bow on it
Amazon is where my on-call journey began, so it almost feels a bit like returning home (or maybe more like a vacation home 3000 miles away). The AWS portfolio is vast, and yet they continue to innovate and change the face of the industry.