It’s great to be back in the office after an amazing week at Velocity 2016 in Santa Clara, with motivating quotes from all the great speakers still buzzing in my head. David Hayes, David Shackelford, and Donny Nadolny of PagerDuty were honored to deliver three insightful and well-received talks during the event. There was a lot of great energy, and the exhibition halls were filled with enthusiastic practitioners, web and systems developers, administrators, and other avid members of IT, all eager to explore new solutions and to share and learn about the newest technology services and proven best practices from some of the most successful and influential technology organizations around today.
PagerDuty’s booth was busy with a mix of visitors curious about different features of our platform. Many were excited to see demos of Service Group (grouping related sub-integrations and services to align with service structures), the updated message button support for acknowledging and resolving incidents directly within Slack that came with the recent PagerDuty Slack Integration launch, and general features of the PagerDuty service, from our roots in alerting, on-call scheduling, and escalation policies to more advanced features including API v2.0 and the Custom Events Transformer, which extends PagerDuty’s platform and simplifies the creation of custom integrations. Others were interested in PD-CEF (PagerDuty Common Event Format), event normalization, enrichment, and other features for increasing situational awareness and quickly correlating and spotting issues across the infrastructure stack.
Other times, existing customers cheerfully greeted us and stopped by just to let us know they’ve been enjoying the product’s impact on their lives, to comment on our latest booth decor, and to grab some new swag for teammates. Oh, and I can’t leave out that occasional customer who stops by to say, “Oh, I really don’t like PagerDuty…”, pauses to see how that is taken, and then follows up with a grin: “Oh right, only when I’m on-call and have to work on a real issue of course, not the product itself.” It always makes me happy to know those folks are in good enough spirits to joke. After all, I can always chuckle and respond with, “Trust me, I know what you mean,” since I’ve had my fair share of IT on-call responsibilities in the past.
Carrying on though, this year at Velocity was unique in that our booth had a fun interactive attraction that many stopped by to participate in. An enormous Lite-Brite, a throwback from the ’80s, stood beside Verizon Digital Media Services’ booth with its token-activated Pigeon Claw machine, and down the aisle from another booth’s action-packed pinball machine. It posed the question, “How do you want to be notified during an outage?”, and visitors answered by pushing pins into the display:
Push (via PagerDuty Mobile App)
Visitors watched and pondered the results in awe as the board brightly displayed that traditional e-mail notifications had fallen behind more modern push notifications! Others creatively rearranged people’s votes, continuously morphing the image into different shapes as they passed by throughout the event: “Is that a snail or a whale? Here, let me fix that with my preferred method of notification. SMS snail shape it is!”
Lessons from the dark side: DevOps and product management
On to more exciting things though: one of our very own Product Managers, David Shackelford, delivered insightful lessons on the opportunities and challenges of being a product manager within a DevOps model in his talk, “Lessons from the Dark Side.” He covered how product, development, and operations choices all affect each other, and highlighted successful strategies implemented here at PagerDuty. Specifically, he focused on the importance of feedback, common language, and shared culture across everybody involved in the journey from customer problem to product solution. The talk was well-attended and well-received, with lots of positive traffic on Twitter. I especially enjoyed reading the tweets from those who appreciated David’s talk.
Why DevOps is more than just automation
David Hayes, Director of Platform Strategy at PagerDuty, jumped right in with a talk on why DevOps is “More Than Just Automation.” As an overview, he introduced key findings from a survey that PagerDuty conducted. He also discussed how 75% of DevOps companies respond to issues within half an hour and never take longer than an hour to respond. He supports building an organizational culture of “you build it, you run it” (Werner Vogels, 2006), where agile transformation involves a tighter feedback loop that makes developers responsible for their code in production, which ultimately improves service quality. With this culture, developers and ops members can build empathy by learning more about one another’s roles and responsibilities. He went on to discuss how resilient systems need resilient people to build and support them, how alert aggregation improves situational awareness, and how people, process, and DevOps tools strengthen organizations.
Debugging distributed systems
Donny Nadolny, a notable Scala developer at PagerDuty responsible for improving the reliability of PagerDuty’s backend systems, has spent a significant amount of time investigating problems that arise in distributed systems such as Cassandra and ZooKeeper. He delivered a talk on how PagerDuty debugs distributed systems, touching on ZooKeeper, TCP issues, and how IPsec is done at PagerDuty, and shared some valuable lessons learned:
- Lesson 1 – Don’t lock and block, and keep in mind that TCP can block for a really long time
- Lesson 2 – Automate debug info collection (stack traces, heap dumps, transaction logs, etc.)
- Lesson 3 – Both application/dependency checks and leader/follower heartbeats should be deep health checks!
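To make Lessons 1 and 3 a little more concrete, here is a minimal Python sketch. It is not code from Donny’s talk or from PagerDuty’s systems; the function names and the idea of passing in a `probe` callable are assumptions made purely for illustration. The point it tries to show: a deep health check runs a real end-to-end operation, and it does so with a hard deadline, so a call that is blocked (for example on a stalled TCP send) is reported as unhealthy instead of hanging the checker too.

```python
import threading
import time
from typing import Callable


def deep_health_check(probe: Callable[[], bool], timeout_s: float) -> bool:
    """Return True only if `probe` succeeds before the deadline.

    `probe` is a hypothetical callable that should exercise the same path
    real requests take (for example, write a value and read it back).
    """
    outcome = {"ok": False}

    def run() -> None:
        try:
            outcome["ok"] = bool(probe())
        except Exception:
            # A probe that raises is treated the same as one that fails.
            outcome["ok"] = False

    # Daemon thread: a probe that never returns cannot keep the process alive.
    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    # Lesson 1: never wait without a bound; a blocked call may take very long.
    worker.join(timeout_s)
    if worker.is_alive():
        return False  # still blocked past the deadline, so report unhealthy
    return outcome["ok"]


def simulated_stuck_probe() -> bool:
    """Stand-in for a call blocked on the network (assumption for the demo)."""
    time.sleep(1.0)
    return True
```

Calling `deep_health_check(simulated_stuck_probe, timeout_s=0.05)` reports the dependency as unhealthy even though the probe would eventually return, which is the behavior a shallow "is the process up?" check would miss.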
There were so many impressive speakers at Velocity who really wowed us. Here are a few of the notable ones that Peter Sobot, software developer at PagerDuty, raved about:
- Dan Slimmon and his talk about troubleshooting and using differential diagnosis, like doctors do, to solve software problems
- Emily Nakashima from Bugsnag discussed ways to better monitor end-users’ performance, and recommended putting front-end developers on-call
- Alice Goldfuss from New Relic had an interesting talk about “Rockstars, Builders and Janitors” where she discussed how rotating engineers between different roles can help everybody get perspective. She also recommended putting more people on call.
That about sums it all up from here at PagerDuty! If you didn’t get to attend Velocity Santa Clara this year, I recommend you check out one of their upcoming events. I’m really looking forward to another IT industry event like Velocity so I can learn more about others’ IT Ops and DevOps best practices and processes and bring the wealth of information back to PagerDuty.