The other day, a newer Engineering Manager here at PagerDuty, Dileshni Jayasinghe, started a Slack thread expressing joy at how fantastic our engineering team is after attending a conference with engineering folk from other organizations. She explained that she’d shared our practice of owning what we build with someone—who then responded by gazing off into the distance and saying, “That’s my dream.”
On another occasion, she was in a group where “people were amazed that we do chaos engineering in production.”
At this point, the rest of us jumped into the thread, agreeing enthusiastically and contrasting what we experience here with our pasts: sad tales of lengthy, late-night deployments, maintenance windows, approval committees, and chaotic outages.
There’s so much we’re proud of, and I’d like to share six qualities that set PagerDuty Engineering apart from the crowd!
In engineering, we live PagerDuty’s No. 1 company value—every day. People First.
This guides everything we do and ensures personal development and growth are top of mind when we make decisions.
Diversity is a critical element of this value because we know that different backgrounds, perspectives, and approaches lead to healthy teams and great results. This is reflected in the way we hire, as well as in the importance we place on Employee Resource Groups that support and advocate for people from under-represented backgrounds.
We have a balance of engineering experience levels on each team and a fantastic internship program. We also encourage people from non-engineering backgrounds and other areas of our business to become engineers. For example, earlier this year Ashley Brooks, a Hackbright alumna who demonstrated tremendous engineering talent on our Customer Support team, brought her experience over to engineering, fulfilling her dream of being an engineer and enriching our Platform team in the process.
Another important area where we live our People First values is the way we embrace remote working. At prior companies I’ve worked for, there was often an inflexible culture where everyone had to be in the same room, sitting at their desk during set times. In contrast, at PagerDuty, we know that being exceptional at working in a distributed way leads to healthy teams and people. Our teams span San Francisco, Seattle, and Toronto offices, as well as home locations across North America and beyond. We ensure everyone is included in discussions and decisions through smart use of online technology and great remote collaboration.
In most of my previous roles, engineers passed their work to QA, and there was some form of a Release Manager and a separate Operations (or “DevOps”) team that had access to production. This led to silos of knowledge and unclear ownership of finished products.
At PagerDuty, we capture the true DevOps spirit, and engineers are empowered to own what they build, from idea, design, and build, to test and deploy. As soon as changes get into production, our engineers own the monitoring and alerting, and create on-call PagerDuty schedules for the team so there’s always someone available if there’s a problem.
Engineers not only make decisions on all technical aspects of their solution, they are also encouraged to take part in product discussions and get involved in calls with customers by shadowing our support and sales teams.
All this provides a challenging, rich experience that fosters pride of ownership, as well as continuous improvement of our technology and products.
A big theme in our Slack chat was the challenge of deployments. Roman Shekhtmeyster, a Senior Engineering Manager, explained that at two of his previous jobs, there was a process where they “released to production on a quarterly cadence late at night, sitting on the phone with an ops engineer going over a 60-bullet-point deployment document and making sure nothing gets skipped.”
At PagerDuty, things are different. We ship our code continuously, multiple times each day. There are no dedicated testers or specialist “DevOps” engineers; there’s no downtime or approval board.
Despite this, we have high levels of quality, automation, and reliability. Canary Deployments and ChatOps commands make deployments fast and reliable: We check behavior on a subset of our fleet before promoting to the entire environment and can quickly roll back if we detect a problem.
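To make the canary idea concrete, here is a minimal Python sketch of the promote-or-roll-back flow described above. It is an illustration only, not PagerDuty’s actual tooling: the `Host` class, the `canary_deploy` function, the `is_healthy` callback, and the 10% canary fraction are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Host:
    """Hypothetical stand-in for one machine in the fleet."""
    name: str
    version: str


def canary_deploy(
    fleet: List[Host],
    new_version: str,
    is_healthy: Callable[[Host], bool],
    canary_fraction: float = 0.1,  # assumed value, purely illustrative
) -> bool:
    """Deploy to a canary subset first, then promote or roll back.

    Returns True if the whole fleet was promoted, False if rolled back.
    """
    if not fleet:
        return True

    # Always use at least one canary host.
    canary_count = max(1, int(len(fleet) * canary_fraction))
    canaries = fleet[:canary_count]
    previous = {host.name: host.version for host in canaries}

    # Step 1: ship the new version to the canary subset only.
    for host in canaries:
        host.version = new_version

    # Step 2: check behavior on the canaries before going any further.
    if not all(is_healthy(host) for host in canaries):
        # Step 3a: a problem was detected, so restore the old version.
        for host in canaries:
            host.version = previous[host.name]
        return False

    # Step 3b: canaries look good, so promote to the rest of the fleet.
    for host in fleet[canary_count:]:
        host.version = new_version
    return True


if __name__ == "__main__":
    fleet = [Host(f"web-{i}", "v1") for i in range(10)]
    promoted = canary_deploy(fleet, "v2", is_healthy=lambda host: True)
    print("promoted:", promoted)
```

In a real system the health check would look at monitoring signals over a period of time rather than a single callback, but the shape is the same: a small blast radius first, and an easy path back.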
Resiliency of our platform is critical, and we take seriously the need to identify problems proactively before they impact our customers. Chaos Engineering is a practice where engineers try their hardest to break production systems, rarely with any resulting customer impact. We do this continuously using our in-house automated tool, Chaos Cat, and we also run planned Failure Fridays, where engineers across teams inject failures in a simulated war room format.
Failure Fridays uncover knowledge gaps that are addressed by the team during retrospectives. Furthermore, they provide practice sessions for our incident response team (see next section).
The success of Failure Fridays has led to a more lightweight Failure Anyday concept. This gives engineering teams developing new services the opportunity to test their systems independently for failure in production quickly and easily, without needing to get on the weekly schedule and involve other teams.
This was the biggest eye-opener for me. In the event of an incident at previous companies, chaotic war rooms seemed to include the whole company and suck the life out of everyone in there. It wasn’t clear who was in charge and executives slowed down problem resolution by interfering in decision-making and constantly asking for status updates (more Incident Response anti-patterns here).
By contrast, at PagerDuty we have Incident Response orchestration that’s calm, well organized, and led by an incident commander who ensures the team solving the problem is well coordinated and protected from “executive swoop” so they can provide clear updates, fix the problem fast, and get back to work. Anyone at PagerDuty can become an incident commander. In fact, anyone anywhere can carry out our training online because we’ve open-sourced it!
The revolutionary thing about our incident response system is that everyone who takes part is exposed to the heart of what PagerDuty provides to its customers: a platform and practices that help everyone make the best use of the most precious resource, time. By proactively and quickly solving problems, end users get the best possible service and engineers get back to what they enjoy: innovating and shipping value.
I remember my first gig as a junior developer. It was difficult to get answers to questions or help from seniors who were heads down working on their own projects with their headphones on. In contrast, someone on my team recently told me he got a ton of support from everyone around him and felt super motivated to help others too.
We work in agile teams of around 6-8 engineers, each with a dedicated Product Owner, using either Scrum or Kanban. Agile Coaches guide and help improve team dynamics and process. Engineering Managers support the growth of engineers, build teams, and ensure there’s an appropriate balance between technical and product priorities.
Core, Security, and SRE teams build tooling and have dedicated channels to help answer questions. Rather than tell engineers what to do, these teams have a philosophy of making it easy for engineers and delivery teams to “do the right thing.”
We also have a number of specialized language and framework guilds, made up of engineers from across teams, who help when people get stuck and organize training. These include Kafka, Elixir, Scala, and Chaos Engineering guilds.
Everyone is encouraged to dive in, make mistakes, and learn continuously, knowing that the engineers around them have been down similar paths and can empathize. This results in a blameless, supportive culture where co-workers have your back every step of the way.
There’s a lot more to cover, but I hope I’ve given you a flavor of what makes PagerDuty special. If you’d like to learn more and help us get even better, we’re always happy to jump on a video chat or grab a coffee to explain more. We frequently attend events—come by and talk to us (this is how I joined PagerDuty and I’m so happy I did!).