| In Best Practices & Insights, DevOps, Trends

It’s the end of another exciting year at PagerDuty! A few top highlights include raising $43.8 million in a Series C funding round, officially launching in London and Australia, and witnessing the first solar eclipse since 1979. We also published a lot of good information, so as we wrap up the year, we thought we’d […]

| In PagerDuty Life

My name is Yiyun and I’m currently a Computer Science student at the University of Waterloo. I’m a Software Engineer intern on the Core team here at PagerDuty. In this post, I would like to share some reflections on my experience over the past four months at PagerDuty. My team maintains and develops several core […]

| In DevOps, PagerDuty Life, Tech Talk

On June 28th, 2017, we marked four years of performing “Failure Fridays” at PagerDuty. As a quick recap, Failure Fridays are a practice we conduct weekly at PagerDuty to inject faults into our production environment in a controlled way, and without customer impact. They’ve been foundational for us to verify our resiliency engineering efforts. Over […]

| In DevOps, PagerDuty Life, Reliability

“Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.” (Principles of Chaos Engineering)

Netflix, Dropbox, and Twilio are all examples of companies that perform this kind of engineering. It’s essential to have confidence in large, robust, distributed […]

| In Reliability

Corey Bertram, Site Reliability Engineer at Netflix, recently spoke to a DevOps Meetup group at PagerDuty HQ about injecting failure at Netflix. Corey wanted to show people what can go wrong, because anything that can go wrong will. Promoting chaos and injecting failure has been a great way to keep Netflix up and running […]

| In Reliability

Ask any PagerDutonian what the most important requirement of our service is and you’ll get the same answer: Reliability. Our customers rely on us to alert them when their systems are having trouble: on time, every time, day or night. Our code is deployed across 3 data centers and 2 cloud providers to ensure all […]

| In Reliability

At PagerDuty, all of our computing infrastructure is automated using Chef. We push out features and changes to our Chef codebase very frequently – often multiple times a day – and this makes it crucial that we test our Chef code before we deploy it to our production environment. As we have learned, failures can […]