| In Operations Performance, Reliability

Continuous integration (CI) is a software development practice in which team members frequently merge their work to reduce problems and conflicts. Each push is verified by an automated build (and test) to detect errors. By integrating with one another frequently, teams can develop software more quickly and reliably. In essence, CI is about verifying the quality […]

| In Alerting, DevOps, Operations Performance

This is the final post (for now) of our series about transitioning to a DevOps culture. To start from the beginning, check out Why You Should Establish a DevOps Culture. When we talk about DevOps, we often default to a conversation about collaboration and culture. One of the most important aspects of taking a DevOps […]

| In Reliability

At PagerDuty, all of our computing infrastructure is automated using Chef. We push out features and changes to our Chef codebase very frequently – often multiple times a day – and this makes it crucial that we test our Chef code before we deploy it to our production environment. As we have learned, failures can […]

| In Reliability

This is the first post of a multi-part series on some of the operations challenges that the team at PagerDuty is solving. At PagerDuty we strive for high availability at every layer of our stack. We attain this by writing resilient software that then runs on resilient infrastructure. We take this into account when we design […]