Monitoring your infrastructure can be challenging, but that’s why you have tools in place to make sure you don’t miss a beat when things go wrong. You’ve probably got Nagios monitoring your overall infrastructure, Pingdom or Neustar WPM monitoring your website, Boundary and New Relic monitoring your apps, or something completely homegrown watching everything else. No matter what you use, your systems are configured in a way that is unique to your situation.
Since we can all learn from one another, here are a few ways your peers manage and maintain their IT infrastructure, and how they use PagerDuty to be alerted when systems go down.
dotCloud: Organizing a 24×7 bullet-proof on-call rotation with PagerDuty
by dotCloud – (@dot_cloud)
Instagram: What Powers Instagram: Hundreds of Instances, Dozens of Technologies
by Instagram Engineering – (@instagram)
Twilio + PagerDuty = PhoneDuty
by David S. Shafer, manager of the group responsible for enterprise storage systems at a major national research university (@davidsshafer)