One of the best things about working at PagerDuty is that our customers, our users, our champions, and our buyers are all the same people. With this year’s push into major incident response, we’ve spent a lot of time talking to Network Operations Centers (NOCs) about what the future holds for them.
Every job changes with new technology, and some, like long-distance trucking, will be completely disrupted by self-driving trucks. But after all the discussions we’ve had with the best NOCs around, it looks like their evolution will be significant but manageable.
I’ve always thought of PagerDuty as helping your Mean Time To Promotion. In keeping with that, here are some of the possible futures we see for NOCs.
One of the most straightforward paths is towards becoming a Site Reliability Engineer (SRE).
If you want a job doing this, you need all the troubleshooting skills of a systems admin, layered on with a deep understanding of monitoring. The goal of an SRE is to detect glitches before they develop into problems that users can notice. And if that doesn’t work, SREs move heaven and earth to get everything back online. You’ll frequently see SRE positions at big cloud or online companies, like Amazon, Google, Heroku, and even Etsy. People get really cranky if they can’t buy things immediately, and SREs are there to make sure they can.
SREs keep the world online (ok, that’s kind of a big ask). As an SRE, you would work with a team to predict needs and build scale in a way that is fluid and invisible from the front end. Site Reliability Engineering is the art of never letting the user see you sweat, as a company. You’re working to make sure there is always enough capacity, enough uptime, enough pipe, and enough monitoring to make sure something isn’t falling apart invisibly.
Instead of firefighting, you want to be a building inspector, designing wider hallways, doors that always swing out, and multiple staircases (metaphorically). It may look heroic to jump in with a fire ax and a hose and tear down doors and fight flashovers, but it’s better to never need the heroics if you have smart policies around building materials and building sprinklers.
Historically, quality assurance (QA) at software companies has had an unfair reputation. In fact, there are lots of great companies like Microsoft where there’s a parallel track for Software Development Engineers in Test (SDET). Manual click testing long ago gave way to automated unit tests, which have since grown into automated click and API tests against the staging server.
Operations and QA are the formalizations of, “Eek! Things are broken.” If you have a solid QA team checking things in test before you deploy, there are far fewer surprise outages. If you have an Operations team, they design and build things mindfully, considering risk and performance, rather than simply installing and hoping things work right.
At their core, DevOps and Operations are about getting servers or containers to meet the “three R requirements”:
To me, that also sounds a lot like QA.
DevOps means if something broke and woke you up, you are empowered to write the test that ensures it never makes it to production again — you’re already the best part of QA.
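As a rough illustration of what such a test might look like, here is a minimal sketch in Python. It assumes a purely hypothetical incident in which a deploy shipped a database connection pool size of 0; the function `validate_config` and the setting `db_pool_size` are invented for this example and do not come from any real system.

```python
# Hypothetical scenario: a deploy config with a pool size of 0 caused the page.
# All names below are illustrative assumptions, not a real API.

def validate_config(config):
    """Return a list of human-readable problems found in a deploy config."""
    problems = []
    pool_size = config.get("db_pool_size")
    # Reject missing, non-integer, or non-positive pool sizes.
    if not isinstance(pool_size, int) or pool_size < 1:
        problems.append("db_pool_size must be an integer >= 1")
    return problems


def test_zero_pool_size_is_rejected():
    # Regression test for the hypothetical incident: the bad value must be flagged.
    assert validate_config({"db_pool_size": 0}) != []


def test_sane_config_passes():
    # A reasonable config should produce no problems.
    assert validate_config({"db_pool_size": 10}) == []
```

Run in a pre-deploy check (for example with a test runner such as pytest), a test like this turns one bad night into a permanent guardrail.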
As you get better at preventing downtime or outages and streamlining requests, you can scale volume more easily because you’re not responding to one-off requests. Think about the difference between manually resetting user logins and offering an automated system to do it. You may spend the same amount of time fixing user login problems, but for ten to twenty times as many users.
One of my favorite NOCs I’ve visited belongs to a telecommunications company in Los Angeles. It’s a classical NOC with an unconventional feel. Starting from the massive wall of dashboards, the room is arranged in rows, with each row representing a promotion in their operations org. Promotions come 6-12 months apart on average, with clear milestones, and the path can end in the back row (as a de facto SRE) or lead into other parts of the org. With so many companies lamenting how hard it is to find talent these days, I expect this will become more common.
At PagerDuty we treat our support team in much the same way: employees in our support org have gone on not only to become managers or take on more technical roles inside that org, but also to join the engineering, marketing, and sales teams, and I don’t see any sign of that stopping. (Unsurprisingly, this makes it easier for us to hire great people.)
Predictions are hard, especially about the future; but it’s clear that the future of the NOC will not be humans watching screens waiting to press buttons. For many classes of always-on applications, it will still make sense to keep people ready to jump into action — the question is what to do with the other 99% of their time.
The NOC has undergone quite a bit of change in recent years and will continue to do so. Those that adapt to the changing digital landscape will position themselves for success, and we look forward to navigating that transition with you.