by David Shackelford
March 14, 2019
Last fall, we introduced PagerDuty Analytics, a product that combines machine and human response data to provide operational insights that help organizations drive process maturity and improve business outcomes. Today, we’re excited to announce that it’s generally available! As part of our expanded Analytics product offering, we’re also rolling out a set of prescriptive operational performance scorecards.
Before building these scorecards, we observed and interviewed many high-performing organizations about their best practices for process improvement and drilled into their metrics. Using this information, we created scorecards that are based on the structure of your teams, services, and organization in PagerDuty.
Our goal was to mimic the natural processes, encourage the best practices, and follow the rituals of teams that are already moving toward or operating with a DevOps mindset, so we can help teams collaborate better to solve bigger problems. Using these scorecards during regular operational reviews with your teams and stakeholders will help you have more meaningful and impactful conversations.
There are many advantages in having a “you build it, you own it” DevOps culture, including faster and more frequent software delivery, as well as accountability, since every developer owns their code and is responsible for fixing it when something goes wrong. But there’s also a very real possibility of overextending teams, which can lead them to make less-than-ideal operational decisions so they can get results right now instead of investing in long-term scalability and stability.
Clearly, that should be avoided. One of the best ways teams can improve decision-making is to hold regular operational reviews. Through conversations with our customers, we learned that effective operational reviews aren’t just about finding and displaying data. They are also about helping teams make informed, educated, and (oftentimes) opinionated decisions around best practices for modern service management.
These reviews should provide thoughtful insights into not only the tools and services being built and used, but also into team behaviors. Managers can also use the reviews to uncover unplanned work and/or unnecessary operational load any team may be encountering.
We also learned that one of the most valuable things any analytics solution can do is enable teams and stakeholders to have more effective operational reviews within the organization. Our open-source operational reviews documentation will help guide you and your team to better insights into your team’s overall health, your service’s durability and reliability, and its impact on the business, giving your team the information it needs to continuously improve operational maturity.
We believe that a discussion about what went well and what could be better should be scheduled at the end of every rotation, so responders are aware of potential issues and can answer questions such as “What service was the biggest headache?” and “What alert woke people up overnight?”
Our On-Call Hand-Off Report Scorecard helps the next on-call rotation determine what they may want to focus on. Moving to this proactive model will also help teams respond faster if an incident does crop up, because they’ll have the information they need to make better-informed decisions in less time.
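PagerDuty doesn’t describe how the hand-off scorecard answers these questions, so what follows is only a minimal Python sketch under stated assumptions. The `Incident` record and its fields, and the 22:00 to 07:00 “overnight” window, are hypothetical choices for illustration, not PagerDuty’s data model or API. It shows how the two hand-off questions above could be answered from a rotation’s list of incidents.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Incident:
    # Hypothetical record shape for illustration; not the PagerDuty API schema.
    service: str
    title: str
    created_at: datetime


def biggest_headache(incidents):
    """Return the service with the most incidents in the rotation, or None if there were none."""
    counts = Counter(i.service for i in incidents)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def overnight_incidents(incidents, start=time(22, 0), end=time(7, 0)):
    """Return incidents created inside an assumed overnight window that wraps past midnight."""
    return [i for i in incidents
            if i.created_at.time() >= start or i.created_at.time() < end]


def hand_off_summary(incidents):
    """Summarize a rotation for the next on-call responder."""
    return {
        "biggest_headache": biggest_headache(incidents),
        "overnight_pages": [i.title for i in overnight_incidents(incidents)],
    }


# Example rotation with made-up incidents.
incidents = [
    Incident("checkout-api", "High 5xx rate", datetime(2019, 3, 11, 2, 14)),
    Incident("checkout-api", "Latency SLO breach", datetime(2019, 3, 12, 15, 30)),
    Incident("search", "Index lag", datetime(2019, 3, 13, 23, 5)),
]
print(hand_off_summary(incidents))
# {'biggest_headache': 'checkout-api', 'overnight_pages': ['High 5xx rate', 'Index lag']}
```

Counting incidents per service is the simplest possible notion of a “headache”; a real review might weight by duration or by whether the page was actionable.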
The scorecard was designed to help improve your team’s effectiveness and should be used during the weekly operational review. The scorecard displays a summary of several metrics, including:
In addition to on-call hand-off reviews, we also want to focus on the technical services that teams build, own, and monitor. Technical services are the building blocks of your PagerDuty monitoring ecosystem, and teams use them to detect and quickly fix infrastructure issues. Because these services are so important, we encourage teams to sit down and review the performance of every service. However, we often see teams ignoring notifications from noisy services: so many alerts come through that separating actionable alerts from noise takes too much time, and ignoring them is easier than addressing them.
This is where our Service Ops Scorecards come in. They provide detailed views into technical service health and reliability, with metrics such as total downtime, performance lag, and mean time to resolve (MTTR), to help you and your team find the areas to focus on improving in order to prevent major problems.
Coupled with PagerDuty Visibility, which provides a holistic view of machine data, services, teams, corresponding actions, and the business impact of incident response, the operational review scorecards help your team coordinate response efforts and communicate effectively to stakeholders about how long your team takes to resolve unplanned issues and how much time and work major incidents require.
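As a rough illustration of two of these metrics, here is a minimal Python sketch, assuming each incident on a service is recorded with hypothetical `created_at` and `resolved_at` timestamps. This is not how the Service Ops Scorecards compute their numbers. The definitions are assumptions: MTTR is taken as the mean of resolve durations, and total downtime as the union of open-incident intervals, so overlapping incidents are not counted twice. Performance lag is not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ResolvedIncident:
    # Hypothetical record: one resolved incident on a single service.
    created_at: datetime
    resolved_at: datetime


def mttr(incidents):
    """Mean time to resolve: average of (resolved_at - created_at); None if no incidents."""
    if not incidents:
        return None
    total = sum((i.resolved_at - i.created_at for i in incidents), timedelta(0))
    return total / len(incidents)


def total_downtime(incidents):
    """Length of the union of incident intervals, merging overlaps so time is counted once."""
    intervals = sorted((i.created_at, i.resolved_at) for i in incidents)
    downtime = timedelta(0)
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # Gap before this interval: close out the previous merged interval.
            if cur_end is not None:
                downtime += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping interval: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        downtime += cur_end - cur_start
    return downtime


# Example with made-up incidents: the first two overlap.
incidents = [
    ResolvedIncident(datetime(2019, 3, 1, 10, 0), datetime(2019, 3, 1, 10, 30)),
    ResolvedIncident(datetime(2019, 3, 1, 10, 15), datetime(2019, 3, 1, 11, 0)),
    ResolvedIncident(datetime(2019, 3, 2, 9, 0), datetime(2019, 3, 2, 9, 15)),
]
print(mttr(incidents))            # 0:30:00
print(total_downtime(incidents))  # 1:15:00
```

Merging overlapping intervals is the main design choice here: simply summing durations would report 90 minutes for this example instead of 75, overstating downtime whenever incidents on the same service overlap.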
Interested in seeing your team’s scorecards? Contact us to sign up for a free trial!