SailPoint is the leader in identity security for the modern enterprise, empowering complex companies worldwide to build a security foundation grounded in identity. Harnessing the power of AI and machine learning, SailPoint automates access, delivering only the required access to the right identities and technology at the right time.
SailPoint has continued to grow as companies face more, and increasingly sophisticated, cybersecurity threats. Additionally, the COVID-19 pandemic drove more people to work from home, creating new security risks for their employers. As the security landscape continues to evolve, SailPoint’s DevOps team has to innovate and find new ways of working.
Omar Lopez is a DevOps Manager for SailPoint’s cloud offering. His team is responsible for everything related to observability, from metrics and logging to tracing and alerting—anything that enables SailPoint to identify and address issues before they become problems for customers. “The uptime of our products is super important to our mission here at SailPoint,” Lopez said.
To optimize the operations of its growing DevOps team, SailPoint recently restructured it, organizing into smaller teams and adopting a service-based ownership model. Keeping people at the center of this cultural shift was a priority for Lopez.
“The happiness of my engineers is very important to me,” Lopez said. “When I joined SailPoint, it was tough for one engineer to handle everything DevOps-related. It was also clear that we needed to improve our on-call process and make that less burdensome. Our team has really grown, and our goal is to pivot to total service ownership.”
SailPoint also sought better analytics to support smoother handoffs and to reduce the burden of being on call, with the aim of improving team health among its engineers. “Prior to implementing total service ownership, we were challenged with having the bandwidth to properly address every single problem as our company grew, and with it, we added more people and technology,” explained Caitlin Green, DevOps Engineer.
SailPoint was already using PagerDuty but wanted to get more from its investment by improving its operational practices, including better-coordinated incident response.
SailPoint integrated PagerDuty with the monitoring tool Prometheus. Prometheus sends alerts to PagerDuty, which then routes them to the service owner defined by Rulesets. “PagerDuty’s Global Rulesets mean we can route alerts directly to the right on-call engineer for a particular service, rather than it going to a triage engineer who has to figure out who they should send it to,” Lopez said. “That’s a game-changer for us.”
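The routing idea described above can be illustrated with a minimal Python sketch. It is not PagerDuty's Rulesets implementation or SailPoint's configuration: the `Rule` class, the `route` function, the label keys, and the owner names are all hypothetical, chosen only to show how matching on alert labels can send an alert straight to a service owner instead of a triage engineer.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical routing rule: if every key/value in `match` appears in
    the alert's labels, the alert goes to `owner`."""
    match: dict
    owner: str


def route(alert_labels, rules, default_owner="triage"):
    """Return the owner of the first rule that matches the alert labels.

    Falls back to `default_owner` when no rule matches, which mirrors the
    older model where a triage engineer had to forward the alert by hand.
    """
    for rule in rules:
        if all(alert_labels.get(key) == value for key, value in rule.match.items()):
            return rule.owner
    return default_owner


# Illustrative rules only; service and on-call names are made up.
RULES = [
    Rule(match={"service": "identity-api"}, owner="identity-api-oncall"),
    Rule(match={"service": "search", "severity": "critical"}, owner="search-oncall"),
]

owner = route({"service": "identity-api", "severity": "warning"}, RULES)
```

With rules like these, only alerts that match no service rule fall through to the triage queue.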
SailPoint also integrated PagerDuty with Slack to help manage lower-priority incidents, resulting in fewer interruptions both during work and outside working hours.
PagerDuty has become an important part of SailPoint’s service ownership model, empowering teams to take responsibility for issues affecting their services and reducing pressure on triage teams. As SailPoint has embraced service ownership, its DevOps team has seen an 85% drop in the number of incidents directed to it. “With PagerDuty, we’re able to redirect critical work to the right people,” Green added.
SailPoint is enhancing workflows using automation. For example, by enabling Intelligent Alert Grouping (IAG) on AWS CloudWatch, SailPoint has reduced noise and sped up response. Previously, a database failure would fire 60+ alerts, continuously disrupting the on-call engineer. By utilizing IAG, SailPoint condenses all alerts into a single incident for the engineer to acknowledge and resolve, freeing up time to fix the problem.
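To make the effect of grouping concrete, here is a simplified Python sketch. It is a plain time-window heuristic, not PagerDuty's Intelligent Alert Grouping, which the source does not describe in detail. The function name `group_alerts`, the `(timestamp, resource)` alert shape, and the five-minute window are assumptions made for illustration.

```python
def group_alerts(alerts, window_seconds=300):
    """Collapse alerts for the same resource into one incident.

    `alerts` is a list of (timestamp_seconds, resource) tuples. An alert
    joins the open incident for its resource if it arrives within
    `window_seconds` of that incident's most recent alert; otherwise it
    opens a new incident.
    """
    incidents = []
    open_by_resource = {}
    for timestamp, resource in sorted(alerts):
        incident = open_by_resource.get(resource)
        if incident is None or timestamp - incident["last_seen"] > window_seconds:
            incident = {
                "resource": resource,
                "first_seen": timestamp,
                "last_seen": timestamp,
                "alert_count": 0,
            }
            incidents.append(incident)
            open_by_resource[resource] = incident
        incident["last_seen"] = timestamp
        incident["alert_count"] += 1
    return incidents


# Sixty alerts from one failing database, five seconds apart (made-up data).
database_alerts = [(i * 5, "orders-db") for i in range(60)]
incidents = group_alerts(database_alerts)
```

In this sketch the sixty database alerts produce a single incident with an alert count of sixty, so the on-call engineer acknowledges once instead of sixty times.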
SailPoint also automated how it builds monitoring into services, creating a self-service process for engineering teams. Lopez explained, “As we transition to service ownership, we are focusing on getting all our engineering teams, services, and microservices into PagerDuty. We put a lot of effort into automating that process. We built a self-service tool using Terraform that all engineering teams can leverage to create their own services, and their own rules for those services through code—without the need for DevOps.”
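The source does not show SailPoint's self-service tool, so the following Python sketch is only one way such a workflow could look: a team fills in a small request, and a script renders it as a Terraform JSON document. The function `render_service`, the team and service names, and the escalation policy ID are hypothetical. The `pagerduty_service` resource and its `name`, `description`, and `escalation_policy` arguments are assumed from the PagerDuty Terraform provider and should be checked against its documentation before use.

```python
import json


def render_service(team, service_name, escalation_policy_id):
    """Build a Terraform JSON document that declares one PagerDuty service.

    Terraform resource names cannot contain hyphens in this sketch's
    convention, so they are replaced with underscores.
    """
    resource_name = service_name.replace("-", "_")
    return {
        "resource": {
            "pagerduty_service": {
                resource_name: {
                    "name": service_name,
                    "description": f"Owned by {team}",
                    # Assumed to be the ID of an existing escalation policy.
                    "escalation_policy": escalation_policy_id,
                }
            }
        }
    }


# Hypothetical request from an engineering team.
document = render_service("search-team", "search-api", "PABC123")
terraform_json = json.dumps(document, indent=2)
```

The rendered text could be written to a `.tf.json` file and reviewed like any other code change, which is what lets teams define their own services without a DevOps engineer in the loop.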
SailPoint is in the early stages of introducing its customer support team to the incident response process. By onboarding customer support to PagerDuty, SailPoint engineers can provide relevant context to service representatives.
Matt Smith, a Director of DevOps, explained, “If there’s an issue, the goal is to get more proactive about reaching out to customers and letting them know that we’re on it before they see it.”
By implementing PagerDuty, SailPoint has matured its digital operations and moved closer to its goal of service ownership.
“PagerDuty has given us the tools we need to continue our journey toward service ownership,” Lopez said. “Importantly, PagerDuty has also enabled us to reduce on-call fatigue and boost the happiness of our engineers—one of our top priorities.”
Smith added, “Having PagerDuty in the mix is tremendously beneficial to how we manage our on-call response. PagerDuty helps us disseminate responsibility to specific engineers, giving clear ownership and transparency, and enables us to track what teams are working on and which incidents are still outstanding.”
SailPoint continues on its path to total service ownership and is onboarding more engineering teams onto PagerDuty. The company is also looking at how it can leverage more of PagerDuty’s capabilities to mature its incident response framework. During broader incidents, it plans to use PagerDuty for better communication and coordination with cross-departmental teams including customer service, product management, and executive leadership.
Find out more about SailPoint’s DevOps journey in The SailPoint Tech Blog.