Dutonian Story
DataOps Team reduced noise by using intelligent grouping in AIOps
Learn how one team reduced incidents by 37% and supported the workload of 20-30 FTEs with just 10 FTEs by leveraging intelligent alert grouping and auto-incident creation.
The Challenge
How They Were Working
The DataOps team was inundated with PagerDuty incidents from multiple critical monitoring sources and responded to as many as it could with limited resources.
Pain Points
Incident Volume
Incident volume was very high because duplicate incidents were being created. Reporting on past incidents was inaccurate as a result, which made it harder to learn from them.
Increased Time to Resolve
The team spent time diagnosing, grouping, and resolving duplicate incidents.
Data Inaccuracies
Data used by end-user teams could be inaccurate when a data sync problem took a long time to identify and resolve.
Key Challenge
Managing an increasing number of incidents from multiple monitored data sources with limited resources.
The Solution
What They Did
- Set up intelligent alert grouping and auto-incident creation in AIOps
- Configured rules to group alerts into incidents
- Configured rules to suppress alerts
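To make the two rule types concrete, here is a minimal Python sketch of rule-based suppression followed by grouping. It is not PagerDuty's AIOps implementation or API: the `Alert` fields, the `suppress_if` rule, and the grouping key (service plus alert class) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    # Hypothetical alert shape; real alert payloads will differ.
    source: str
    service: str
    alert_class: str
    severity: str


def suppress_if(alert: Alert) -> bool:
    """Hypothetical suppression rule: drop informational alerts."""
    return alert.severity == "info"


def group_key(alert: Alert) -> tuple[str, str]:
    """Hypothetical grouping rule: same service and alert class share one incident."""
    return (alert.service, alert.alert_class)


def build_incidents(alerts: list[Alert]) -> dict[tuple[str, str], list[Alert]]:
    """Apply suppression first, then group the remaining alerts into incidents."""
    incidents: dict[tuple[str, str], list[Alert]] = defaultdict(list)
    for alert in alerts:
        if suppress_if(alert):
            continue  # suppressed alerts never open or join an incident
        incidents[group_key(alert)].append(alert)
    return dict(incidents)


if __name__ == "__main__":
    sample = [
        Alert("monitor-a", "data-sync", "lag", "critical"),
        Alert("monitor-b", "data-sync", "lag", "critical"),
        Alert("monitor-a", "data-sync", "heartbeat", "info"),
        Alert("monitor-c", "warehouse", "row-count", "warning"),
    ]
    for key, grouped in build_incidents(sample).items():
        print(key, len(grouped))
```

In this sketch, two monitoring sources reporting the same lag problem produce one incident instead of two, and the informational heartbeat alert produces none.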
The Results
How They're Working Now
With intelligent grouping and alert suppression in place, the DataOps team now focuses on strategic initiatives while AIOps automatically manages incident noise and grouping.
Team Testimonials
With PagerDuty, we are able to increase workload without increasing resources
— DataOps Team Member
Wins
Productivity Increased
Allowed the team to shift its focus from reactive operational tasks to proactive, strategic development activities, including GenAI.
Risk Avoided
Proactively identified and resolved data quality issues, improving overall reliability and customer trust.
Time Saved
A team of 10 FTEs supports a workload that would normally require a minimum of 20-30 FTEs.
By The Numbers
Downtime Reduced
Intelligent Grouping reduced incidents by 37% in the first 6 months of usage.
Resource Efficiency
A team of 10 FTEs supports a workload that would normally require 20-30 FTEs.
Increased Workload Capacity
Able to increase workload without increasing resources.
Improved Data Reliability
Proactively identified and resolved data quality issues, improving customer trust.
Lessons Learned & Tips
- Group and ungroup incidents to teach the intelligent alert grouping algorithm proper grouping behavior
- Dedicate time every week to review past incidents and identify noisy, non-actionable events to suppress
- Make use of incident urgencies to separate critical actionable events from non-critical actionable ones to prioritize operational work
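The weekly review in the second tip can be partly scripted. The sketch below is one possible approach, not a PagerDuty feature: it assumes you can export past incidents as `(event_name, was_actionable)` pairs, and both that record shape and the thresholds are hypothetical. It flags events that fired often but almost never needed action, as candidates for a suppression rule.

```python
from collections import Counter


def suppression_candidates(incidents, min_count=5, max_actionable_ratio=0.1):
    """Return event names that fired often but rarely needed action.

    `incidents` is an iterable of (event_name, was_actionable) pairs.
    The record shape and both thresholds are illustrative assumptions.
    """
    total = Counter()
    actionable = Counter()
    for name, was_actionable in incidents:
        total[name] += 1
        if was_actionable:
            actionable[name] += 1
    return sorted(
        name
        for name, count in total.items()
        if count >= min_count and actionable[name] / count <= max_actionable_ratio
    )


if __name__ == "__main__":
    # Made-up history for demonstration only.
    history = (
        [("disk-usage-warning", False)] * 8
        + [("sync-job-failed", True)] * 6
        + [("cert-expiry-notice", False)] * 2
    )
    print(suppression_candidates(history))  # ['disk-usage-warning']
```

The output is a list for a person to review, not an automatic suppression decision; an event that is rare but never actionable, like the third one here, stays below the count threshold and is left alone.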