How to Prevent and Resolve Incidents Using Model Context Protocol (MCP)

by Hannah Culver April 6, 2026 | 6 min read

The rapid pace of modern software development, fueled by AI-driven coding and accelerated deployment cycles, has resurfaced a challenge that many development teams already struggled with: the speed of incident response must now match the speed of change. Teams ship code faster than ever, which inevitably increases the risk of a new issue making it to production. The traditional approach, where engineers waste time jumping between disconnected tools, is no longer sustainable. It burns developers out and pulls them out of a flow state. The solution is an interconnected AI ecosystem that leverages the operational data you already own.

PagerDuty is contributing to this interconnected AI ecosystem via Model Context Protocol (MCP), a standardized way for specialized AI tools to securely exchange information and actions. MCP acts as the common language, allowing AI tools and agents to talk directly to other tools and agents.
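To make the "common language" concrete: MCP messages are JSON-RPC 2.0, and a client invokes a server-side tool with a `tools/call` request. The snippet below is a minimal sketch of what such a request looks like on the wire. The tool name `list_incidents` and its `statuses` argument are assumptions chosen for illustration, not a confirmed part of the PagerDuty MCP server's tool list.

```python
import json
from itertools import count

_request_ids = count(1)  # each JSON-RPC request carries its own id


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call("list_incidents", {"statuses": ["triggered"]})
print(json.dumps(request, indent=2))
```

In practice an MCP client library or the AI tool itself builds and sends these messages; the sketch only shows the shape that every tool invocation shares.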

To date, we have over 60 tools that allow users to pull in critical incident data and service information, and even trigger automated responses, in any AI-enabled tool of choice. And we’re always adding more tools (check out our release notes). We plan to build out MCP parity with our open APIs, meaning that all the critical PagerDuty data and actions available via API will be available via MCP. The best part? This data can be used during incidents or while coding. Let’s look at how a flow could work for each of these scenarios.

Preventing Incidents with the Right Data at the Right Time

Imagine you’re creating a new agent, perhaps one that guides your users through a check-out experience, offering deals they should add to their cart before the final purchase. This agent could then share this information back with other teams that are looking for demand signals on popular products. It’s critical that this agent both provides a good customer experience (relevant suggestions, works as intended) and relays the correct final purchase information back to internal teams. Now, imagine that you’re making a small tweak to this agent that should allow the user to rate the helpfulness of the agent’s suggestion. Let’s prevent a potential incident.

Building Safer Agents with Past Incident Knowledge in LangSmith

The PagerDuty Incident Responder agent for LangSmith connects to the PagerDuty MCP server, accessing a service’s incident history and context. Developers can input a service name (such as the one this new agent is associated with), links to previous incidents involving this agent, or a symptom description from a past failure. In response, PagerDuty provides critical details that help developers assess risk: past incidents, triage information, and known failure modes discovered in post-incident reviews. This helps a developer prepare for a deploy with the right data at the right time.

Scoring Code Risk Before Deployment with Claude Code

Developers who code in Claude Code can also score the risk of uncommitted code changes right in their development workflow as another safety mechanism. The PagerDuty Plug-in for Claude Code is a risk scoring tool that brings production context directly into the development process. When a developer runs a simple command like /risk-score, Claude analyzes the new code against 90 days of PagerDuty incident data. The analysis identifies high-risk file types, the extent of the change, and whether it overlaps with areas that have caused past incidents. The developer then receives a clear risk score and actionable recommendations before the code is committed, helping to reduce the risk and cost of major operational failures.
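To illustrate how the three signals named above (file type, extent of change, overlap with past incidents) could combine into one number, here is a toy heuristic. It is not the plug-in's actual scoring method, which is not described here; the weights, the 500-line cap, the list of high-risk suffixes, and the file paths are all assumptions made up for this sketch.

```python
from pathlib import PurePosixPath

# Illustrative file types and weights; not taken from the PagerDuty plug-in.
HIGH_RISK_SUFFIXES = {".sql", ".tf", ".yaml", ".yml"}


def risk_score(changed: dict[str, int], incident_paths: set[str]) -> int:
    """Return a 0-100 score from three signals: file type, change size, overlap.

    changed maps each modified file path to its number of changed lines.
    incident_paths holds paths implicated in past incidents.
    """
    if not changed:
        return 0
    risky_types = sum(PurePosixPath(p).suffix in HIGH_RISK_SUFFIXES for p in changed)
    overlap = sum(p in incident_paths for p in changed)
    total_lines = sum(changed.values())

    type_part = 30 * risky_types / len(changed)       # share of risky file types
    size_part = 30 * min(total_lines, 500) / 500      # change size, capped
    overlap_part = 40 * overlap / len(changed)        # share touching past incidents
    return round(type_part + size_part + overlap_part)


# Hypothetical change set for the rating tweak described earlier.
score = risk_score(
    {"checkout/rating.py": 120, "deploy/checkout.yaml": 8},
    {"checkout/rating.py"},
)
print(score)
```

Overlap with past incidents gets the largest weight here on the assumption that history is the strongest predictor; a real tool would tune this against actual incident data.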

Checking System Health Before Deployment with GitHub Copilot

The PagerDuty Incident Responder custom agent for GitHub gives users access to PagerDuty data, including change correlation and incident data, directly within GitHub Copilot. Additionally, developers can build their own custom agents using PagerDuty MCP tools that offer even broader sets of data and actions. Users can quickly review what is currently happening in the system, ask about previous incidents on the service, and even summarize post-incident review notes. This can flag any concerns that may warrant postponing a deployment.
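The "should we postpone?" decision can be reduced to a simple rule once the incident data is in hand. The sketch below assumes incidents have already been fetched (for example through an MCP tool) and flattened into a small hypothetical `Incident` record; the status and urgency values mirror PagerDuty's incident fields, while the rule itself is only one possible policy.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    service: str
    status: str   # PagerDuty statuses: "triggered", "acknowledged", "resolved"
    urgency: str  # "high" or "low"


def should_postpone(service: str, incidents: list[Incident]) -> bool:
    """Postpone if the service has any unresolved high-urgency incident."""
    return any(
        i.service == service and i.status != "resolved" and i.urgency == "high"
        for i in incidents
    )


# Hypothetical snapshot of current incidents.
current = [
    Incident("checkout-agent", "acknowledged", "high"),
    Incident("search", "triggered", "low"),
    Incident("payments", "resolved", "high"),
]
print(should_postpone("checkout-agent", current))
print(should_postpone("payments", current))
```

A stricter policy might also look at incidents on upstream dependencies, which is where the broader change-correlation data mentioned above would come in.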

Accelerating Response During an Incident

The reality is that not every incident can be stopped, especially with the accelerated rate of shipping new code. When an incident does occur, MCP helps teams recover faster by reducing disruption and the cognitive load of having to jump between different tools. Let’s use our new agent example. Say the developer pushing the change to add the rating system skipped the review process above, and an issue slipped through the cracks. Here’s how MCP can make response smoother.

Acknowledge and Review in Cursor

When a new alert fires, you can immediately acknowledge and review it without leaving your coding tool. The PagerDuty MCP integration with Cursor allows Cursor to execute PagerDuty actions and pull in PagerDuty data, including who is currently on call, service status details, and incident history. This helps a developer answer key questions and begin triage by asking PagerDuty about incident impact, affected services, any pre-populated notes, and more. Without context switching, a user could also ask GitHub Copilot about recent changes, bringing that information in line with the critical PagerDuty data without ever leaving their tool of choice.
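Behind the scenes, each of these questions ends in an MCP tool result that the editor hands back to the model. As a rough sketch of that last step, the helper below extracts the text from a `tools/call` response, following the result shape in the MCP specification (a `content` list of typed parts plus an optional `isError` flag). The sample response text is invented for illustration.

```python
def tool_result_text(response: dict) -> str:
    """Join the text parts of an MCP `tools/call` response, or raise on failure."""
    if "error" in response:  # JSON-RPC protocol-level error
        raise RuntimeError(response["error"].get("message", "unknown error"))
    result = response["result"]
    text = "\n".join(
        part["text"]
        for part in result.get("content", [])
        if part.get("type") == "text"
    )
    if result.get("isError"):  # the tool ran but reported a failure
        raise RuntimeError(text or "tool call failed")
    return text


# Hypothetical response to an on-call lookup.
sample = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "On call for checkout-agent: A. Rivera"}],
        "isError": False,
    },
}
print(tool_result_text(sample))
```

Distinguishing protocol errors from tool-reported errors matters during an incident: the first usually means the connection or request is broken, the second that PagerDuty rejected the action.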

Automated Diagnostics and Suggested Fixes with Honeycomb Data

While a developer is reviewing the issue, the PagerDuty SRE Agent is running diagnostics in the background. PagerDuty will be extending its SRE Agent to use logging and metrics data from Honeycomb via MCP. The SRE Agent will use this critical telemetry to inform triage, quickly determine the root cause, and execute more pointed automation, taking the initial diagnostic burden off the human responder. For example, the agent can quickly suggest a fix, like rolling back a recent change.

Quick Fix and Resolution

Thanks to this seamless flow of information, the responder can then go back to Cursor to take the suggested action, such as rolling back the change. This unified, intelligent workflow quickly closes the loop from alert detection to resolution without pushing a user to a different surface. Response is faster, and developers can get back to building with less time spent on interrupt work.

By connecting data and actions from tools like LangSmith, Claude, GitHub Copilot, Cursor, Honeycomb, and more, PagerDuty is making the right data and actions accessible exactly where teams need them. This approach helps reduce friction, accelerates incident management to match the pace of AI-driven development, and ultimately gives developers more time back for higher-value work. We are only scratching the surface of what is possible with MCP.

Want to learn more about PagerDuty’s approach to MCP? Join our Twitch stream here.

Want to contribute to our repo? Check out our GitHub repo.
