PagerDuty Blog

IRL to IAC: Your Environment to PagerDuty via Terraform

Introduction

Figuring out how to represent your as-built environment in PagerDuty can be confusing for new users. PagerDuty has many components that help your team manage incidents, integrate with other systems in your environment, run workflows, and use automation. Your organization might have many of these components: users, teams, services, integrations, orchestrations, and more.

To manage larger or more complex PagerDuty environments programmatically, we recommend using Infrastructure as Code (IaC) tools like Terraform or Pulumi, both of which have PagerDuty providers available. In this example, we’ll look at Terraform, whose PagerDuty provider is published in the Terraform Registry. (We’ll look at Pulumi in a future post.) If you’re new to Terraform, HashiCorp has documentation to help you get started, and there are lots of great intro videos on YouTube, including a Terraform tutorial by TechWorld with Nana.
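The snippets in this post don't show the provider setup itself. A minimal configuration might look like the sketch below, assuming you authenticate with a PagerDuty API token exported in the PAGERDUTY_TOKEN environment variable rather than written into your files. You will likely also want to pin a provider version that matches the docs you are reading.

```hcl
terraform {
  required_providers {
    pagerduty = {
      # Official PagerDuty provider in the Terraform Registry
      source = "PagerDuty/pagerduty"
    }
  }
}

# With no token argument set, the provider reads the API token from the
# PAGERDUTY_TOKEN environment variable, which keeps secrets out of code.
provider "pagerduty" {}
```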

Getting started with PagerDuty is a prescriptive process. In a new account, certain objects have to be created in a particular order, and the web UI guides you through those steps. For the examples below, I’m assuming you have a PagerDuty account and your users are already onboarded. For more information on getting started with PagerDuty, be sure to check out our online learning at PagerDuty University.

Goals

For this foray into Terraform, I’m going to assume that a production environment is already running. I’m going to use the services included in GCP’s microservices-demo project to give me something to work with. There is enough complexity in this demo to create services, dependencies, and integrations! The service diagram in the README will be helpful for defining services and their relationships.

When I’m done, I want to have a full set of services mimicking the Microservices Demo environment, with defined dependencies. Each service will also have a generic integration endpoint defined so they can receive alerts.

You can view all of the code for this example in my GitHub account.

Basic info you’ll need

I’m going to include my team members as local data in my Terraform files. If your team already uses Terraform, you might keep this kind of data in data sources, modules, or local files, and that’s fine too. If you are planning to use Terraform to manage users, check out the provider docs for pagerduty_user.

To include your users as local data, you’ll only need their user IDs.
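If you don't have the user IDs handy, one option is the provider's pagerduty_user data source, which looks up an existing user by email address and exposes their ID. In this sketch the email address and the names example_dev and example_team are hypothetical placeholders.

```hcl
# Look up an existing PagerDuty user by email (placeholder address)
data "pagerduty_user" "example_dev" {
  email = "dev@example.com"
}

locals {
  # The data source's id attribute is the PagerDuty user ID,
  # the same kind of value as "PC6K5C9" in the locals below
  example_team = [data.pagerduty_user.example_dev.id]
}
```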

Ordering of resources

Once a PagerDuty account has users, the next step is to create at least one schedule and one escalation policy. These come before any services, because every service requires an escalation policy. You can build a new escalation policy for each service, but I’m going to define my escalation policies explicitly as their own resources. Each of my escalation policies includes at least one schedule, although you can also create escalation policies that target individual users.
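For comparison, an escalation policy that targets an individual user instead of a schedule might look like the following sketch. The resource name and policy name are hypothetical; the user ID is one of the IDs from the team lists in the next section.

```hcl
# Hypothetical escalation policy that notifies one user directly
resource "pagerduty_escalation_policy" "example_individual" {
  name      = "Example Individual Escalation"
  num_loops = 1

  rule {
    escalation_delay_in_minutes = 10
    target {
      type = "user_reference" # target a user rather than a schedule
      id   = "PC6K5C9"        # PagerDuty user ID
    }
  }
}
```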

Schedules and Escalation Policies

Here’s what one of my schedules looks like, including the local data for the team members, using the pagerduty_schedule resource:

locals {
    # PagerDuty user IDs for the DBRE team
    dbre_team = ["PC6K5C9", "PULO4NW"]
    # PagerDuty team ID that owns the schedules and escalation policies
    demo_team = "POJW28N"
    # PagerDuty user IDs for the Application team
    app_team  = ["PC6K5C9", "PULO4NW", "P73R26T"]
}

resource "pagerduty_schedule" "msd_apps_sched" {
    name      = "Application Team Microservices Demo"
    time_zone = "America/New_York"
    teams     = [local.demo_team]

    # A single layer that rotates through the Application team weekly
    layer {
        name                         = "Application Developers"
        start                        = "2024-01-01T00:08:00-05:00"
        rotation_virtual_start       = "2024-01-01T00:08:00-05:00"
        rotation_turn_length_seconds = 604800 # 7 days
        users                        = local.app_team
    }
}

Only a few arguments are required: time_zone (see the PagerDuty API docs for supported zones) and at least one layer block, which requires start, rotation_virtual_start, rotation_turn_length_seconds, and users.

You can also include restrictions and other features in your schedules. See the provider docs for more information.
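As an illustration of restrictions, here is a sketch of a hypothetical business-hours schedule. The layer still rotates weekly, but a daily_restriction limits on-call coverage to a nine-hour window starting at 08:00 in the schedule's time zone. The resource name, schedule name, and times are assumptions chosen for the example.

```hcl
# Hypothetical business-hours schedule with a daily restriction
resource "pagerduty_schedule" "example_business_hours" {
  name      = "Example Business Hours"
  time_zone = "America/New_York"

  layer {
    name                         = "Business Hours"
    start                        = "2024-01-01T08:00:00-05:00"
    rotation_virtual_start       = "2024-01-01T08:00:00-05:00"
    rotation_turn_length_seconds = 604800 # hand off weekly
    users                        = ["PC6K5C9", "PULO4NW"]

    # Only cover 08:00-17:00 each day
    restriction {
      type              = "daily_restriction"
      start_time_of_day = "08:00:00"
      duration_seconds  = 32400 # 9 hours
    }
  }
}
```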

I can then reference this pagerduty_schedule resource when I create my escalation policy, using the pagerduty_escalation_policy resource:

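resource "pagerduty_escalation_policy" "msd_apps" {
    name      = "Application Team Microservices Demo"
    num_loops = 1 # repeat the policy once after the last rule
    teams     = [local.demo_team]

    # Notify whoever is on call in the Application team schedule; the
    # delay is how long an unacknowledged incident waits before escalating
    rule {
        escalation_delay_in_minutes = 10
        target {
            type = "schedule_reference"
            id   = pagerduty_schedule.msd_apps_sched.id
        }
    }
}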


The pagerduty_escalation_policy resource also has only a few requirements: a name and one or more rule blocks that contain escalation_delay_in_minutes and a target. The target requires an id, which is where I’ve linked it to the schedule I’ve defined above: pagerduty_schedule.msd_apps_sched.id.
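The Redis Cache service below references pagerduty_escalation_policy.msd_dbre, which isn't shown in this post. Assuming it mirrors the Application team policy but rotates through local.dbre_team, a sketch might look like this; the schedule resource name msd_dbre_sched and the display names are guesses, and the full code in my GitHub repository is the reference version.

```hcl
# Sketch of the DBRE team's schedule, modeled on msd_apps_sched
resource "pagerduty_schedule" "msd_dbre_sched" {
  name      = "DBRE Team Microservices Demo"
  time_zone = "America/New_York"
  teams     = [local.demo_team]

  layer {
    name                         = "Database Reliability Engineers"
    start                        = "2024-01-01T00:08:00-05:00"
    rotation_virtual_start       = "2024-01-01T00:08:00-05:00"
    rotation_turn_length_seconds = 604800
    users                        = local.dbre_team
  }
}

# Sketch of the DBRE escalation policy used by the Redis Cache service
resource "pagerduty_escalation_policy" "msd_dbre" {
  name      = "DBRE Team Microservices Demo"
  num_loops = 1
  teams     = [local.demo_team]

  rule {
    escalation_delay_in_minutes = 10
    target {
      type = "schedule_reference"
      id   = pagerduty_schedule.msd_dbre_sched.id
    }
  }
}
```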

Services

Now that I have an escalation policy, I can create my services. Looking at the documentation for the microservices demo, there are 12 potential services in the environment. I’m going to leave off the loadgenerator service, so I’ll create 11 technical services to match the environment. I’m also going to create one business service to represent the entire demo environment in my service graph.

I’m using the most basic configuration for my services. You can check out the provider docs for all of the resource arguments. I can always modify my services later with Terraform if I need to change or add something. 

Services really only require a name and an escalation_policy. At the time of this writing, the alert_creation argument is deprecated because all services now use the create_alerts_and_incidents behavior, so you may not need it in your code. I’m going to configure all of my services identically to start, so they will all look like this:

# Redis Cache
resource "pagerduty_service" "Demo_Redis_Cache" {
    name              = "Redis Cache - Microservices Demo"
    escalation_policy = pagerduty_escalation_policy.msd_dbre.id
    alert_creation    = "create_alerts_and_incidents"
}

# Shopping Cart
resource "pagerduty_service" "Demo_Cart" {
    name              = "Shopping Cart - Microservices Demo"
    escalation_policy = pagerduty_escalation_policy.msd_apps.id
    alert_creation    = "create_alerts_and_incidents"
}

 

Because my escalation_policy resources were created with Terraform, I can include their IDs directly via the resource reference. I could also give every service its own escalation policy, but here I’ve created only two: one for the DBRE team and one for the Application team.
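With eleven nearly identical service blocks, one alternative is to drive them from a map with for_each. This is a sketch rather than the approach used in this post: the map name demo_services and the resource name demo are hypothetical, only two entries are shown, and it omits the deprecated alert_creation argument. It would replace the individual blocks rather than sit alongside them (otherwise you would create duplicate services), and references would change to the form pagerduty_service.demo["cart"].id.

```hcl
# Hypothetical map of service keys to settings; add the remaining
# Microservices Demo services as further entries
locals {
  demo_services = {
    redis_cache = {
      name              = "Redis Cache - Microservices Demo"
      escalation_policy = pagerduty_escalation_policy.msd_dbre.id
    }
    cart = {
      name              = "Shopping Cart - Microservices Demo"
      escalation_policy = pagerduty_escalation_policy.msd_apps.id
    }
  }
}

# One pagerduty_service per map entry, addressed as
# pagerduty_service.demo["redis_cache"], pagerduty_service.demo["cart"], ...
resource "pagerduty_service" "demo" {
  for_each          = local.demo_services
  name              = each.value.name
  escalation_policy = each.value.escalation_policy
}
```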

As I mentioned above, I’m also going to create a business service to represent the application stack as a whole. Business services help other parts of my organization identify when incidents occur that impact user-facing applications, and they will appear on the Service Graph page. You can think of the business service as representing the User on the original environment diagram!

The pagerduty_business_service resource looks a bit like the resource for technical services, but it does not have an associated escalation_policy:

resource "pagerduty_business_service" "Microservices_Demo" {
    name        = "Microservices Demo"
    description = "Services aligned behind the Online Boutique Demo"
    team        = local.demo_team
}

 

The pagerduty_business_service resource has only a couple of additional arguments; you can see them in the provider docs.

Service Dependencies

Once I have all the services for my environment defined, I can add their relationships using service dependencies. Dependencies will help my team determine the potential impact of incidents across services.

Each dependency has to be defined individually as a relationship between a dependent_service and a supporting_service. Following the convention of the graph diagram in the Microservices Demo README, a service sits above all of its supporting services and below all of the services that depend on it. I will also create a dependency between the Frontend service and my Microservices Demo business service.

The following resource creates the dependency between my Frontend and Checkout services:

resource "pagerduty_service_dependency" "fe_to_checkout" {
    dependency {
        dependent_service {
            id   = pagerduty_service.Demo_Frontend.id
            type = pagerduty_service.Demo_Frontend.type
        }
        supporting_service {
            id   = pagerduty_service.Demo_Checkout.id
            type = pagerduty_service.Demo_Checkout.type
        }
    }
}

I can reference the id and type attributes of these services since they were both created with Terraform.
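Because every dependency is its own resource, a larger graph means many repetitive blocks. One possible sketch keeps the pairs in a map and uses for_each, reading id and type from each service just as the block above does. It assumes the Demo_Frontend and Demo_Checkout services referenced above exist alongside Demo_Cart and Demo_Redis_Cache; the map name demo_dependencies, the resource name demo, and the three pairs (taken from the calls shown in the README diagram) are illustrative. Moving existing dependencies into a for_each changes their resource addresses, so Terraform would recreate them unless you also add moved blocks.

```hcl
# Hypothetical map: each entry is one dependent -> supporting pair
locals {
  demo_dependencies = {
    fe_to_cart = {
      dependent  = pagerduty_service.Demo_Frontend
      supporting = pagerduty_service.Demo_Cart
    }
    checkout_to_cart = {
      dependent  = pagerduty_service.Demo_Checkout
      supporting = pagerduty_service.Demo_Cart
    }
    cart_to_redis = {
      dependent  = pagerduty_service.Demo_Cart
      supporting = pagerduty_service.Demo_Redis_Cache
    }
  }
}

# One pagerduty_service_dependency per pair in the map
resource "pagerduty_service_dependency" "demo" {
  for_each = local.demo_dependencies

  dependency {
    dependent_service {
      id   = each.value.dependent.id
      type = each.value.dependent.type
    }
    supporting_service {
      id   = each.value.supporting.id
      type = each.value.supporting.type
    }
  }
}
```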

The dependency that links the business service to my technical services looks slightly different because of the pagerduty_business_service resource:

resource "pagerduty_service_dependency" "biz_to_fe" {
    dependency {
        dependent_service {
            id   = pagerduty_business_service.Microservices_Demo.id
            type = pagerduty_business_service.Microservices_Demo.type
        }
        supporting_service {
            id   = pagerduty_service.Demo_Frontend.id
            type = pagerduty_service.Demo_Frontend.type
        }
    }
}

When all of my service dependencies are defined, I can use the web UI to compare them with the original application topology diagram from the README. I can move the objects around on the graph to line up like the original:

PagerDuty web UI screen capture of the service graph representation of this environment. It is oriented with the business service “Microservices Demo” at the top and the technical service “Redis Cache” at the bottom. Technical services are represented with circles with green checks in them to represent the OK status of the services. The business service also has a green check but is represented by a square. The relationships among the services are represented by blue lines.

Integrations

Finally, I want to enable all of my services to receive alerts. There are many types of integrations that can be configured on PagerDuty services, but I’m going to start with a basic Events API v2 integration for each service. You can find more information about defining other types of integrations, including email integrations and vendor-specific integrations, in the provider docs.

resource "pagerduty_service_integration" "events_msd_frontend" {
    name    = "API V2"
    type    = "events_api_v2_inbound_integration" # generic Events API v2 endpoint
    service = pagerduty_service.Demo_Frontend.id
}
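Two related pieces you may want are sketched below. First, a vendor-specific integration looks up the vendor with the pagerduty_vendor data source; "Datadog" is only an example, so substitute the tool that actually sends your alerts. Second, a monitoring tool needs the integration's routing key to send Events API v2 events, and the integration_key attribute exposes it. The output name frontend_events_key is hypothetical, and marking it sensitive keeps it out of plan output.

```hcl
# Look up a vendor by name (example vendor)
data "pagerduty_vendor" "datadog" {
  name = "Datadog"
}

# A vendor-specific integration on the Frontend service
resource "pagerduty_service_integration" "datadog_msd_frontend" {
  name    = "Datadog"
  service = pagerduty_service.Demo_Frontend.id
  vendor  = data.pagerduty_vendor.datadog.id
}

# Expose the Events API v2 routing key for the generic integration above
output "frontend_events_key" {
  value     = pagerduty_service_integration.events_msd_frontend.integration_key
  sensitive = true
}
```

You can read the key after an apply with `terraform output -raw frontend_events_key`.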

Next Steps

Now that my services are defined in PagerDuty and they all have integrations to receive events, I’m ready to roll with basic functionality. I can also use these basic resources to incorporate helpful solutions like event orchestrations and automation actions as well as more sophisticated integrations to meet the needs of my team. 

For the latest information on the PagerDuty Terraform provider, always check the provider documentation. Our maintainer, José Antonio Reyes, presents Terraform Time on the PagerDuty Twitch on Wednesdays at 4pm Eastern, and hosts a quarterly roundtable. If you have other questions, we’d love to help! You can reach us at community-team@pagerduty.com.