How I Centralized Our Scattered Business Logic Into One Clear Pipeline For Our Elixir Webhook Service

Some people inherit treasured jewelry, lavish mansions, or even grand fortunes. Software engineers inherit legacy code, written by people no longer working at the company, that is in need of more yak-shaving than a team of grizzly Stanley Cup winners after their long road to victory. Which will undoubtedly be the Toronto Maple Leafs this year.

See Brent Burns’ glorious playoff beard pictured below.

Playoff Beard

Ok, that might be an over exaggeration but my team (the software one, not the hockey one) found ourselves in a similar spot a couple weeks ago. We finished our latest project and were starting a new one which involved rewriting an old (5+ years) Scala service into Elixir. This service backed PagerDuty’s webhook processing system.

The decision to undertake this project was made for two main reasons:

To help the company standardize on Elixir due to an org-wide decision made earlier this year.
To give my team better visibility into how the service actually worked. Like most services that have gone through five years worth of changing business requirements, increased scale and developers coming and going, this code had become difficult to read and to maintain. What resulted was files resembling this:

Glorious Scala code

Forgive me for the oozing sarcasm, it’s my way of coping.

This rewrite project had actually started a year ago but was put on pause. What resulted was a relatively small Elixir code base that my team could use as a starting point for our glorious rewrite. This code base wasn’t in nearly as bad a spot as our 5+ year-old code base but it was originally written by people who were quite new to Elixir (I’m including myself as one of those people).

It followed the convenient approach of spreading your business logic across lots of different parts of the application. This made it hard for new team members to get a big picture view of the service’s application logic without travelling to the deep, dark depths of Mordor to figure out that the code that expires stale webhooks is embedded in the Parser module whose sole responsibility should be encoding/decoding webhooks to/from JSON.

Shortly before starting this project, my team was encouraged to take an excellent Elixir course that has unofficially become required training for all full-stack engineers at PagerDuty. This course is called “Elixir for Programmers” by Dave Thomas, aka The Coding Gnome (who I learned was also one of the authors of the timeless book, “The Pragmatic Programmer”).

One of the principles emphasized by the course is the importance of keeping your business logic separate from everything else, especially your framework. This really resonated with me, so much so that for my next Hack Day I refactored the Elixir code base mentioned above to adhere to this principle. What resulted, was one divine module which combined all of our previously scattered business logic into one clear pipeline that would call out to other modules.

New Business Logic

Here’s what the rest of this post looks like:

The Original Code
The New Code
An Aside About Elixir Pipelines
Back to the New Code
Did This Refactor Actually Help???
Recap

The Original Code

Note that this code is a simplified form of our actual code to help illustrate the point of this blog post better. For more information about how PagerDuty webhooks actually work, see our dev docs and webhooks setup guide.

So, for this blog post, let’s assume that the webhook processing service has the following business logic:

Receive webhooks from an upstream service that generated them in response to an incident being triggered/acknowledged/resolved.
Parse the received webhooks from a JSON string.
Check if the webhooks are older than a certain threshold (i.e. expired). If they’re expired, drop them.
Send the webhooks to their endpoints via an HTTP POST request. These endpoints can be internal, customer-defined endpoints or external tools that PagerDuty integrates with like Slack, Jira and ServiceNow.
Persist these webhook endpoint responses into a database (can be used to help debug customer issues).

If you’re like me and learn best through examples, here’s a real-life example of our PagerDuty Slack Integration. You can setup a webhook so that whenever an incident gets triggered/acknowledged/resolved, a notification gets sent to your team’s channel in Slack (so your whole team can know that it took you over 15 minutes to acknowledge that one incident last Friday night when you were “on-call” and “forgot” to bring your laptop with you to the bar).

PagerDuty Slack webhooks

Enough talk, here’s some code.

Note that all the code is available on Github.

Receiver receives the webhook from the upstream service, extracts the webhook from the received JSON string via Parser and then sends the webhook to its endpoint via Sender:

defmodule Receiver do  

  def receive(raw_webhook) do
    parsed_webhook = Parser.parse(raw_webhook)
    case parsed_webhook do
      {:ok, endpoint, payload} ->
        # if it was parsed successfully, send it to its endpoint
        Sender.send(endpoint, payload)
      {:error, error_msg} ->
        # if an error was returned from trying to parse it, don't send it
        Logger.error("error - #{error_msg}")
    end
  end

end

Parser decodes the webhook from a JSON string. It also raises an error if the webhook expired:

defmodule Parser do

  @expired_threshold_seconds 5

  def parse(raw_webhook) do
    decoded_webhook = JSON.decode(raw_webhook)
    endpoint = decoded_webhook.event.endpoint
    payload = decoded_webhook.event.payload

    time_since_created = decoded_webhook.event.created_at - Time.now
    is_expired = time_since_created > @expired_threshold_seconds * 1000
    case is_expired do
      true -> {:error, "webhook expired"}
      false -> {:ok, endpoint, payload}
  end

end

Sender sends the webhook to its endpoint. It also handles the different types of responses returned from the endpoint by persisting them to the db:

defmodule Sender do

  def send(endpoint, payload) do
    HTTP.post(endpoint, payload)
    |> handle_response(endpoint, payload)
  end

  defp handle_response(response = {200, response_body}, endpoint, payload) do
    Logger.info("webhook sent successfully")
    DB.save_successful_response(endpoint, payload, response_body)
  end

  defp handle_response(response = {404, response_body}, endpoint, payload) do
    Logger.error("failed to send webhook, endpoint returned 404")
    DB.save_failed_response(endpoint, payload, response_body)
  end

end

DB stores the different endpoint responses in the db:

defmodule DB do

  def save_successful_response(endpoint, payload, response) do
    MySQL.insert('successes', endpoint, payload, response)
  end

  def save_failed_response(endpoint, payload, response) do
    MySQL.insert('failures', endpoint, payload, response)
  end

end

Here’s a diagram of the business logic and how it’s spread out over different modules:

Old Business Logic

In order to piece together that diagram above, we have to start in Receiver, go to Parser, go back to Receiver, go to Sender and finally finish off in DB.

Along the way, we encounter some fundamental logic about how webhooks get processed which is hidden in modules that seemingly have nothing to do with that logic. As an example, the logic about not processing a webhook if it expired is contained in the Parser module whose sole responsibility should be decoding the webhook from JSON. Similarly, the logic about storing different types of webhook endpoint responses into the database is contained in the Sender module whose sole responsibility should be sending the webhook to its endpoint.

Overall, this webhook processing logic wasn’t too hard to figure out for the simplified code above. But for my team’s actual application – which had a lot of additional complexity – it was quite daunting.

The code above also does not take advantage of the fact that webhook processing can be thought of as a pipeline which can be expressed beautifully in functional programming languages, where the code takes in some initial state and keeps modifying it through different stages to produce an output. Elixir implements a simple version of these pipelines via its pipe operator (|>).

The New Code

To centralize our webhook processing logic into one module and to express it as a pipeline which modifies some state, I moved all the bits of logic spread throughout the code into a new module called Processor.

At the top of the Processor module, the state used by the pipeline is defined in WebhookState:

defmodule Processor do

  @expired_threshold_seconds 5

  defmodule WebhookState do

    defstruct(
      endpoint_url: "",
      payload: "",
      created_at: "",
      response: "",
      result: ""
    )

  end

Within Processor, the process function contains our business logic pipeline that modifies the WebhookState:

  def process(webhook = %WebhookState{}) do
    # here is the pipeline containing all the business logic :) !
    # if the current stage returns an :ok, we move onto the next stage
    # if it returns an :error, the pipeline is stopped and we move
    # onto the error handler in the else block at the bottom
    with(
      {:ok, webhook} <- check_webhook_expired(webhook),
      {:ok, webhook} <- send_webhook(webhook),
      {:ok, webhook} <- handle_response(webhook),
      {:ok} <- save_response_to_db(webhook)
    ) do
      Logger.info("webhook processed successfully :) !")
      :ok
    else
      {:error, msg} ->
        msg = "error processing webhook :( ! - #{msg}"
        Logger.error(msg)
        :error
    end
  end

Lastly, the different stages of the pipeline are defined:

  defp check_webhook_expired(webhook = %WebhookState{created_at: created_at}) do
    # this was moved from Parser to here :) !
    time_since_created = created_at - Time.now
    is_expired = time_since_created > @expired_threshold_seconds * 1000
    case is_expired do
      true ->
        # if the webhook is expired, return an :error to stop the pipeline
        {:error, "webhook expired"}
      false -> {:ok, webhook}
    end
  end

  defp send_webhook(webhook = %WebhookState{
    endpoint_url: endpoint_url,
    payload: payload
  }) do
    {status_code, response_body} = Sender.send(endpoint_url, payload)
    # store the response returned from the endpoint in WebhookState so
    # that the next stage of the pipeline (handle_response) can access it
    webhook = %WebhookState{webhook | response: {status_code, response_body}
    {:ok, webhook}
  end

  defp handle_response(webhook = %WebhookState{response: {200, _response_body}}) do
    # this was moved from Sender to here :) !
    Logger.info("webhook sent successfully")
    # store the result in WebhookState so that the next stage of the
    # pipeline (save_response_to_db) can access it
    webhook = %WebhookState{webhook | result: "success"}
    {:ok, webhook}
  end

  defp handle_response(webhook = %WebhookState{response: {404, _response_body}}) do
    # this was moved from Sender to here :) !
    Logger.error("failed to send webhook, endpoint returned 404")
    # store the result in WebhookState so that the next stage of the
    # pipeline (save_response_to_db) can access it
    webhook = %WebhookState{webhook | result: "failure"}
    {:ok, webhook}
  end

  defp save_response_to_db(webhook = %WebhookState{
    result: "success",
    endpoint: endpoint,
    payload: payload,
    response: {_status_code, response_body}
  }) do
    # this was moved from Sender to here :) !
    DB.save_successful_response(endpoint, payload, response_body)
    {:ok}
  end

  defp save_response_to_db(webhook = %WebhookState{
    result: "failure",
    endpoint: endpoint,
    payload: payload,
    response: {_status_code, response_body}
  }) do
    # this was moved from Sender to here :) !
    DB.save_failed_response(endpoint, payload, response_body)
    {:ok}
  end

end

Here’s a diagram of the refactored logic which is all contained in the Processor module which calls out to the other modules:

New Business Logic

With the new code, all of our webhook processing logic is now contained within Processor and instead of jumping around modules to see it we only have to look in one module.

There is also a simple pipeline defined in process where it first checks whether the webhook expired. If the webhook expired, an error is returned and the pipeline is stopped so the webhook does not get sent out. If it didn’t expire, then the webhook is sent to its endpoint in the next stage. Then the different responses returned by the endpoint are handled and saved to the database.

The state needed for the pipeline was defined at the top of the module in WebhookState and the pipeline modifies this state as it moves through its different stages. For example, in the send_webhook stage after sending the webhook to its endpoint, the response is stored in the state so that the next stage, handle_response, can access it.

Lastly, the scattered logic about not sending expired webhooks and handling different webhook endpoint responses was moved from the Parser and Sender modules into the Processor module. Now the Parser and Sender modules only contain logic related to their actual names.

Now before we finish off by cleaning up the old modules, I have something to say about Elixir pipelines.

An Aside About Elixir Pipelines

If you’re familiar with Elixir you might be wondering why I didn’t construct a simple pipeline using the aforementioned pipe operator (|>), something like:

webhook
|> check_webhook_expired
|> send_webhook
|> handle_response
|> save_response_to_db

Those pipelines are great when you have a simple happy path that, given some initial state, modifies it through different stages and produces an output. But they’re not well-suited for pipelines like webhook processing where any stage could throw an error and bring the pipeline to a halt.

For example, when check_webhook_expired finds that the webhook has expired, all processing should stop and it should return an error. If you used the simple pipeline code above, you wouldn’t be able to stop the pipeline so all of the subsequent stages would have to handle this error case and skip over their own processing. This is quite messy.

Elixir’s with statement solves this problem cleanly. It uses pattern matching on the result returned from each pipeline stage such that if an error gets returned from any pipeline stage, all processing stops and the code moves outside the pipeline to the error handler defined in the else block.

def process(webhook = %WebhookState{}) do
  with(
    {:ok, webhook} <- check_webhook_expired(webhook),
    {:ok, webhook} <- send_webhook(webhook),
    {:ok, webhook} <- handle_response(webhook),
    {:ok} <- save_response_to_db(webhook)
  ) do
    Logger.info("webhook processed successfully :) !")
    :ok
  else
    {:error, msg} ->
      msg = "error processing webhook :( ! - #{msg}"
      Logger.error(msg)
      :error
  end
end

If you want to learn more, check out:

Elixir docs about the with statement
Opus – library that lets you define more complex pipelines with your own “pluggable business logic components”, we actually use Opus at PagerDuty for some of our services

Back to the New Code

Now the only thing left to do is remove all the business logic from the other modules.

Receiver still receives the webhook and parses it but instead of sending it directly to its endpoint it now passes the webhook to the pipeline defined in Processor.process:

defmodule Receiver do

  def receive(raw_webhook) do
    parsed_webhook = Parser.parse(raw_webhook)
    case parsed_webhook do
      {:ok, endpoint, payload, created_at} ->
        # create the initial webhook pipeline state and start the pipeline
        %WebhookState{
          endpoint: endpoint,
          payload: payload,
          created_at: created_at
        }
        |> Processor.process()
      {:error, error_msg} -> Logger.error("error - #{error_msg}")
    end
  end

end

Parser is now solely responsible for decoding the webhook from a JSON string:

defmodule Parser do

  def parse(raw_webhook) do
    decoded_webhook = JSON.decode(raw_webhook)
    endpoint = decoded_webhook.event.endpoint
    payload = decoded_webhook.event.payload
    created_at = decoded_webhook.event.created_at

    # no more expired webhook logic :) !

    {:ok, endpoint, payload, created_at}
  end

end

Sender is now solely responsible for sending the webhook to its endpoint:

defmodule Sender do

  def send(endpoint, payload) do
    HTTP.post(endpoint, payload)
  end

  # no more logic about handling different types of responses
  # and storing them to the db :) !

end

DB didn’t have any business logic in the first place so it remained unchanged:

defmodule DB do

  def save_successful_response(endpoint, payload, response) do
    MySQL.insert('successes', endpoint, payload, response)
  end

  def save_failed_response(endpoint, payload, response) do
    MySQL.insert('failures', endpoint, payload, response)
  end

end

If you want to see a diff of the Old Code and New Code on Github, check out this link. Here’s what the diff of the Parser module looks like:

Parser Diff

Once again, I wanted to lightly remind you that all of the code above is a simplification of our actual webhook processing pipeline. If I see any support requests (or even better, tweets) saying something like “Does PagerDuty seriously not even bother to retry sending failed webhooks???”, I will simply shake my head.

Did This Refactor Actually Help???

Yes, it did! But don’t take my word for it, take my team’s:

PR Comment

This brought a tear to my eye :’)!

Everyone now had a better understanding of this business logic which will enable us to make more informed, confident decisions about any changes we want to make to this business logic as part of our rewrite.

One story I’d like to share was during one of our team meetings where we were reviewing the code from my refactor PR. I was going over our equivalent of the Processor module which had a similar piece of logic around not sending out expired webhooks. Someone asked, “Wait, why are we doing that?”. And then silence … No one could give a good answer. This prompted an investigation into why indeed we were actually doing that. After a long journey into the deep, dark depths of Confluence (arguably scarier than Mordor), we uncovered an old doc that clearly explained why the code did that and it made perfect sense!

Recap

So this is how I refactored the business logic behind our webhook processing pipeline to be more functional.

Recall some of the problems with having your application logic spread out across different modules:

It takes a long time to move through all these different modules and understand what is actually going on
You end up with poor separation of concerns in your modules
It makes it hard to get a big picture view of your business logic

Recall some of the benefits of centralizing all this logic into one module:

You only have to look at one module to see all the business logic
You get better separation of concerns – Parser is only responsible for decoding webhooks, Sender is only responsible for sending webhooks to their endpoints
It takes advantage of functional programming concepts like pipelines
I didn’t mention this but it also makes it easier to unit test this logic

Pipeline

I would be remiss not to mention that there are some benefits to embedding your business logic within different modules of your app. This can lead to less code and faster development and is one of the fundamental ideas behind frameworks like Ruby On Rails that prefer “convention over configuration”. But even within the Rails community, there is an ongoing, passionate debate about whether you should centralize your business logic or keep it spread out over your application. Hopefully you will read this blog post and at least consider the idea of centralizing your business logic.

Big shout-out to Dave Thomas and his “Elixir for Programmers” course which inspired me to spend a whole day (and weekend) refactoring the business logic of my team’s code.

So have I inspired you to drop everything and get refactoring? Or do you vehemently disagree with keeping your business logic separate from everything else? All feedback and discussion is welcome in our PagerDuty Community forum!

If you’d be interested in working for a company that empowers its engineers to do things like refactoring code to make it more developer-friendly, check out our current job openings.

Incident Management

AIOps

Automation

Customer Service Ops

Status Pages

Stakeholders Communications

Integrations

PagerDuty Copilot

Developer Platform

Professional Services

Security

Enterprise Class

Integrations

Engineering Blog

How I Centralized Our Scattered Business Logic Into One Clear Pipeline For Our Elixir Webhook Service

The Original Code

The New Code

An Aside About Elixir Pipelines

Back to the New Code

Did This Refactor Actually Help???

Recap

You may also love these...

Updating Your Tools for API Scopes

Scrum Ceremonies: A Beginner’s Guide

GSD With Heart: How One Team Lives Their Values at PagerDuty