Continuous improvement is one of the fundamental tenets of Agile methodology that PagerDuty’s product development teams emphasize. This already works fairly well at the individual...by Simon Darken
August 15, 2019
When PagerDuty was founded, development speed was of the essence—so it should be no surprise when we reveal that Rails was the initial sole bit of technology we ran on. Soon, limitations caused the early team to look around and Scala was adopted to help us scale up.
However, there’s a huge gap between Scala and Ruby, including how the languages look and what the communities find important; in short, how everything feels. The Ruby community, and especially the Rails subcommunity, puts a huge value on the developer experience over pretty much anything else. Scala, on the other hand, has more academic and formalistic roots and tries to convince its users that correctness trumps, well, pretty much anything.
It nagged me and it looked like we had a gap in our tech stack that was hard to reconcile. Scala wasn’t likely to going to power our main site, and though we needed better performance and better scalability than Ruby and Rails could deliver, there was a very large gap and switching was cumbersome. There was also little love for the initial Scala codebase, as it quickly became apparent that writing clean and maintainable Scala code was hard and some technology choices made scaling more difficult than anticipated.
When thinking about these issues in 2015, I let myself be guided by an old Extreme Programming practice, that of the System Metaphor. Given that I did some work in the telecommunications world, it wasn’t too far-fetched to look at PagerDuty like some sort of advanced (telco) switch: stuff comes in, rules and logic route and change it, and stuff goes out. As with any analogy, you can stretch it as far as you want (look at an incident as an open circuit), but for me, just squinting a bit and realizing that it was a workable metaphor was enough.
If you say “telecom” and “programming language,” you say “Erlang.” I looked at Erlang before but the language never appealed to me, and in the past, it never really hit the sweet spot in terms of the kinds of systems I was working on. This time, however, the “sweet spot” part seemed to hold, but still—a language that was based on Prolog, that got a lot of things backward (like mixing up what’s uppercased and what’s lowercased), felt like C with header files … that would be a hard sell.
So I parked that idea, went back to work, and forgot about it until I stumbled upon Elixir a month or two later. Now this was interesting! Someone with Ruby/Rails street cred went off, built a language on top of Erlang’s Virtual Machine: One that was modern, fast, very high level, had Lisp/Scheme-style macros, and still professed to be nice to users (rather than Ph.D. students).
Needless to say, it immediately scored the top spot on my “languages to learn this year” list and I started working through documentation, example code, etc. What was promised, held up—it performed above expectations both in execution as well as in development speed. And what’s more, developing in Elixir was sheer fun.
After getting my feet wet and becoming more and more convinced that this could indeed remove the need for Scala, be nicer to Ruby users, and overall make us go faster while having more fun, I started looking for the dreaded Pilot Project.
Introducing languages is tricky. There’s a large cost involved with the introduction of new programming languages, although people rarely account for that. So you need to be sure. And the only way to be sure is, somewhat paradoxically, to get started and see what happens. As such, a pilot needs to have skin in the game, be mission-critical, and be complex yet limited enough that you can pull the plug and rewrite everything in one of your old languages.
For my team, the opportunity came when we wanted to have a “Rails-native” way to talk to Kafka transactionally. In order to ease the ramp-up for the MySQL-transaction-heavy Rails app, we wanted to build a system that basically scraped a database table for Kafka messages and sent it to Kafka. Though we knew this wasn’t really the best way to interact with Kafka, it did allow us to simply emit Kafka messages as part of the current ActiveRecord transaction. That made it really easy to enhance existing code with Kafka side effects and reason about what would happen on rollbacks, etc.
We opted for this project as our Elixir pilot—it was high value and the first planned Kafka messages would be on our critical notification path, but it was still very self-contained and would have been relatively easy to redo in Scala if the project failed. It doesn’t often happen that you stumble upon pretty much the perfect candidate for a new language pilot, but there it was. We jumped into the deep and came out several weeks later with a new service that required a lot of tuning and shaping, but Elixir never got in the way of that.
Needless to say, this didn’t sway the whole company to “Nice, let’s drop everything and rewrite all we have in Elixir,” but this was an important first step. We started evangelizing, helped incubate a couple more projects in other teams, and slowly grew the language’s footprint. We also tried to hire people who liked Elixir, hosted a lot of Toronto Elixir meetups to get in touch with the local community, and—slowly but steadily—team after team started adopting the language. Today, we have a good mix of teams that are fully on Elixir, teams that are ramping up, and teams who are still waiting for the first project that will allow them to say, “We’ll do this one in Elixir.”
Elixir has been mostly selling itself: it works, it sits on a rock-solid platform, code is very understandable as the community has a healthy aversion towards the sort of “magic” that makes Rails tick, and it’s quite simple to pick up as it has a small surface area. By now, it’s pretty unlikely that anything that flows through PagerDuty is not going to be touched by Elixir code. This year, we’re betting big on it by lifting some major pieces of functionality out of their legacy stacks and into Elixir, and there are zero doubts on whether Elixir will be able to handle the sort of traffic that PagerDuty is dealing with in 2018. From initial hunch through pilot into full-scale production, it’s been a pretty gentle ride and some of us are secretly hoping that one day, we’ll only have to deal with Elixir code.
An article on Elixir would not be complete without a shout out to the community’s leadership, starting with Elixir inventor José Valim, whom I largely credit for the language’s culture. Elixir comes with one of the nicest and most helpful communities around, and whether on Slack, on the Elixir Forum, or on other channels like Stack Overflow and IRC, people are polite, helpful, and cooperative. Debates are short and few, and the speed of progress is amazing. That alone makes Elixir worth a try.
Want to chat more about Elixir? PagerDuty engineers can usually be found at the San Francisco and Toronto meetups. And if you’re interested in joining PagerDuty, we’re always looking for great talent—check out our careers page for current open roles.