Turn any signal into insight and action. See how PagerDuty Digital Operations Management Platform integrates machine data and human intelligence to improve visibility and agility across organizations.
Connect insights to real-time action by aligning teams through the shared language of business impact.
Check out the latest products we’ve been working on—including event intelligence, machine learning, response automation, on-call, analytics, operations health management, integrations, and more.
Digital Operations Management arms organizations with the insights needed to turn data into opportunity across every operational use case, from DevOps, ITOps, Security, Support, and beyond.
Over 300 Integrations
Discover DevOps best practices with our library of webinars, whitepapers, reports, and much more.
Learn best practices and get support help with resources from our award-winning support team.
See how PagerDuty works with our live product demo — twice a week, every week.
We've created a maturity model to assist on the journey to digital operations excellence. Take our short assessment to find out where your team falls!
Interactive, simple-to-use API and technical documentation enables users to easily try updates and extend PagerDuty.
Engage with users and PagerDuty experts from our global community of 200k+ users. Become a member, connect, and share insights for success.
Get all your PagerDuty-related questions answered by exploring our in-depth support documentation and community forums.
In a world where everything comes down to moments of truth, teams must respond to issues and opportunities in seconds. Rising customer expectations demand real-time...
PagerDuty helps organizations transform their digital operations. Learn more about PagerDuty's mission and what we do.
Meet our experienced and passionate executive team.
We are risk-taking innovators dedicated to delivering amazing products and delighting customers. Join us and do the best work of your career.
With the PagerDuty Foundation, we are committed to doing our part in giving back to the community.
Three years ago, I joined an up-and-coming startup called PagerDuty as an Engineering Manager. Having received Series A funding in 2013, the company was in hyper-growth mode and hiring aggressively across all areas. The Engineering team was less than 30 people at the time, and a big draw for me was (and still is) learning about the structural challenges facing a rapid-growth organization. As we approach 100 Dutonians in Engineering, it seems appropriate to look back and reflect on the changes so far.
Evolution of organizational structures is a fascinating process—full of mistakes, dead ends, reinvented wheels, and hopefully, lessons learned. There’s plenty of literature covering engineering organizations, much of it boiling down to either abstract lists of pros and cons or to blog posts that sing praises of whatever process the company happened to adopt at the time of writing. I want to take a different approach and examine the historical reasons that pushed our engineering team to continuously experiment with and evolve our structure, with all of the accompanying missteps and learnings.
Note: Some of the events and details have been simplified for the sake of the narrative—many of these are worthy of posts of their own. Bookmark this blog to keep an eye out for new content.
As Engineering grew from a small handful of people to several, distributed teams in very short order, the product development process remained rather immature. Despite superficial trappings of agility with stand-ups and sprints, we were essentially using the Waterfall model. Leadership defined the projects to work on, assigned the individual engineers to those projects, and set the target delivery dates. Predictably, the dates were rarely hit and the project-tracking spreadsheets were in a perpetual state of disrepair and, eventually, disuse.
Things didn’t look much better on the ground. Product managers kicked off new projects by drafting lengthy functional design specifications, doing their best to anticipate any possible questions that might arise—a consequence of Product and Engineering not actually interacting very much during development! The spec was presented to the Web Application and Backend Services teams, who then worked on the user-facing and backend aspects independently. New infrastructure requests had to be made to the Operations team weeks in advance.
Integrating all of these separate efforts into a coherent feature release was a nightmare. We had missing or incomplete infrastructure, hot potato bugs, functionality gaps (each team thinks the other is taking care of it), lack of ownership or empowerment for both engineers and product folks, missed dates, and organizational silos. The desire to hit dates caused us to take fewer risks, and we became more conservative in our implementations as well as averse to adjusting the product specs.
This combination of departmental structure and development processes pushed the topic of siloing to the forefront of every other discussion within Engineering that year. And boy, did we have silos:
The rigidly defined sets of responsibilities did not leave teams a whole lot of room for cross-functional collaboration. We experimented briefly with a concept of “working groups”—small, temporary, diverse squads drawing folks from the existing teams with the intent of working on scoped, timeboxed, cross-cutting projects. These ended up introducing more chaos by destabilizing the primary teams, and so the experiment was abandoned.
Still, the need to collaborate and deliver with better consistency and predictability was of utmost importance. Everyone was sick of siloing, so that had to go. We had our work cut out for us.
By early 2015, dissatisfaction with the state of things as well as rising interest in Agile methodologies reached critical mass and resulted in a departmental shake-up. Several new teams were formed that aligned around specific product directions. They followed the Scrum process and consisted of engineers from both Web Application and Backend Services teams, as well as product owners, Scrum Masters, and UX designers.
Given the less than ideal pace of progress made on the product in the prior year, the new teams were optimized for product delivery. To ensure that they could spend 100 percent of their time on cranking out new features, maintenance work was delegated to the Backend Services teams. These became known as “base teams” as they adopted Kanban methodology and took on ownership for all of the services in production, including all product on-call responsibilities. Furthermore, base teams fed into the product teams—engineers migrated from one to the other as product work ramped up.
These were obviously big changes. In an effort to minimize the impact of team shuffling on individual engineers (and to postpone having to deal with remote reports), the reporting relationships were not touched. This of course complicated everyone’s life tremendously, because now we had a dual reporting structure, aka the matrix organization! Many engineers ended up on teams that were not assigned to their direct supervisor and managers were now playing the roles of “people manager” (responsible for people, some of whom were on other teams) and “functional manager” (responsible for teams, where some engineers reported elsewhere).
The good news was that the old silos were mostly smashed, never to be seen again. Web Application Engineers working in close proximity with Backend Services Engineers were able to see past each other’s differences and move towards shared goals. Most teams were now geographically distributed as well, which worked wonders in bringing the two offices together.
All things considered, we were still in a much better shape than a year ago. But there was still a long way to go.
After living with the new organizational structure for a year, we accumulated quite a lot of learnings. Agile was good (but we did not do enough of it), DevOps was good (but we did not do enough of it), matrix management was not so good (and we’d had enough of it).
In early 2016 we restructured once again. “Vertical” product Scrum teams were now responsible for particular slices of the product. “Horizontal” teams were responsible for cross-cutting product or infrastructure concerns, and were tasked with setting best practices and enabling other teams to move fast. Product delivery teams owned their roadmap, requirement definition, implementation, deployment, and maintenance of their code and infrastructure (!) in production. We had adopted a true DevOps “you code it, you own it” approach.
The subject of GSD is worth digging into a bit more. Through practice, we got more and more comfortable with the ideas of team autonomy, innovation over invention, and business bringing problems to solve rather than solutions to implement. It’s not easy to let go of the idea that we know what’s best for the customers. A laser focus on delivering the minimum viable product, and seeking immediate feedback and incorporating it in during the development cycle, was key. It allowed us to iterate quickly, maximize value, minimize gold plating, and shorten the time to get a prototype into the customer hands from months to days or weeks. In other words, we transformed from an organization that “follows agile processes” to an actual agile organization.
As the number of product delivery teams grew, an interesting phenomenon developed—we saw an emergence of the so-called “tribes” (hat tip to Spotify’s team model); i.e., groups of teams related to each other by common functionality or common mission. These arrangements have resulted in benefits such as shared ownership, shared knowledge, shared roadmap (separate from individual team backlogs), and shared vision. Tribal organization is something we are still experimenting with—stay tuned for future updates on our learnings with tribes.
We’ve been running with this structure for 16 months now and it just works. Certainly teams and associated ownership lines will evolve as the company continues to grow at a fast pace, and some details will change as we continue to invest in improving ourselves. At the same time, it’s clear that we have done enough of the wrong things to learn what the right thing should look like.
As I think back to some of the decisions that were made early on and some of the awkward intermediate states that our organization has endured, I’m tempted to smack myself in the head and ask why it didn’t occur to us to jump to the obviously superior end state. The reality, of course, is never that simple—those decisions were a function of our state, our priorities, our people, and our challenges at the time. You may have recognized some of your own challenges as well, in which case I hope you’re able to take something away from our experience.
As organizations grow and mature, there is a tendency to slow down, calcify, and become more conservative. Through practicing continuous improvement and evolving our practices, we’ve managed to buck the trend and grow more agile, pragmatic, and productive with time — and you can do the same.
Here’s to three (and many) more years of learning!
This blog was co-authored by myself and Simon Darken. Once a year, PagerDuty’s SREs get together for a three-day, in-person offsite. With the team spread...
In the United States, it’s almost that time of year again where we count our blessings and give thanks. For retail workers, it’s also that...
600 Townsend St., #200
San Francisco, CA 94103
905 King Street West, Suite 600
Toronto, ON, M6K 3G9, Canada
1416 NW 46th St., St. 301
Seattle, WA 98107
5 Martin Place
1 Fore St,
London EC2Y 9DT
© 2009 - 2018