Achieving scale — that is, the ability to meet application demand at any level — is essential if you want your business and user base to grow, or if you hope to be able to handle the vicissitudes of modern software deployment.
Yet scaling is no easy feat. Most legacy applications struggle to support thousands of users. An unexpected traffic spike will simply knock over an application not designed for it, and countless customers and dollars are lost while the ITOps team struggles to spin up VMs or rack-and-stack servers to handle the load.
And even if you run your app in the cloud, scalability is not guaranteed. A poorly designed cloud app will experience bottlenecks that render it unusable.
Given the massive costs of suboptimal digital services in lost productivity, missed opportunities, and more, scalability is mission-critical to any organization today. And it is possible! It requires implementing the right tools and processes, the right team, and the right lines of communication within that team. Below, I explain how to achieve scalability in order to avoid derailing your software and organization.
When preparing for scalability, flexibility and the ability to match deployed infrastructure to the load are key. Matching infrastructure to load can also drive cost efficiencies in deploying an application. Understanding your traffic patterns, average usage, and the standard deviation will help you properly size your environment, and planning to rapidly scale for an exceedingly rare (but possible) event can save a lot of headaches when the application goes viral. If an application is deployed regionally, idle cycles can often be found in the middle of the night. Weekday load and weekend load can differ significantly. Many businesses are seasonal, and usage of the application can be slightly or dramatically lower from one time of year to the next.
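To make the sizing idea concrete, here is a minimal sketch that turns observed traffic samples into a server count by covering the average load plus a few standard deviations of headroom. The function name, the three-sigma default, and the per-server capacity figure are all assumptions for illustration, not recommendations from any particular tool.

```python
import math
import statistics


def size_fleet(requests_per_sec, per_server_capacity, headroom_sigmas=3.0):
    """Return a server count that covers mean load plus a safety margin.

    requests_per_sec: observed traffic samples (hypothetical data).
    per_server_capacity: requests per second one server can handle (assumed).
    headroom_sigmas: how many standard deviations of headroom to add (assumed).
    """
    mean = statistics.mean(requests_per_sec)
    # Population standard deviation of the observed samples.
    stdev = statistics.pstdev(requests_per_sec)
    target_load = mean + headroom_sigmas * stdev
    # Always keep at least one server running.
    return max(1, math.ceil(target_load / per_server_capacity))


if __name__ == "__main__":
    weekday_samples = [100, 120, 80, 100]
    print(size_fleet(weekday_samples, per_server_capacity=50))
```

Running the same calculation separately on weekday, weekend, and overnight samples is one way to see how much capacity could be released during quiet periods.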
Scale also involves ensuring the reproducibility of your artifacts, which in turn forces consistency in production deployments. The service artifacts can then be scaled independently as application needs change and grow. This method requires a strong understanding of DevOps, with a durable continuous integration and continuous deployment pipeline at its core.
First and foremost, application source code needs to be checked into a version control system. Rather than building a bespoke server stack around this well-structured output, you also need to transform the server stack itself into code. It can be a painful process at first, but the only way to scale an infrastructure consistently, every time, is to not rely on an ITOps staff member clicking the “next” button or typing commands into the console on every server deployed to dev, test, and production.
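As a rough illustration of what "the server stack as code" can mean, the sketch below describes servers as plain data and computes the actions needed to move from the current state to the desired one. It is a simplified, hypothetical model rather than the behavior of any specific Infrastructure-as-Code tool; `ServerSpec`, `plan`, and the image and package names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSpec:
    """Declarative description of one server (all fields are illustrative)."""
    name: str
    image: str
    packages: tuple


def plan(desired, current):
    """Compare desired specs with current state and list the actions needed.

    Both arguments map a server name to a ServerSpec. The result is a list
    of (action, name) tuples; an empty list means nothing has to change.
    """
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in current:
            actions.append(("create", name))
        elif current[name] != spec:
            actions.append(("update", name))
    for name in sorted(current):
        if name not in desired:
            actions.append(("destroy", name))
    return actions


if __name__ == "__main__":
    desired = {
        "web": ServerSpec("web", "ubuntu-22.04", ("nginx",)),
        "db": ServerSpec("db", "ubuntu-22.04", ("postgresql",)),
    }
    current = {
        "web": ServerSpec("web", "ubuntu-20.04", ("nginx",)),
        "old": ServerSpec("old", "ubuntu-18.04", ()),
    }
    for action in plan(desired, current):
        print(action)
```

Because the desired state lives in version control next to the application code, the same plan can be produced for dev, test, and production without anyone typing commands by hand.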
Once your infrastructure and code are both well-defined, you can write integration tests to ensure they function as they should in a fully built environment. To take these to the next level of sophistication, containers can be used as infrastructure building blocks. Those blocks then have consistent “downward” facing hooks to the infrastructure. A cloud container management platform, combined with manifest files that describe how the services fit together and should scale, turns these consistent artifacts into a highly resilient and scalable application.
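The role of a manifest can be sketched in a few lines: it records, per service, the bounds and the capacity assumptions that a platform would use to decide how many copies to run. The manifest layout, the service names, and the `desired_replicas` function below are hypothetical and do not follow the schema of any real container platform.

```python
import math

# Hypothetical manifest: per-service scaling rules (all values are assumed).
MANIFEST = {
    "web": {"min_replicas": 2, "max_replicas": 10, "rps_per_replica": 100},
    "worker": {"min_replicas": 1, "max_replicas": 4, "rps_per_replica": 25},
}


def desired_replicas(service, current_rps, manifest=MANIFEST):
    """Return how many replicas a service should run for the given load."""
    rule = manifest[service]
    needed = math.ceil(current_rps / rule["rps_per_replica"])
    # Clamp to the bounds declared in the manifest.
    return max(rule["min_replicas"], min(rule["max_replicas"], needed))


if __name__ == "__main__":
    for load in (50, 450, 5000):
        print(load, desired_replicas("web", load))
```

An integration test for a fully built environment could assert on exactly this kind of output, checking that each service stays within its declared bounds under simulated load.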
The often missed essential ingredient for scalability is a team that maps well to the technological topology described above. Such a team includes three main groups (note: naming conventions for titles and division of responsibilities can vary across organizations):
The trick to coordinating your team in such a way as to maximize scale is to have SREs focus on reliability efforts by leveraging the Infrastructure-as-Code that their team members have written, rather than spending time on manual configuration. This makes for a different type of team arrangement than a legacy team structure, in which application code is simply “thrown over the wall” by developers to the ITOps team to deploy and run. The legacy model is a highly manual environment and is prone to error.
To complement this greater infrastructure visibility, engineering teams can implement a greater degree of application trace logging to help discover issues more quickly. As an incentive to create a more highly instrumented application, canary releases can be quickly deployed to a subset of the application’s user base, letting the team test new features and find bugs more quickly without affecting the larger application user base. Canary releases also let you gradually release new features, reducing the likelihood of incident spikes during rollout.
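One common way to pick the subset of users for a canary release is to hash each user ID into a stable bucket, so the same user always gets the same answer and the canary group only grows as the percentage is raised. The sketch below assumes string user IDs and a hypothetical `salt` that names the release; it is an illustration, not the mechanism of any particular deployment tool.

```python
import hashlib


def in_canary(user_id, percent, salt="release-1"):
    """Return True if this user falls inside the canary group.

    user_id: any stable identifier for the user (assumed to be a string).
    percent: share of users, from 0 to 100, who should see the canary.
    salt: hypothetical release label, so each release picks a different subset.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash onto a bucket from 0 to 99.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


if __name__ == "__main__":
    users = [f"user-{n}" for n in range(1000)]
    exposed = sum(in_canary(u, 10) for u in users)
    print(f"{exposed} of {len(users)} users see the canary")
```

Raising `percent` step by step gives the gradual rollout described above, and any user already in the canary at a lower percentage stays in it at a higher one.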
Last but not least, remember how important communication is. It should go without saying that even the best-structured team will not succeed in enabling scalability unless team members can communicate seamlessly with each other.
Effective communication requires not only tools that can automate communication tasks, but also a commitment to ensuring that everyone on your team “speaks the same language”, meaning that developers, ITOps, and SREs can all talk to one another in a mutually intelligible way because they all understand each other’s roles and needs.
It can be intimidating to take the first steps down a path of application scale. People, processes, and technology all need to change to move from a Waterfall method to DevOps, and to evolve legacy infrastructure management practices into modern ITOps and reliability engineering.
Much in the same way that the agile development revolution added value on a quicker timeline, each step in the scaling journey brings value that can be realized immediately.