Founded in 1995, Chicago Trading Company (CTC) is a derivatives trading firm that specializes in market trading across a variety of products, services, and strategies. CTC actively trades in a broad spectrum of asset classes, including equities, interest rates, and commodities. Their trading desks are open 20 hours a day, six days a week, and they are recognized as a leading provider of liquidity and pricing on numerous derivatives exchanges around the world.
Because the market fluctuates by the second, CTC’s critical applications and services need to be always online and available to users at a moment’s notice to deliver a consistent customer experience every time. “With our services directly tied into the open market, downtime is just not an option,” explained Luke Rotta, Manager, SRE and Software Infrastructure at CTC. “If we’re not in the market, we’re not participating in the opportunity, and it’s a missed opportunity.” Rotta is responsible for managing the Software Infrastructure team that builds and delivers applications, as well as overseeing the SRE team that monitors CTC’s pre- and end-production environments.
Before implementing PagerDuty, Rotta’s team experienced several challenges, including:
With the recent push towards remote work, CTC was forced to quickly pivot their operations to a digital-first model. Additionally, heightened market volatility meant that their customers also increased the frequency of their trading, making it more important than ever that the CTC trading platform stayed up and running at all times.
To help achieve this, CTC needed to rethink their incident management process while continuing to maintain and deliver a consistent customer experience. This meant Rotta’s teams needed to refocus their efforts on day-to-day operations rather than long-term projects—and all in a new, remote-first environment. “Our teams are laser-focused on making sure systems can handle the increased capacity and deliver liquidity to the marketplace to keep our customers happy,” shared Rotta.
Before going remote, most information was communicated verbally in the office. Now, with the entire company working remotely, the ability to effectively communicate and collaborate across teams is more important than ever. PagerDuty helped CTC transform their incident communication channels to be completely digital. “PagerDuty really taught us to spin up an incident remotely and allowed us to centralize our incident management process to quickly assemble teams into a single channel and make decisions directly from there.”
CTC also leverages Slack, part of PagerDuty’s ecosystem of more than 350 integrations, to improve incident communication and collaboration between teams, as well as for conducting postmortems. With the Slack integration, teams can create, respond to, and resolve PagerDuty incidents directly inside the Slack interface, which alleviates the stress of multiple communication channels and allows all necessary teams to work the incident together. “Since all teams are remote now, we just create the incident directly in Slack, the playbook tells everybody what Zoom room to jump into, and off we go,” shared Rotta.
In a digital-first environment, it’s critical for stakeholders to have total visibility into the health of their critical systems and services in real time so they can quickly orchestrate a proper response when an incident happens.
Before PagerDuty, CTC used a traditional dashboard that would alert the team about service disruptions and incidents. “We would get what we call the ‘wall of red,’ which was quite literally a screen filled with hundreds of alerts, with no sense of what’s being impacted or what’s going on in our environment,” explained Rotta.
To combat this issue, CTC implemented PagerDuty Event Intelligence to automatically group alerts together and cut down the noise for all mission-critical services and applications. “Before PagerDuty, we sometimes had 50-200 alerts coming in at once. With Event Intelligence, that number is now down to 5-10,” explained Rotta.
With Event Intelligence, CTC’s response teams also have the context they need to quickly resolve an issue before it becomes widely customer-impacting. “The ability to reduce the noise and clear out alerts within the platform really frees up a lot of time for people on our SRE team to focus on higher-impact tasks,” said Rotta.
Since implementing PagerDuty, CTC has seen several benefits, including:
PagerDuty also helped support CTC’s business continuity strategy. “In this new, remote environment, employees can feel disconnected from what’s going on, and we’re trying to solve that with PagerDuty. Almost everyone at the company is on the PagerDuty platform, whether they’re a stakeholder or a full user,” shared Rotta.
CTC plans to continue expanding their use of PagerDuty across the organization. For example, the company has decided to focus more on metrics to inform future actions, so Rotta’s team is looking into Operational Reviews, as well as PagerDuty Analytics and Intelligent Dashboards, to better understand team health and the business impact of incidents, to measure SLAs, and to share metrics seamlessly with executive leadership. “This could help drive decisions around what applications we need to invest in,” explained Rotta.
Additionally, while CTC already has all of their major business services set up in Status Dashboards, they are looking to extend the dashboards’ use across the company by giving executive leadership improved visibility into the status of an incident or a service. As the PagerDuty platform grows with CTC, Rotta and his team look forward to extending the platform’s functionality across other parts of their infrastructure. “I like that it’s simple. I don’t have to manage anything because it just does its job,” he shared.
To learn how PagerDuty can help your team make things simple and transform operations in a digital-first world, contact your account manager or try a 14-day free trial today.