My name is Yiyun and I’m currently a Computer Science student at the University of Waterloo. I’m a Software Engineer intern on the Core team here at PagerDuty. In this post, I would like to share some reflections on my experience over the past four months at PagerDuty.
My team maintains and develops several core libraries and services that are used by different teams so that the engineering organization at PagerDuty can move forward rapidly.
A regular day of work involved getting into the office in the morning, grabbing a cup of tea or coffee, attending our morning stand-up meeting, and then starting work on my task. There were also days where meetings or other special events interrupted this daily routine—but they turned out to be some of my most interesting and memorable experiences.
The company has a very strong customer-focused culture, and reliability is at the core of everything we do. Failure Friday is an ongoing practice across the engineering organization. Sessions are set up to evaluate the performance of new services by triggering unexpected failures such as isolating the network, aborting processes, and rebooting hosts. This helps teams uncover implementation issues and empowers them to be proactive in discovering deficiencies rather than waiting for them to become the root cause of a major outage.
When I joined the team in September, a new project had just kicked off, allowing me to closely follow the steps—from design and development to deployment—that the team took to build a new service. I also had the opportunity to take responsibility for some of the project’s major milestones, including planning a Failure Friday for the new service.
This required me to consider different scenarios in which the service might behave differently. Since a Failure Friday also involves an incident commander and engineers from other teams, I also had to coordinate bringing people together from across the engineering organization. Seeing the service we had built undergo grueling tests at Failure Friday was akin to a parent watching their child leave their protection and start taking on challenges on their own.
I also had the opportunity to join a two-day offsite with my team. We spent the first day throwing axes and visiting an art gallery. The second day was a retrospective session where we reflected on the team’s performance over the past half-year, and set initiatives and goals for the team for the next half. Although I had only been on the team for two months at the time, I was encouraged to share my feedback on the team’s decisions. What I learned is that rather than thinking of yourself as an intern, having confidence is the key to making meaningful contributions.
On-call was a very new experience for me. Interns are not required to join their team’s on-call rotation, but they are highly encouraged to do so. Although I was initially fearful of putting myself in such a situation, my curiosity led me to shadow my team’s on-call for a week. It was comforting to know that my team was there to back me up.
I still remember my heart fluttering when I saw the topic on my team’s channel change to “Yiyun Liang is on-call for Core.” As luck would have it, I was woken up on the very first night when one of our services started to behave oddly during a network blip. I received a message from another team almost right away, since their service depends on ours. After some investigation, I was able to confidently tell the other team that the service had recovered from the incident. Being on-call for your team means more than fixing problems: you are also the person responsible for answering any questions other teams may have about the services your team owns.
I also had the opportunity to witness major incidents and see how we were able to keep our cool and resolve them as quickly as possible. Much like other interns in the past, I was amazed by how well PagerDuty implements best practices when it comes to incident response and resolution.
Interns at PagerDuty are given full trust and opportunities to contribute to production code on a daily basis. I can’t believe how much I have learned over these past four months. My time with PagerDuty taught me to stay strong even when unexpected challenges occur. I feel really lucky to have joined such a great company and to have become a part of this incredible team. What’s also very exciting about PagerDuty is the rate at which it’s growing. Now is an awesome time to be a Dutonian.