Indonesia has 60 million Micro, Small, or Medium Enterprises (MSMEs), including 3.5 million warungs, or stalls. These small family-owned businesses are more than places of commerce; they are essential to daily life in the neighborhoods and communities they serve. Until recently, most of these mom-and-pop shops used simple bookkeeping methods, often relying on pen and paper for recordkeeping. This made it difficult to track finances, which could prevent them from gaining access to lines of credit. BukuWarung was founded in 2019 to solve these challenges, with the ultimate goal of digitizing small businesses in Indonesia.
Today, BukuWarung is the leading all-in-one financial app for MSMEs. Its vision is to accelerate the financial success of MSMEs in South-East Asia by building an operating system for merchants to facilitate the movement of money across the entire value chain.
Manish Jain, the Director of Engineering at BukuWarung, is responsible for building the payments product for MSMEs in Indonesia. The team maintains services such as cash loans, remittance, and QR code-based payments, all of which require round-the-clock monitoring of systems. It’s critical that these small businesses can access their digital bookkeeping and payments. Jain explained, “This involves the movement of money. A frictionless customer experience and the maintenance of SLAs are super important.”
The fintech’s platform depends on downstream partners, like banks and payment gateways, to process transactions. Any problems with partners’ systems can cause issues for BukuWarung’s customers.
The engineering team had anomaly detectors in place. However, notification channels such as email were passive and didn’t attract immediate attention. The team largely depended on merchant complaints to learn that systems weren’t working: customer support received messages, or even saw posts on social media, saying that customers couldn’t complete a transaction.
When this happened, someone would post a message on Slack to figure out who was responsible for fixing the problem. This process was entirely manual, inefficient, and involved too many people. First response time for systemic failures and large customer issues was about one hour. Worse, blackouts were occurring twice a month on average. This was not only disruptive for customers but also created a challenging work environment for the engineering staff.
As the company quickly grew, so did the engineering team—scaling to a team of 100 between 2021 and 2022. The existing way of managing incidents was no longer an option. “We realized we cannot be scrambling for every incident and we cannot be dependent on offline systems,” said Jain. “We needed an incident management solution to resolve these systemic issues before they start impacting the merchant experience.”
Jain had used PagerDuty at a previous company, and knew its best-in-class capabilities were exactly what BukuWarung needed. The team immediately implemented an on-call roster to equitably distribute the workload and ensure 24×7 coverage. Escalation policies were put in place to prevent any incident from slipping through the cracks.
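To make the on-call roster and escalation policy concrete, here is a minimal sketch of the two ideas in Python. It is an illustration only: the function names, the seven-day shift length, and the timeout values are assumptions, not BukuWarung’s configuration or PagerDuty’s implementation.

```python
from datetime import date


def on_call_for(day: date, roster: list[str], start: date, shift_days: int = 7) -> str:
    """Return who is on call on `day` in a simple round-robin rotation.

    `roster`, `start`, and `shift_days` are hypothetical inputs for illustration.
    """
    shifts_elapsed = (day - start).days // shift_days
    return roster[shifts_elapsed % len(roster)]


def escalation_target(minutes_unacknowledged: int, policy: list[tuple[str, int]]) -> str:
    """Return who should be notified after an incident has gone unacknowledged.

    `policy` is an ordered list of (responder, timeout_in_minutes) levels.
    Once a level's timeout has passed, the incident moves to the next level;
    the final level keeps the incident when all timeouts have passed.
    """
    elapsed = 0
    for responder, timeout in policy:
        elapsed += timeout
        if minutes_unacknowledged < elapsed:
            return responder
    return policy[-1][0]


# Hypothetical roster and policy, for illustration only.
roster = ["engineer_a", "engineer_b", "engineer_c"]
policy = [("primary", 5), ("secondary", 10), ("engineering_manager", 15)]
```

Rotating through the roster spreads the load evenly, while the escalation levels guarantee that an unacknowledged incident always reaches someone.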
They integrated systems like lending, authentication services, and monitoring tool Datadog with PagerDuty, enabling effective incident response. Now, any partner system or systemic issues—such as unusually high latency or error rates—are sent to the right engineer to address in real time. PagerDuty has eliminated the need to constantly monitor systems, and they are no longer reliant on customers to flag problems. This new way of working had a positive impact on the team. Jain shared, “People are more at peace now that there’s a reliable system in place.”
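The source does not describe how these integrations are wired, but the general shape of routing an alert to PagerDuty can be sketched as follows. This example builds a trigger event in the format of the PagerDuty Events API v2 when an error rate crosses a threshold. The service name, threshold, routing key, and the `build_trigger_event` helper are hypothetical, and the sketch only builds the payload rather than sending it.

```python
from typing import Optional

# Events API v2 endpoint; an event is submitted by POSTing the JSON body here.
EVENTS_API_URL = "https://events.pagerduty.com/v2/enqueue"


def build_trigger_event(
    routing_key: str, service: str, error_rate: float, threshold: float
) -> Optional[dict]:
    """Build an Events API v2 trigger event if `error_rate` exceeds `threshold`.

    Returns None when the error rate is within the threshold.
    `service` and `threshold` are illustrative inputs, not values from the source.
    """
    if error_rate <= threshold:
        return None
    return {
        "routing_key": routing_key,  # integration key of the target service
        "event_action": "trigger",
        # Reusing one dedup key groups repeat alerts into a single incident.
        "dedup_key": f"{service}-error-rate",
        "payload": {
            "summary": f"{service}: error rate {error_rate:.1%} exceeds {threshold:.1%}",
            "source": service,
            "severity": "critical",
        },
    }


# Hypothetical usage: a payment gateway with a 12.5% error rate and a 5% threshold.
event = build_trigger_event("EXAMPLE_ROUTING_KEY", "payments-gateway", 0.125, 0.05)
```

In practice a monitoring tool such as Datadog would typically send events like this through its own PagerDuty integration rather than through hand-written code.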
The team also integrated PagerDuty with Slack, allowing responders to easily turn a Slack message into a PagerDuty incident with a slash command. If multiple services are impacted or responders from distributed teams are required to resolve an issue, the Slack integration makes it easy for the on-call engineer to create a conference bridge and pull in the right response team.
With PagerDuty, teams are mitigating customer-impacting issues by taking action immediately when incidents happen. “We’ve seen our first response time come down to a max of 10 minutes. Sometimes as low as one minute,” said Jain. “Also, we don’t reach a blackout state. When something goes wrong, we’re able to catch it sooner. This is critical for our business.”
Delivering on its mission to support small Indonesian businesses means that change is constant at BukuWarung. As the company rapidly grows, new launches and system updates can have unplanned impacts. But with PagerDuty, teams can quickly take corrective action, control unexpected situations, and keep the platform up and running for customers. “Those are the times when you need PagerDuty the most,” said Jain.
“We must ensure 100% uptime of services so that payments are available for our merchants. I didn’t look any further than PagerDuty. It’s reliable, it’s the best, and it works,” said Jain.
PagerDuty’s analytics are helping BukuWarung improve engineering operations by providing key data and learnings. “In addition to resolving issues, PagerDuty helps with root cause analysis. It helps us understand the chronology of events so we can correlate it back with our monitoring systems. We can stitch everything together, improve, and position ourselves better for the future,” explained Jain.
As the organization grows, the team plans to expand its use of PagerDuty by integrating additional systems, such as Jira, to create incidents automatically.