February 10, 2025 | 2 min read | Kevin Lam

Achieving 99.99% Uptime Through a Customer-Driven Reliability Program

Customer Success, Reliability, Uptime, Product Improvement, Customer Voice

The Challenge

Platform reliability was the number one complaint in our NPS feedback. While our published SLA was 99.9%, actual uptime was averaging 99.5%, with monthly maintenance windows and occasional unplanned outages. For customers in healthcare and government where authentication outages meant locked-out workers, even 99.9% was insufficient. Three accounts cited reliability as their reason for churning.
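To put those percentages in concrete terms, here is a rough sketch that converts an uptime percentage into minutes of downtime. It assumes a 30-day month, and the function name and labels are illustrative, not part of any tooling we used.

```python
# Assumption: a 30-day month; real months and SLA measurement windows vary.
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def downtime_minutes(uptime_percent: float,
                     period_minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Return the minutes of downtime implied by an uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    # Downtime share of the period, rounded to hide floating-point noise.
    return round((100 - uptime_percent) / 100 * period_minutes, 2)


# Uptime figures taken from this post; the labels are hypothetical.
for label, uptime in [("observed before", 99.5),
                      ("published SLA", 99.9),
                      ("achieved after", 99.99)]:
    print(f"{label}: {uptime}% uptime -> {downtime_minutes(uptime)} min/month")
```

Under that assumption, 99.5% works out to 216 minutes of downtime a month, 99.9% to 43.2 minutes, and 99.99% to 4.32 minutes. The gap between a few hours and a few minutes is what customers in healthcare and government were actually feeling.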

The Approach

I collected detailed outage impact data from 20 customers, documenting the business cost of each minute of downtime in their specific environment. A hospital losing authentication for 30 minutes could not access patient records; a defense contractor losing authentication could not access classified systems. I presented this data to our engineering leadership with a clear message: reliability was a revenue problem, not just a technical one.

I then facilitated a customer reliability advisory group: five customers who worked directly with our engineering team to prioritize reliability improvements. The group identified three high-impact changes: eliminating scheduled maintenance windows through rolling updates, deploying multi-region failover, and implementing real-time customer-facing status pages with proactive alert notifications.

The Result

Over six months, platform uptime improved from 99.5% to 99.99%. Reliability-related churn dropped to zero. The five advisory group customers all expanded their deployments, and "reliability" shifted from the number one NPS complaint to the number two NPS strength. The program demonstrated that customer success can influence product priorities by translating customer feedback into business impact data.

Key Takeaway

Customer success teams sit at the intersection of customer needs and product capabilities. Translating customer complaints into business impact data that engineering leaders can act on is one of the highest-value activities in customer success. The reliability advisory group model works because it lets engineers see directly how their work affects customers.
