February 10, 2015

Breaking Point: Building Scalable, Resilient APIs

The default position of a distributed system is failure. Networks fail. Machines fail. Systems fail.

The problem is APIs are, at their core, a complex distributed system. At some point in their lifetime, APIs will likely have to scale, maybe due to high-volume, large data-sets, a high-number of clients, or maybe just scale to rapid change. When this happens, we want our systems to bend not break.

This talk is a tour of how systems fail, combining analysis of how complex systems break at scale with anecdotes capturing the lighter side of catastrophic failure. We will then ground this with a set of practical tools and techniques to deal with building and testing complex systems for reliability.

This talk was presented at API Days Sydney, February 2015.

[ deck ]