As we construct larger or more complex systems, failure and change are ever-present. We need to accept and even embrace these tensions to build software that works and keeps working.

This is a talk on building and operating reliable systems. We will look at how systems fail, particularly in the face of complexity or scale, and build up a set of principles and practices that will help us implement, understand and verify reliable systems.
This talk was presented at YOW! 2018.