Building and Operating a Pretty Big Storage System Called S3 (Blog)
This article is a guest post by Andy Warfield, VP and Distinguished Engineer at Amazon S3, based on his keynote at USENIX FAST '23. Warfield shares insights from six years working on S3, describing how the experience forced him to think about computer systems in broader terms than ever before—from hard disk mechanics and firmware at one end to customer-facing APIs and performance at the other.
S3 is effectively a living, breathing organism rather than just software. The system encompasses not just hundreds of microservices but also all the people who design, build, deploy, and operate that code. Warfield emphasizes that "the system" includes software, hardware, and people all evolving together continuously. This perspective shift was crucial for understanding how to operate a service where customers aren't buying software but buying a continuously available, predictably fantastic experience.
One of the most fascinating technical challenges discussed is "heat management"—balancing I/O demand across millions of hard drives. HDD capacity has grown roughly 7.2 million times since the first commercial drive shipped in 1956, while seek times have improved only about 150x; this widening gap between capacity and access performance shapes much of S3's architecture. The article describes how aggregating millions of workloads creates a smoothing effect: individual request bursts become decorrelated, allowing the system to handle enormous peak loads that would be impossible for individual customers to provision for themselves.
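The smoothing effect described above is essentially the law of large numbers applied to I/O demand. A minimal simulation (all names and parameters here are illustrative, not from the article) shows how the aggregate of many independent bursty tenants is far less variable than any single tenant:

```python
import random

def bursty_workload(steps, burst_prob=0.05, burst_size=100, base=1):
    """One tenant: a small steady base rate with occasional large bursts."""
    return [base + (burst_size if random.random() < burst_prob else 0)
            for _ in range(steps)]

def coeff_of_variation(series):
    """Standard deviation divided by mean: a scale-free burstiness measure."""
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    return (var ** 0.5) / mean

random.seed(42)
steps = 500

# A single tenant's demand is highly variable...
single = bursty_workload(steps)

# ...but summing 1,000 independent tenants yields a comparatively smooth curve.
aggregate = [sum(col) for col in
             zip(*(bursty_workload(steps) for _ in range(1000)))]

print(f"single tenant CV: {coeff_of_variation(single):.2f}")
print(f"aggregate CV:     {coeff_of_variation(aggregate):.3f}")
```

Because no individual workload dominates the sum, the provider can provision for the smooth aggregate rather than for every tenant's worst-case burst.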
Beyond technical scale, Warfield explores human and organizational factors. He describes "durability reviews" as a critical process where engineers think creatively about what could go wrong with changes affecting data durability. The team also adopted lightweight formal verification for their ShardStore storage layer, written in Rust: executable specifications roughly 1% the size of the real system enable property-based testing at speeds that would be impractical against the real implementation. These approaches demonstrate how treating the organization as part of the system enables innovation in how teams build and operate.
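The core idea behind that verification approach—check a complex implementation against a tiny executable specification by replaying random operation sequences through both—can be sketched in a few lines. This is a hedged Python stand-in for illustration only (ShardStore's actual checks are written in Rust against its real on-disk structures); all class and function names here are invented:

```python
import random

class ModelStore:
    """Executable specification: a plain dict states the intended behavior."""
    def __init__(self):
        self.data = {}
    def put(self, key, value):
        self.data[key] = value
    def get(self, key):
        return self.data.get(key)

class LogStructuredStore:
    """Toy 'real' implementation: an append-only log plus an index,
    standing in for a far more complex storage layer."""
    def __init__(self):
        self.log = []      # append-only (key, value) records
        self.index = {}    # key -> offset of the latest record
    def put(self, key, value):
        self.index[key] = len(self.log)
        self.log.append((key, value))
    def get(self, key):
        off = self.index.get(key)
        return self.log[off][1] if off is not None else None

def check_equivalence(num_ops=10_000, seed=0):
    """Property-based check: random op sequences must agree on every read."""
    rng = random.Random(seed)
    model, real = ModelStore(), LogStructuredStore()
    for _ in range(num_ops):
        key = rng.randrange(32)
        if rng.random() < 0.5:
            value = rng.randrange(1_000_000)
            model.put(key, value)
            real.put(key, value)
        else:
            assert model.get(key) == real.get(key), f"divergence at key {key}"
    return True

print(check_equivalence())
```

Because the model is tiny and pure, millions of such operation sequences can be explored far faster than any test that exercises real disks.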
The article concludes with reflections on "ownership" as a core Amazon principle. Warfield discovered that success at Amazon scale meant focusing on articulating problems rather than dispensing solutions, and helping engineers truly own their work. This lesson, learned through both successes and failures in academia and startups, proved essential for scaling himself as an engineer by making other engineers and teams successful.
Key Concepts
- Heat Management: The challenge of balancing I/O demand across millions of hard drives to prevent hotspots and tail latency in large-scale storage systems.
- Workload Aggregation: The phenomenon where aggregating millions of bursty workloads creates smooth, predictable aggregate demand that no individual workload can significantly influence.
- Erasure Coding: A data protection technique (like Reed-Solomon) that splits objects into identity and parity shards, reducing capacity overhead while surviving failures.
- Durability Reviews: A human process where engineers model threats to data durability and evaluate countermeasures, encouraging creative critical thinking about risks.
- Lightweight Formal Verification: Using simplified executable models (1% the size of real systems) and property-based testing to verify storage system correctness.
- ShardStore: The rewritten bottom layer of S3's storage stack, implemented in Rust with type safety extended to on-disk structures.
- Ownership Culture: The Amazon principle where teams own their services completely—from API contracts to 3 AM incident response—enabling both accountability and empowerment.
- Scale-Induced Smoothing: At sufficient scale, individual workload bursts become decorrelated and aggregate demand becomes highly predictable.
- Living System: The concept that S3 is not just software but a continuously evolving ecosystem of software, hardware, people, and customer code.
- Hard Drive Physics: Understanding that HDD capacity grows exponentially while seek times remain relatively flat, fundamentally shaping storage system design.
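To make the erasure-coding concept above concrete, here is a toy single-parity code in Python: the object is split into k identity shards plus one XOR parity shard, and any one lost shard can be rebuilt from the survivors. This is a deliberate simplification—S3-class systems use Reed-Solomon-style codes that survive multiple simultaneous losses—and the function names are illustrative:

```python
def encode(data: bytes, k: int = 4):
    """Split data into k identity shards plus one XOR parity shard."""
    if len(data) % k:
        data += b"\x00" * (k - len(data) % k)   # pad to a multiple of k
    size = len(data) // k
    shards = [data[i * size:(i + 1) * size] for i in range(k)]
    parity = shards[0]
    for s in shards[1:]:
        parity = bytes(a ^ b for a, b in zip(parity, s))
    return shards + [parity]

def reconstruct(shards, lost):
    """Recover the shard at index `lost` by XORing all surviving shards."""
    survivors = [s for i, s in enumerate(shards) if i != lost]
    out = survivors[0]
    for s in survivors[1:]:
        out = bytes(a ^ b for a, b in zip(out, s))
    return out

shards = encode(b"hello, durable world!!!!", k=4)
assert reconstruct(shards, 2) == shards[2]   # rebuild a lost identity shard
```

Note the capacity economics: k+1 shards for k shards' worth of data is 25% overhead at k=4, versus 200% overhead for three-way replication—which is why erasure coding dominates at exabyte scale.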