
f4: Facebook's Warm BLOB Storage System

2026-02-07 whitepaper

Facebook's f4 is a specialized storage system designed to handle "warm" BLOBs (Binary Large OBjects): data such as photos and videos that are accessed frequently when new but much less often as they age. As the volume of user-generated content grows, keeping everything in a high-performance "hot" storage system like Haystack, which triple-replicates data, becomes prohibitively expensive. f4 addresses this by moving aged data to a more storage-efficient system that sacrifices some performance for significant cost savings, without compromising data durability or availability.

The core innovation of f4 is its use of Reed-Solomon erasure coding to reduce the effective replication factor from 3.6x in Haystack to 2.8x or 2.1x, depending on the configuration: 2.8x when an erasure-coded cell is fully copied to a second datacenter, and 2.1x when geographic redundancy is instead provided by XOR coding across three datacenters. f4 can therefore store the same amount of data on significantly fewer disks. The system is architected as a set of independent "cells," each containing enough racks and storage nodes to survive disk, host, and rack failures. Data is logically organized into volumes; once a volume in Haystack fills up, it is locked (made read-only) and then moved to f4.
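
The replication factors quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes the parameters reported in the f4 paper, which this summary does not spell out: Haystack keeps three replicas on RAID-6 hosts with a 1.2x overhead, and f4 uses Reed-Solomon with 10 data blocks and 4 parity blocks per stripe. The function and variable names are illustrative only.

```python
from fractions import Fraction


def rs_overhead(n_data: int, k_parity: int) -> Fraction:
    """Storage overhead of an erasure code with n_data data and k_parity parity blocks."""
    return Fraction(n_data + k_parity, n_data)


# Assumed Haystack layout: 3 replicas, each on RAID-6 with 1.2x overhead.
haystack = 3 * Fraction(12, 10)

# Assumed f4 layout: Reed-Solomon(10, 4) within a single cell.
f4_cell = rs_overhead(10, 4)

# Full copy of the cell in a second datacenter.
f4_two_datacenters = 2 * f4_cell

# XOR across three datacenters: two data copies plus one XOR block, so 3/2 overhead.
f4_xor = f4_cell * Fraction(3, 2)

for name, factor in [
    ("Haystack", haystack),
    ("f4 single cell", f4_cell),
    ("f4 two datacenters", f4_two_datacenters),
    ("f4 XOR across three datacenters", f4_xor),
]:
    print(f"{name}: {float(factor):.1f}x")
```

Exact fractions are used so that the results compare cleanly against 3.6, 1.4, 2.8, and 2.1 without floating-point noise.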

f4 separates the control plane from the data plane. The data plane handles the actual reading and writing of data blocks and parity blocks across storage nodes, while the control plane manages volume mapping, data placement, and recovery operations. The system is designed to be resilient to varied failure modes, including transient network issues and permanent hardware failures. By offloading warm data to f4, Facebook frees up capacity in its high-performance Haystack clusters for hot, incoming data, creating a tiered storage architecture that scales efficiently with their massive data growth.
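
The tiering flow described above can be sketched as a toy model. Everything here is hypothetical and for illustration only: the `Volume` class, the `capacity` field, and the `migrate_if_locked` and `read` functions are not f4 or Haystack APIs, and a real system would move erasure-coded blocks between clusters rather than flip a field.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    volume_id: int
    capacity: int                      # hypothetical size limit in bytes
    blobs: dict = field(default_factory=dict)
    used: int = 0
    locked: bool = False
    tier: str = "haystack"             # new volumes start in the hot tier

    def append(self, blob_id: str, payload: bytes) -> bool:
        """Store a BLOB if the volume is unlocked and has room; lock it otherwise."""
        if self.locked or self.used + len(payload) > self.capacity:
            self.locked = True
            return False
        self.blobs[blob_id] = payload
        self.used += len(payload)
        return True


def migrate_if_locked(volume: Volume) -> None:
    """Control-plane step (sketch): move a locked volume from the hot to the warm tier."""
    if volume.locked and volume.tier == "haystack":
        volume.tier = "f4"


def read(volume: Volume, blob_id: str) -> tuple:
    """Data-plane step (sketch): return the serving tier and the BLOB payload."""
    return volume.tier, volume.blobs[blob_id]
```

In this model a write that no longer fits locks the volume, a later control-plane pass migrates it, and reads keep working throughout because the volume's contents never change after locking.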

Key Concepts

  • Warm Data: Data that is accessed less frequently than "hot" newly created data but must still be available with low latency. In Facebook's case, older photos and videos transition from hot to warm after a few months.
  • Erasure Coding: A method of data protection in which data is split into fragments and encoded with additional parity fragments (in f4, using Reed-Solomon codes), so that the original data can be reconstructed even if some fragments are lost. f4 uses this to save space compared to simple replication.
  • Effective Replication Factor: The ratio of total stored data (including redundancy) to actual user data. f4 reduces this factor from Haystack's 3.6x (triple replication plus per-host RAID overhead) to 2.8x or 2.1x, leading to large storage savings.
  • Cell Architecture: f4 is deployed in units called "cells," which are self-contained collections of racks and servers. This modular design isolates failures and simplifies scaling and management.
  • Haystack vs. f4: Haystack is Facebook's hot storage system optimized for IOPS and write performance using replication. f4 is the warm storage layer optimized for storage efficiency using erasure coding.
  • Volume Locking: Volumes in f4 are read-only (immutable). Data is only moved to f4 after the volume in Haystack fills up and becomes immutable, simplifying the consistency model in f4.
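
To make the erasure-coding idea concrete, here is a minimal sketch that swaps in the simplest possible erasure code: a single XOR parity block, which can rebuild any one lost block in a stripe. This is not the Reed-Solomon code f4 uses, which adds several parity blocks and tolerates multiple losses. The function names and the zero-padding scheme are assumptions made for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: bytes, n_data: int):
    """Split data into n_data equal blocks (zero-padded) and append one parity block."""
    block_len = -(-len(data) // n_data)  # ceiling division
    padded = data.ljust(block_len * n_data, b"\0")
    blocks = [padded[i * block_len:(i + 1) * block_len] for i in range(n_data)]
    return blocks + [xor_blocks(blocks)]


def recover(stripe, lost_index: int) -> bytes:
    """Rebuild the block at lost_index by XORing every surviving block."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


# Usage: encode a payload into 4 data blocks + 1 parity block, then lose block 2.
stripe = encode(b"warm blob payload", 4)
rebuilt = recover(stripe, 2)
print(rebuilt == stripe[2])
```

With 4 data blocks and 1 parity block the overhead is 5/4 = 1.25x, versus 2x for keeping a second full copy; Reed-Solomon generalizes the same trade-off to more parity blocks.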