
Zuul 2: The Netflix Journey to Asynchronous, Non-Blocking Systems

2026-01-26 · blog


Netflix re-architected Zuul, its cloud gateway, moving it from a blocking, multithreaded system to an asynchronous, non-blocking framework built on Netty. The primary driver for this shift was to support persistent connections for millions of devices and web browsers, a capability essential for features like push notifications and bi-directional communication. The original Zuul, built on the Servlet framework, handled requests with a one-thread-per-connection model; because every slow request holds on to its thread, this approach struggled under high concurrency and when backend latency spiked. Zuul 2's async model instead uses one thread per CPU core and handles all requests through events and callbacks, so an idle open connection no longer ties up a dedicated thread, which significantly reduces the cost of keeping connections open.
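The difference in per-connection cost is easier to see in code. The sketch below uses Python's asyncio as a stand-in for a Netty event loop; this is an assumption for illustration, not Zuul code, and `persistent_connection` and `serve` are hypothetical names. Ten thousand idle "connections" wait on a single thread, and one push wakes them all. Zuul 2 runs one such loop per core rather than a single loop.

```python
import asyncio
import threading

async def persistent_connection(conn_id: int, push: asyncio.Event, inbox: list) -> None:
    # A hypothetical long-lived client connection (e.g. a push channel).
    # While idle it is a suspended coroutine, not a parked OS thread.
    await push.wait()
    inbox.append(conn_id)  # "deliver" the pushed message

async def serve(num_connections: int) -> tuple:
    push = asyncio.Event()
    inbox: list = []
    tasks = [
        asyncio.create_task(persistent_connection(i, push, inbox))
        for i in range(num_connections)
    ]
    await asyncio.sleep(0)  # let every connection start and go idle
    threads_while_idle = threading.active_count()
    push.set()  # a single push wakes every idle connection
    await asyncio.gather(*tasks)
    return len(inbox), threads_while_idle

if __name__ == "__main__":
    delivered, threads = asyncio.run(serve(10_000))
    print(f"delivered={delivered}, threads={threads}")
```

Every connection should be delivered while the thread count stays at whatever the process started with, since no thread is created per connection. A thread-per-connection server would need ten thousand parked threads, each with its own stack, to hold the same connections open.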

The transition to an asynchronous architecture was not without its challenges. Debugging and testing became much harder: async systems often produce stack traces that say little about where a request came from, and they make it difficult to follow a request through its lifecycle. Furthermore, the existing Netflix ecosystem and core libraries were largely built on blocking assumptions, so they needed extensive refactoring to remove thread-local variables and to convert blocking network calls to non-blocking ones. To manage this complexity, Netflix introduced async Zuul Filters, which let the same business logic run on both the blocking and the non-blocking architecture during the migration.
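A minimal sketch of the shared-filter idea, written as a Python analogue under stated assumptions: business logic is written once as a coroutine, the non-blocking path awaits it on the event loop, and the blocking path drives the same coroutine to completion on its worker thread with `asyncio.run`. `Request`, `AsyncFilter`, `TagDeviceFilter`, and the `/tv/` rule are all hypothetical; Netflix's real filters are Java classes with their own API.

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class Request:
    # Hypothetical, minimal request model for the sketch.
    path: str
    headers: dict = field(default_factory=dict)

class AsyncFilter:
    """Hypothetical filter contract: logic is written once, as a coroutine."""

    def should_filter(self, req: Request) -> bool:
        return True

    async def apply(self, req: Request) -> Request:
        raise NotImplementedError

class TagDeviceFilter(AsyncFilter):
    """Example business rule: mark requests under a hypothetical /tv/ prefix."""

    def should_filter(self, req: Request) -> bool:
        return req.path.startswith("/tv/")

    async def apply(self, req: Request) -> Request:
        await asyncio.sleep(0)  # placeholder for a non-blocking lookup
        req.headers["X-Device-Class"] = "tv"
        return req

async def run_chain(filters: list, req: Request) -> Request:
    # Apply each filter in order, skipping those that opt out.
    for f in filters:
        if f.should_filter(req):
            req = await f.apply(req)
    return req

async def handle_nonblocking(filters: list, req: Request) -> Request:
    # Non-blocking server: await the chain on the event-loop thread.
    return await run_chain(filters, req)

def handle_blocking(filters: list, req: Request) -> Request:
    # Blocking server: the worker thread runs the same chain to completion
    # on a private event loop, so no business logic is duplicated.
    return asyncio.run(run_chain(filters, req))

if __name__ == "__main__":
    filters = [TagDeviceFilter()]
    print(handle_blocking(filters, Request("/tv/home")).headers)
    print(asyncio.run(handle_nonblocking(filters, Request("/web/home"))).headers)
```

The adapter runs in one direction on purpose: wrapping async logic for a blocking caller is cheap, whereas calling blocking logic from an event loop would stall every connection that loop serves.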

Ultimately, the move to Zuul 2 achieved its goal of connection scaling, enabling new product innovations and reducing cloud costs by replacing "chatty" device protocols. However, the team observed that for CPU-bound clusters like their API service, which does heavy encryption and compression, the efficiency gains of the async model over the blocking model were minimal. The largest efficiency improvements came in IO-bound, write-heavy clusters such as logging. The journey shows that while async systems offer powerful scaling capabilities, they introduce operational complexity that must be weighed carefully against the characteristics of each workload.
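A rough way to see why the workload matters: on one event loop, IO waits overlap, but pure computation runs back to back. The sketch below is Python asyncio with illustrative stand-ins (an `asyncio.sleep` for a backend call, a squaring loop for encryption or compression); the function names are hypothetical. It compares how long twenty concurrent requests take relative to one.

```python
import asyncio
import time

async def io_bound_request() -> None:
    # Stand-in for waiting on a backend: the coroutine yields to the loop.
    await asyncio.sleep(0.05)

async def cpu_bound_request() -> None:
    # Stand-in for compression or encryption: pure computation, never yields.
    total = 0
    for i in range(300_000):
        total += i * i

async def run_concurrently(request_fn, n: int) -> float:
    """Run n copies of request_fn on one event loop; return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(request_fn() for _ in range(n)))
    return time.perf_counter() - start

def slowdown(request_fn, n: int = 20) -> float:
    """How many times longer n concurrent requests take than a single one."""
    one = asyncio.run(run_concurrently(request_fn, 1))
    many = asyncio.run(run_concurrently(request_fn, n))
    return many / one

if __name__ == "__main__":
    print(f"IO-bound:  {slowdown(io_bound_request):.1f}x")
    print(f"CPU-bound: {slowdown(cpu_bound_request):.1f}x")
```

The IO-bound ratio should stay close to 1, while the CPU-bound ratio should grow roughly in proportion to the request count. A blocking server spreads CPU work across threads and an async one across per-core loops, so for CPU-bound traffic both are limited by the same cores, which is consistent with the minimal gains reported for the API cluster.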

Key Concepts

  • Blocking vs. Non-Blocking: Blocking systems use one thread per connection, which can lead to thread exhaustion under load. Non-blocking systems use event loops and callbacks, allowing a few threads to handle many connections efficiently.
  • Connection Scaling: The primary benefit of Zuul 2's async architecture is the ability to maintain millions of persistent connections at a low cost, enabling features like push notifications.
  • Async Complexity: Asynchronous systems are inherently harder to debug and operate due to the loss of clear stack traces and the complexity of managing state without thread-local variables.
  • CPU vs. IO Bound: The efficiency gains of async architectures are most pronounced in IO-bound workloads. For CPU-bound workloads, the performance difference between blocking and non-blocking models is often negligible.
  • Zuul Filters: Netflix refactored their filter logic to be asynchronous, allowing the same business logic to run on both the legacy blocking system and the new non-blocking Zuul 2 during the transition.
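The loss of clear stack traces can be reproduced in a few lines. In this sketch, Python asyncio callbacks stand in for Netty's, and every function name is hypothetical. The same parsing error is raised once through a direct blocking call and once from a callback that the event loop runs later; only the blocking traceback mentions the function that issued the request.

```python
import asyncio
import traceback

def parse_response(payload: str) -> int:
    return int(payload)  # raises ValueError for a malformed payload

def on_response(payload: str, results: list) -> None:
    # Runs whenever the "response" arrives, with no link to the caller.
    results.append(parse_response(payload))

def fetch_blocking(results: list) -> None:
    # Blocking style: the caller waits and handles the response itself.
    on_response("not-a-number", results)

def fetch_async(loop: asyncio.AbstractEventLoop, results: list) -> None:
    # Callback style: register a handler and return immediately.
    loop.call_soon(on_response, "not-a-number", results)

def capture_direct_traceback() -> str:
    try:
        fetch_blocking([])
    except ValueError:
        return traceback.format_exc()
    return ""

def capture_callback_traceback() -> str:
    captured: list = []

    def handler(loop: asyncio.AbstractEventLoop, context: dict) -> None:
        # The loop reports callback exceptions here instead of raising them.
        exc = context["exception"]
        captured.append("".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)))

    loop = asyncio.new_event_loop()
    try:
        loop.set_exception_handler(handler)
        fetch_async(loop, [])
        # Run the loop until the queued callback has fired.
        loop.run_until_complete(asyncio.sleep(0))
    finally:
        loop.close()
    return captured[0]

if __name__ == "__main__":
    print("blocking:\n" + capture_direct_traceback())
    print("callback:\n" + capture_callback_traceback())
```

Coroutine code that uses `await` keeps more of the chain visible, but callback-driven code like this loses the link between where a request was issued and where it failed, which is what makes request lifecycles hard to trace.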