Boosting Operation Throughput In Redis

Ishay Wayner
Nov 28, 2023

A customer approached us with a critical performance issue in their event-driven application, which was integrated with Redis. The application was struggling with a low event processing rate, handling only a few thousand events per second. This rate was significantly below the client's requirements, and the root cause of the bottleneck was unclear.

To address this issue, we performed a detailed analysis. Over several days, we collaborated closely with the customer to review their application's codebase and the specific Redis operations executed for each event. This deep dive examined how the application interacted with the Redis server, focusing on operations with unusually high execution times, which were the likely cause of the application's low ingestion rate.

A crucial aspect of our analysis centered on understanding the inherent characteristics of Redis, particularly its single-threaded architecture. Redis executes commands on a single thread, meaning it processes one command at a time across all client connections. While this design simplifies certain aspects of operation, it also means that long-running or complex commands can quickly become a bottleneck. In this specific case, certain operations executed by the application were unusually time-consuming and monopolized the Redis server's processing capacity. Every other concurrent request had to wait behind them, which severely limited the overall throughput of events.
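To make the effect of single-threaded execution concrete, here is a minimal Python model of commands being run strictly one after another. It is only a sketch: the function name `completion_times` and the durations in the example queue are illustrative assumptions, not measurements from the customer's system.

```python
def completion_times(commands):
    """Return (name, finish_time_us) for commands executed one at a time.

    `commands` is a list of (name, duration_us) pairs that are all waiting
    in the queue at time 0, in the order the server will run them.
    """
    clock_us = 0
    finished = []
    for name, duration_us in commands:
        # A single-threaded server cannot start the next command
        # until the current one has completed.
        clock_us += duration_us
        finished.append((name, clock_us))
    return finished


# Hypothetical durations in microseconds, chosen only for illustration.
queue = [
    ("GET a", 100),
    ("SCAN 0 MATCH foo* COUNT 10000", 5000),
    ("GET b", 100),
    ("SET c 1", 100),
]

for name, done_at in completion_times(queue):
    print(f"{name:<32} finished at {done_at} us")
```

In this model the two fast commands queued behind the slow one finish after more than 5,000 microseconds, even though each needs only 100 microseconds of work.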

As part of our investigation, we used the Redis slow log and the LATENCY DOCTOR command to identify several operations that were major contributors to the bottleneck. These operations, which had a time complexity of O(N) or greater, were consuming a disproportionate amount of processing time and severely limiting the overall throughput of the system. One example is SCAN, which iterates over the keyspace in batches and can return only the keys matching a given pattern. The customer had specified a very large batch size, which made each SCAN call relatively slow and created a bottleneck. To understand this better, let’s use an example. The following SCAN operation scans for all keys that start with foo with a batch size of 10000:

SCAN 0 MATCH foo* COUNT 10000

Let’s break this operation down:
SCAN - the operation
0 - the cursor; 0 means we are starting a new scan rather than continuing from a previous execution of SCAN
MATCH foo* - return only keys starting with foo
COUNT 10000 - examine roughly 10,000 keys in each iteration of the cursor (a hint for how much work to do per call, not a guaranteed number of returned keys)
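The cursor mechanics described above can be sketched as a small Python loop. This is a hedged illustration, not the customer's code: it assumes `client` behaves like a redis-py `redis.Redis` instance, whose `scan()` method returns a `(next_cursor, keys)` pair, and the helper name `scan_keys` and the default `batch_size` of 500 are hypothetical.

```python
def scan_keys(client, pattern, batch_size=500):
    """Collect keys matching `pattern` using many small SCAN calls.

    `client` is assumed to expose redis-py's
    scan(cursor=..., match=..., count=...) -> (next_cursor, keys).
    """
    cursor = 0  # 0 starts a new scan
    keys = []
    while True:
        # Each call examines roughly `batch_size` keys and returns the
        # next cursor plus whichever of those keys matched the pattern.
        cursor, batch = client.scan(cursor=cursor, match=pattern, count=batch_size)
        keys.extend(batch)
        if cursor == 0:
            # The server returns cursor 0 once the full iteration is complete.
            break
    # Note: SCAN may return the same key more than once; callers that
    # need uniqueness should deduplicate the result.
    return keys
```

With redis-py, `client` would be created with something like `redis.Redis(host="localhost", port=6379)`, where the host and port are placeholders. A smaller `batch_size` means more round trips, but each individual call occupies the server for a shorter time.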

If the keyspace holds only a handful of keys, this SCAN completes in a single call and its cost is negligible. However, if there are hundreds of thousands of keys (or possibly even millions), the cost rises dramatically: MATCH is applied only after the keys are fetched, so every call has to examine roughly 10,000 keys no matter how many of them start with foo. A single call might take only a few milliseconds, but the compound effect of tens or hundreds of these calls can easily create a bottleneck in Redis, blocking other operations from executing and capping the overall operation throughput the application can achieve.

To tackle the inefficiencies found in this case, we employed a multi-faceted optimization strategy: changing the Redis configuration so that expired keys are purged asynchronously, reducing the batch size of the SCAN operation, and leveraging read replicas for particularly slow operations. Together, these changes distributed the load and reduced the strain on the primary Redis server.
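As a rough illustration of what such changes can look like from application code, here is a short Python sketch. It rests on several assumptions: that `primary` and `replica` behave like redis-py `redis.Redis` clients, that the asynchronous purging of expired keys corresponds to the `lazyfree-lazy-expire` setting (the article does not name the exact directive used), and that the helper names and the batch size of 500 are hypothetical.

```python
def enable_lazy_expire(primary):
    """Ask Redis to free the memory of expired keys in a background thread.

    Assumes `primary` exposes redis-py's config_set(name, value). The same
    setting can instead be placed in redis.conf.
    """
    primary.config_set("lazyfree-lazy-expire", "yes")


def scan_on_replica(replica, pattern, batch_size=500):
    """Run the key scan against a read replica with a modest batch size.

    Assumes `replica` exposes redis-py's scan_iter(match=..., count=...),
    which drives the SCAN cursor loop internally and yields keys.
    """
    return list(replica.scan_iter(match=pattern, count=batch_size))
```

Here `primary` and `replica` would be two separate client objects pointed at the primary and replica endpoints respectively. Keeping the slow scan on the replica leaves the primary free to serve the latency-sensitive event traffic.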

These changes had a profound impact on the system's performance. The event processing rate of the application surged, exceeding 10,000 events per second (corresponding to over 70,000 operations in Redis per second). This marked a significant improvement, vastly surpassing the initial processing capabilities and meeting the client's operational requirements.

If you find yourselves struggling with Redis, contact us and one of our database specialists will help you.
