Flash Sale System Design: Achieving "Real-Time" Inventory Without Crashing the Database
Product Decode
The "Bottleneck" Problem of Flash Sales
Imagine an 11/11 Mega Sale campaign. You have 1,000 iPhones at a shocking price and 2 million users simultaneously hitting the "Buy Now" button at second zero. If your system is designed the traditional way—receiving a request, checking the quantity in the Database (RDBMS), decrementing by 1, and saving—your system will crash in the very first second.
Why Traditional Databases Fail
Relational databases (like MySQL or PostgreSQL) are designed for data integrity (ACID), not for massive simultaneous access (high concurrency) to the same record.
When millions of requests attempt to update the same row (the iPhone's inventory), the Database must use a Row-level Lock. Subsequent requests must queue up and wait for the previous ones to complete. This leads to Connection Pool exhaustion, skyrocketing latency, and ultimately a domino effect that crashes the entire system (Cascading Failure). Furthermore, if handled poorly, you will encounter Overselling (selling more items than are actually in stock).
Design Principle: In Flash Sale events, the Relational Database (RDBMS) only serves as the "Source of Truth" in the long run; it must absolutely not be used as the "main battlefield" for processing direct transaction flows.
Cache-First Architecture: Turning Redis into the "Main Battlefield"
To solve this problem, large systems like Shopee or Amazon do not read/write directly to the DB. They push the entire inventory processing flow to an In-Memory Cache, most typically Redis.
Note: The flow below has been simplified by omitting some edge cases for easier tracking. Advanced issues like error handling, idempotency, and failure recovery are discussed at the end.
1. Pre-warming
Before the event starts (e.g., at 23:55), the inventory count (1,000 units) is pre-loaded from the Database to Redis. From this point on, every "How many items left?" query from the frontend will only read directly from Redis. Since Redis stores data in RAM, response times are measured in milliseconds (ms), easily handling millions of reads.
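The pre-warming step can be sketched in a few lines. Plain dicts stand in for the database record and for Redis, and the key name `sale:iphone` is illustrative, not a real API:

```python
# Stand-ins: "db_row" plays the source-of-truth database record and "cache"
# plays Redis. All names here are illustrative.
db_row = {"sku": "iphone", "stock": 1000}
cache = {}

def prewarm(sku, stock):
    # Copy the counter into the cache once, before the sale opens.
    # From now on, every "how many left?" read hits only this key.
    cache[f"sale:{sku}"] = stock

prewarm(db_row["sku"], db_row["stock"])
print(cache["sale:iphone"])  # 1000
```

In a real deployment this would be a single `SET sale:iphone 1000` against Redis, run a few minutes before the event opens.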
2. Inventory Deduction via Atomic Operations
When a user clicks "Buy," how do we prevent overselling? We use Redis's Atomic Operation feature.
Redis executes commands on a single thread, so DECR (decrement) is atomic. This means at any given moment, only one request is allowed to subtract one unit:
Request 1 arrives: Redis returns 999. (Item secured)
...
Request 1000 arrives: Redis returns 0. (Last item secured)
Request 1001 arrives: Redis returns -1. (Rejected, notify out of stock)
By doing this, inventory remains absolutely accurate on the Cache without requiring any complex Locks from the Database.
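The deduction flow can be simulated without a live Redis server. The sketch below emulates Redis's atomic DECR with a lock (Redis gets the same guarantee from its single-threaded command loop) and shows that 50 concurrent buyers can never oversell 10 items; all names are illustrative:

```python
import threading

# In-memory stand-in for a Redis counter. Redis DECR is atomic because the
# server processes commands one at a time; a lock emulates that here.
class AtomicCounter:
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def decr(self):  # mirrors Redis DECR: subtract 1, return the new value
        with self._lock:
            self._value -= 1
            return self._value

stock = AtomicCounter(10)          # 10 items for sale
sold, rejected = [], []

def buy(user_id):
    remaining = stock.decr()
    if remaining >= 0:
        sold.append(user_id)       # secured an item
    else:
        rejected.append(user_id)   # counter went negative: out of stock

threads = [threading.Thread(target=buy, args=(i,)) for i in range(50)]
for t in threads: t.start()
for t in threads: t.join()

print(len(sold), len(rejected))    # 10 40: exactly 10 sold, never more
```

However many buyers race, the counter hands out exactly as many non-negative results as there were items, which is the whole point of making the decrement atomic.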
3. Synchronization via Message Queue (Eventual Consistency)
Once Redis has successfully deducted stock (user successfully secured the item), how do we save this result safely to the Database for payment and shipping processing? This is where a Message Queue (like Kafka or RabbitMQ) comes in.
Instead of forcing the Database to write immediately, the system pushes a "message" (Message: User A got an iPhone) into the Queue. The Queue acts as a Shock Absorber:
Receives hundreds of thousands of messages from Redis per second.
Workers at the output (Consumers) pull messages from the Queue at a steady pace that the Database can handle (e.g., 1,000 writes/second).
The Database gradually updates its figures.
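The shock-absorber pattern can be sketched with Python's standard `queue` module standing in for Kafka/RabbitMQ: a burst of order messages is enqueued almost instantly, then a worker persists them at its own pace. All names here are illustrative:

```python
import queue

# Stand-ins: "order_queue" absorbs the burst of order messages; "database"
# is the list of rows the worker has persisted.
order_queue = queue.Queue()
database = []

# Burst: Redis-side deductions succeed for 5 users almost at once.
for user_id in range(5):
    order_queue.put({"user": user_id, "item": "iphone"})

def run_worker():
    # Consumer: drains messages one by one at a DB-friendly pace.
    while not order_queue.empty():
        msg = order_queue.get()
        database.append(msg)       # the slow, durable write
        order_queue.task_done()

run_worker()
print(len(database))  # 5 orders eventually persisted
```

The producer side never waits for the database; the queue depth simply grows during the spike and drains afterward.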
Trade-off Mindset: We trade Strict Consistency (real-time absolute consistency in the DB) for High Availability and Eventual Consistency. During the few minutes of the Flash Sale, the DB might show 1,000 units remaining while Redis shows 0. The Frontend always displays the number from Redis.
User Experience: The Illusion of "Real-time"
In technical reality, displaying a real-time "Only X items left" countdown on the screens of 2 million people simultaneously is incredibly resource-intensive (e.g., opening 2 million WebSocket connections).
Product/Tech Solutions:
Rate Limiting & Traffic Shaping: Block 90% of bot/junk traffic right at the API Gateway/CDN. Only allow 10% of genuine requests into the core system.
Client-side Throttling: The "Buy Now" button is grayed out (disabled) for 2-3 seconds after each click to prevent user spamming.
Controlled Polling: The user's app automatically sends a request to ask for inventory every 3-5 seconds instead of maintaining a continuous connection. When the quantity is < 10, it might switch to WebSockets or increase polling frequency to build excitement.
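The polling policy in the last point reduces to a tiny client-side decision function. The thresholds and intervals below are the article's example numbers, not a prescription:

```python
def polling_interval(remaining_stock):
    """Seconds the client waits before re-checking inventory (illustrative)."""
    if remaining_stock < 10:
        return 1   # near sell-out: poll every second (or upgrade to WebSocket)
    return 4       # normal case: one request every 3-5 seconds is plenty

print(polling_interval(500), polling_interval(5))  # 4 1
```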
Designing a Flash Sale system is not just about optimizing code; it is the art of managing user expectations and protecting the weakest points (the Database) in the infrastructure.
Advanced Deep Dive: Omitted Edge Cases
The section above presents the "happy path." In reality, there are four critical issues an interviewer will ask about:
1. Simple DECR can still cause Oversell at the application layer
If the "check inventory → deduct stock" logic resides in the application code (rather than running directly on Redis), race conditions between app instances still exist. The standard solution is a Lua script: bundle the entire check-and-decrement logic into an atomic unit that runs on the Redis server and cannot be interrupted. Without Lua, if DECR returns -1 you must immediately INCR to roll the counter back, an extra round-trip that is prone to bugs.
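As a sketch of the Lua approach: the script below performs check-and-decrement in one server-side step (in a real deployment it would be loaded with `EVAL`/`SCRIPT LOAD`). Since running it requires a live Redis, a pure-Python function with the same semantics is included so the logic can be exercised locally; key names are illustrative:

```python
# Lua executed atomically on the Redis server: never decrements below zero.
CHECK_AND_DECR = """
local stock = tonumber(redis.call('GET', KEYS[1]) or '0')
if stock <= 0 then
    return -1
end
return redis.call('DECR', KEYS[1])
"""

def check_and_decr(store, key):
    """Pure-Python reference of the script's semantics (store is a dict)."""
    stock = int(store.get(key, 0))
    if stock <= 0:
        return -1                  # rejected: nothing to roll back
    store[key] = stock - 1
    return store[key]

store = {"sale:iphone": 2}
print(check_and_decr(store, "sale:iphone"))  # 1
print(check_and_decr(store, "sale:iphone"))  # 0 (last item)
print(check_and_decr(store, "sale:iphone"))  # -1, counter stays at 0
```

Because the check and the decrement happen in one uninterruptible unit, there is no negative value to roll back and no race window between app instances.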
2. IP-based Rate Limiting harms legitimate users
Blocking by IP will unfairly block an entire office of 500 people sharing the same NAT. A better solution is a Virtual Waiting Room—instead of an outright rejection, the system places users in a queue showing their position and estimated time, "dripping" users into the core system at a speed the backend can handle. This protects the backend better without breaking the UX.
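A virtual waiting room is, at its core, a FIFO with rate-limited admission. A minimal sketch using `collections.deque` (names and the admission rate are illustrative):

```python
from collections import deque

waiting_room = deque()             # FIFO of users, instead of rejecting them

def join(user_id):
    waiting_room.append(user_id)
    return len(waiting_room)       # queue position shown back to the user

def admit(batch_size):
    """Drip a batch of users into the core system at a backend-friendly rate."""
    n = min(batch_size, len(waiting_room))
    return [waiting_room.popleft() for _ in range(n)]

for u in ["a", "b", "c", "d"]:
    join(u)
print(admit(2))  # ['a', 'b'] enter first; 'c' and 'd' keep their place in line
```

In production the queue lives in a shared store and `admit` runs on a timer tuned to the backend's capacity, but the shape of the mechanism is the same.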
3. Consumer Worker Failure → Need for Idempotency Key and Dead Letter Queue
If a Worker crashes after Redis has deducted stock but before writing to the DB, the Queue will retry the message. If that retry re-processes a partially completed message, a user might be charged twice. An Idempotency Key (a UUID attached to each message) lets the Worker check "has this been processed?" before writing, ensuring that no matter how many retries occur, only one order is created. Messages that still fail after N retries (because of a permanent problem with the message itself, not a transient crash) must be moved to a Dead Letter Queue so they do not block the main flow and are kept for investigation.
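A minimal sketch of an idempotent consumer, using an in-memory set as a stand-in for a durable "processed keys" store (in production this check must be transactional with the order write itself; all names are illustrative):

```python
import uuid

processed = set()                  # stand-in for a durable "seen keys" table
orders = []

def handle(message):
    """Process a queue message at most once per idempotency key."""
    key = message["idempotency_key"]
    if key in processed:
        return "skipped"           # retry of an already-applied message
    processed.add(key)
    orders.append(message["user"]) # the real DB write
    return "created"

msg = {"user": "A", "idempotency_key": str(uuid.uuid4())}
print(handle(msg))  # created
print(handle(msg))  # skipped: the retry creates no duplicate order
```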
4. Inventory not synced back on cancellations
The article didn't mention that after a Flash Sale, if an order is canceled, inventory needs to be restored to Redis. If the Redis key has expired or the value isn't updated correctly, subsequent orders (re-stock, second chance sales) will show incorrect data. A dedicated flow for cancel/refund is required to ensure Redis and the DB stay in sync.
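The compensating cancel flow can be sketched as two steps: mark the order canceled in the database, then return the unit to the Redis counter (an INCR). Dicts stand in for both stores and all names are illustrative:

```python
# Stand-ins: "redis_stock" plays the Redis counter, "db_orders" the database.
redis_stock = {"sale:iphone": 0}   # sold out in the cache
db_orders = {"order-1": "paid"}

def cancel(order_id, sku):
    """Compensating flow: cancel the order, then give the unit back."""
    db_orders[order_id] = "canceled"
    redis_stock[sku] = redis_stock.get(sku, 0) + 1   # mirrors Redis INCR

cancel("order-1", "sale:iphone")
print(db_orders["order-1"], redis_stock["sale:iphone"])  # canceled 1
```

In a real system this also has to handle the expired-key case the paragraph mentions: if the Redis key is gone, the flow must re-create it from the database count rather than blindly incrementing.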