WebSocket Scaling Patterns for 100K+ Concurrent Connections

May 12, 2026 · 2 min read

Why WebSockets Are Hard at Scale

WebSockets seem simple until you need to support hundreds of thousands of concurrent connections. The challenge is not the protocol itself — it is state management, connection routing, and message fan-out across a distributed system. Here is what I learned scaling real-time notification systems in production.

The Connection State Problem

Every WebSocket connection is stateful. In a horizontally scaled system, a user might connect to server A, but the message they need might arrive at server B. You need a way to route messages to the correct server holding that connection.

Pattern: Pub/Sub Connection Registry

The solution I use most often combines Redis pub/sub with a connection registry. Each server registers its connections in Redis with a TTL, so entries left behind by a crashed server expire on their own. When a message needs to reach a specific user, the system looks up which server holds that connection and publishes the message to that server's channel.

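// Assumes `redis` is a connected node-redis (v4+) client and that
// `userId`, `serverId`, and `message` are in scope.

// Server registers connection: record which server holds this user's socket.
// The entry expires after one hour unless it is written again.
await redis.set(`ws:conn:${userId}`, serverId, { EX: 3600 });

// Message routing: look up the owning server and publish to its channel
const targetServer = await redis.get(`ws:conn:${userId}`);
if (targetServer) {
  await redis.publish(`ws:server:${targetServer}`, JSON.stringify({
    userId, message, timestamp: Date.now()
  }));
}
// No registry entry means the user is not connected anywhere right now.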
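
The snippet above covers registration and publishing. On the receiving side, each server subscribes to its own channel and hands every payload to the socket it holds locally. A minimal sketch of that delivery step might look like the following. The `localSockets` map and the `deliverToLocalSocket` helper are hypothetical names, sockets are assumed to expose `send()`, and the wiring in the final comment assumes node-redis v4+, where a subscriber needs its own connection.

```javascript
// Hypothetical per-server map: userId -> open socket held by this process.
const localSockets = new Map();

// Deliver one payload received on this server's channel.
// Returns true if the message was handed to a local socket.
function deliverToLocalSocket(sockets, rawPayload) {
  let payload;
  try {
    payload = JSON.parse(rawPayload);
  } catch {
    return false; // malformed payload: drop it rather than crash the subscriber
  }
  if (payload === null || typeof payload !== "object") {
    return false;
  }
  const socket = sockets.get(payload.userId);
  if (!socket) {
    // The user disconnected (or moved servers) after the registry lookup.
    return false;
  }
  socket.send(JSON.stringify({
    message: payload.message,
    timestamp: payload.timestamp,
  }));
  return true;
}

// Wiring (assumes node-redis v4+ and the `redis` client from the snippet above):
//   const subscriber = redis.duplicate();
//   await subscriber.connect();
//   await subscriber.subscribe(`ws:server:${serverId}`, (raw) =>
//     deliverToLocalSocket(localSockets, raw));
```

A `false` return is worth counting in metrics: Redis pub/sub does not retain messages, so a payload that finds no local socket is gone unless something else buffers it.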

Handling Connection Drops

Connections drop. Mobile networks switch between WiFi and cellular. Browsers go to sleep. Your system must handle this gracefully: clients reconnect with exponential backoff, messages are buffered while a client is offline, and sequence numbers on each message let the client detect gaps and ask for what it missed.
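
To make those three pieces concrete, here is a rough sketch rather than a production implementation. All names (`reconnectDelay`, `ReplayBuffer`, `hasGap`) and the default values (500 ms base delay, 30 s cap, 100 buffered messages) are assumptions chosen for illustration. The jitter is an addition of mine to plain exponential backoff, so that many clients dropped at once do not reconnect in lockstep.

```javascript
// Client side: delay before reconnect attempt `attempt` (0-based).
// The ceiling doubles each attempt up to maxMs; the actual delay is a random
// value below that ceiling ("full jitter").
function reconnectDelay(attempt, { baseMs = 500, maxMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Buffer of recent messages for one user, each tagged with a sequence number.
class ReplayBuffer {
  constructor(capacity = 100) {
    this.capacity = capacity;
    this.nextSeq = 1;
    this.messages = []; // oldest first
  }

  // Assign the next sequence number and remember the message.
  push(message) {
    const entry = { seq: this.nextSeq++, message };
    this.messages.push(entry);
    if (this.messages.length > this.capacity) this.messages.shift();
    return entry;
  }

  // Everything after the last sequence number the client saw.
  // `complete` is false when older messages were already evicted, in which
  // case the client should fall back to a full refresh.
  since(lastSeenSeq) {
    const missed = this.messages.filter((m) => m.seq > lastSeenSeq);
    const complete = missed.length === 0
      ? lastSeenSeq >= this.nextSeq - 1
      : missed[0].seq === lastSeenSeq + 1;
    return { missed, complete };
  }
}

// Client side: a gap exists when the incoming seq skips past lastSeen + 1.
function hasGap(lastSeenSeq, incomingSeq) {
  return incomingSeq > lastSeenSeq + 1;
}
```

On reconnect, the client would send its last seen sequence number, and the server would answer with `since(lastSeenSeq)`. Bounding the buffer keeps memory predictable at the cost of occasionally forcing a full refresh.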

Load Testing Reality

Load testing WebSockets is different from HTTP. You cannot just hammer endpoints — you need to maintain persistent connections and simulate realistic message patterns. I use k6 with custom WebSocket extensions, gradually ramping from 1K to 100K connections while monitoring memory, CPU, and message latency percentiles.
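
Whatever tool generates the load, the latency samples still have to be summarized. The sketch below uses the nearest-rank method to compute percentiles and shows why an average alone can mislead. The function names and the sample data are made up for illustration and do not come from a real test run.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to keep the rank computation in integers.
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

function summarize(latenciesMs) {
  const mean = latenciesMs.reduce((sum, v) => sum + v, 0) / latenciesMs.length;
  return {
    mean,
    p50: percentile(latenciesMs, 50),
    p95: percentile(latenciesMs, 95),
    p99: percentile(latenciesMs, 99),
  };
}

// Made-up samples: 98 fast deliveries (10 ms) and 2 slow ones.
const samples = [...Array(98).fill(10), 900, 1200];
const summary = summarize(samples);
console.log(summary);
```

With these samples the mean is about 31 ms and the median is 10 ms, while P99 is 900 ms. The slow tail that a small share of users would actually feel only shows up in the high percentiles.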

Key Takeaways

  • Use Redis pub/sub for cross-server message routing
  • Implement connection heartbeats every 30 seconds
  • Buffer messages during reconnection with sequence numbers
  • Monitor P99 message latency, not just averages
  • Design for graceful degradation when WebSocket falls back to HTTP polling
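
The heartbeat takeaway can be sketched as a small tracker that records when each connection was last heard from and reports the ones that have gone quiet. The `HeartbeatTracker` class is hypothetical, and tolerating two missed beats before closing a connection is my assumption, not a rule. The wiring in the final comment assumes the `ws` package, whose sockets provide `ping()` and `terminate()`.

```javascript
const HEARTBEAT_INTERVAL_MS = 30_000; // the 30-second interval from the list above
const MISSED_BEATS_ALLOWED = 2;       // assumption: tolerate two silent intervals

class HeartbeatTracker {
  // `now` is injectable so the logic can be tested without real timers.
  constructor(now = Date.now) {
    this.now = now;
    this.lastSeen = new Map(); // connectionId -> time of last pong or message
  }

  // Call on connect and whenever a pong (or any message) arrives.
  touch(connectionId) {
    this.lastSeen.set(connectionId, this.now());
  }

  // Call when a connection closes.
  remove(connectionId) {
    this.lastSeen.delete(connectionId);
  }

  // IDs of connections that have been silent longer than the allowed window.
  findStale() {
    const cutoff = this.now() - HEARTBEAT_INTERVAL_MS * MISSED_BEATS_ALLOWED;
    return [...this.lastSeen]
      .filter(([, seen]) => seen < cutoff)
      .map(([id]) => id);
  }
}

// Wiring (assumes the `ws` package and a hypothetical `sockets` map of id -> socket):
//   setInterval(() => {
//     for (const id of tracker.findStale()) sockets.get(id)?.terminate();
//     for (const socket of sockets.values()) socket.ping();
//   }, HEARTBEAT_INTERVAL_MS);
//   socket.on("pong", () => tracker.touch(id));
```

The same tick is a natural place to rewrite the `ws:conn:${userId}` registry key, so that a connection that stays open longer than the key's TTL does not silently drop out of the registry.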
Written by

Senior Software Engineer specializing in cloud architecture, real-time systems, and enterprise-scale applications.
