nginx websocket proxy_read_timeout docker-compose

The first WebSocket I shipped through nginx died on me at exactly 60 seconds, and I couldn't tell you why for an entire evening. A bare 1006 in the browser. Silent FastAPI logs. nginx's access log noted the request ending and walked away whistling. I burned hours combing through application code before realizing the proxy itself had been politely strangling the socket the whole time. That evening is why this walkthrough exists: four small commits in vytharion/nginx-websocket-proxy-read-timeout-docker, each one a lesson, that turn a broken deployment into a working one.

Lesson 1: reproduce the silent 60-second drop

A friend once asked me why I always start by reproducing a bug I already know exists. Because the first hour of debugging a silent disconnect is spent doubting the bug exists at all — and a stopwatch sitting next to a known-broken setup ends that argument fast. Three pieces are in play: a FastAPI app that exposes a /ws echo endpoint, a Dockerfile to package it, and a docker-compose service definition that puts nginx in front. Commit at b55c693 lands all of it.

The app is deliberately minimal so that any end-to-end failure can be pinned on the proxy rather than the framework.

# app/server.py
import asyncio
import logging
from datetime import datetime, timezone
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("ws-app")

app = FastAPI()

@app.get("/healthz")
def healthz() -> dict[str, str]:
    return {"status": "ok"}

@app.websocket("/ws")
async def echo(ws: WebSocket) -> None:
    await ws.accept()
    log.info("client connected")
    try:
        while True:
            msg = await ws.receive_text()
            ts = datetime.now(timezone.utc).isoformat(timespec="seconds")
            await ws.send_text(f"[{ts}] echo: {msg}")
    except WebSocketDisconnect:
        log.info("client disconnected cleanly")

The nginx config is the naive baseline that every blog tutorial on docker-compose reverse proxies seems to ship. It works perfectly for HTTP and fails silently for WebSockets.

# nginx/nginx.conf (lesson 1 baseline)
server {
    listen       80;
    server_name  _;

    location / {
        proxy_pass http://app:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Bring it up with docker compose up --build, then open a quick interactive client in another terminal.

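pip install websockets
python3 -c "
import asyncio, time, websockets

async def main():
    start = time.monotonic()
    # ping_interval=None turns off the library's automatic keepalive pings,
    # which would otherwise stop the socket from ever looking idle.
    async with websockets.connect('ws://localhost:8080/ws', ping_interval=None) as ws:
        while True:
            try:
                # Stay idle for up to 90 s, but wake the moment the socket closes.
                await asyncio.wait_for(ws.wait_closed(), timeout=90)
            except asyncio.TimeoutError:
                await ws.send('still here')
                print(round(time.monotonic() - start, 1), await ws.recv())
                continue
            print('DROPPED after', round(time.monotonic() - start, 1), 's, close code', ws.close_code)
            return

asyncio.run(main())
"

The interactive client connects, sits idle, and reports DROPPED with close code 1006 (abnormal closure: the socket went away without a close frame). Uptime at the moment of death sits stubbornly at roughly 60.3 seconds. That is the default proxy_read_timeout firing. On the upstream leg nginx spoke HTTP/1.0 and dropped the Upgrade header, so the framework either rejected the handshake or accepted a half-broken socket, and nginx happily killed the connection after one idle minute.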

Why 60 seconds, exactly? The default ships from upstream and is documented in the official ngx_http_proxy_module reference. nginx assumes proxied requests are HTTP, where the upstream should always be producing a response body within seconds. Long-lived sockets violate that assumption. Without overrides, nginx applies the HTTP-shaped ceiling.
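To make the idle-timer behaviour concrete, here is a minimal sketch of how a read timeout of this kind behaves. It is a toy model, not nginx's implementation: the function name proxy_close_time and the observe_until parameter are my own, and the only assumption is that the timer restarts every time the upstream sends bytes.

```python
from typing import Optional, Sequence


def proxy_close_time(
    upstream_reads: Sequence[float],
    observe_until: float,
    read_timeout: float = 60.0,
) -> Optional[float]:
    """Return the second at which an idle-read timer would fire, or None.

    upstream_reads holds the times (seconds since connect) at which the
    upstream sent bytes. The timer restarts after every read.
    """
    last_read = 0.0  # the timer starts when the connection is established
    checkpoints = [t for t in sorted(upstream_reads) if t <= observe_until]
    checkpoints.append(observe_until)  # also check the final quiet stretch
    for t in checkpoints:
        if t - last_read > read_timeout:
            return last_read + read_timeout
        last_read = t
    return None  # no gap in the observed window exceeded the timeout


# The echo app only speaks when spoken to, and the client waits 90 s.
print(proxy_close_time([90.0], observe_until=120.0))                    # 60.0
# Upstream traffic every 20 s keeps every gap under the default.
print(proxy_close_time([20.0, 40.0, 60.0, 80.0], observe_until=100.0))  # None
```

The first call mirrors the lesson 1 client: the only upstream bytes would arrive at second 90, but the timer has already expired at second 60.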

Lesson 2: Upgrade and Connection headers

Cranking proxy_read_timeout up to an hour is the tempting first move, and it is the wrong one: the connection ends up just as dead at minute one. Even with an infinite timeout, the connection above never carried a real WebSocket session, because nginx by default proxies as HTTP/1.0 and does not forward hop-by-hop headers like Upgrade and Connection. RFC 6455 (the WebSocket Protocol RFC) requires both headers to reach the server so it can complete the handshake. Without them the framework returns a 400 or, worse, treats the request as plain HTTP and emits nothing the browser recognizes as a WebSocket frame.
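A rough way to picture what the server needs is the check below. It is a simplified sketch: is_websocket_upgrade is a hypothetical helper, it only looks at the two headers discussed here (a real handshake also involves Sec-WebSocket-Key and Sec-WebSocket-Version), and the "after naive proxy" headers are an illustration of the lesson 1 behaviour rather than a captured request.

```python
from typing import Mapping


def is_websocket_upgrade(headers: Mapping[str, str]) -> bool:
    """True when the headers still describe a WebSocket upgrade request."""
    lowered = {name.lower(): value for name, value in headers.items()}
    upgrade = lowered.get("upgrade", "").strip().lower()
    # Connection is a comma-separated token list, e.g. "keep-alive, Upgrade".
    connection_tokens = {
        token.strip().lower()
        for token in lowered.get("connection", "").split(",")
    }
    return upgrade == "websocket" and "upgrade" in connection_tokens


browser_request = {
    "Host": "localhost:8080",
    "Upgrade": "websocket",
    "Connection": "Upgrade",
}
# Illustrative view of the same request after the lesson 1 proxy hop.
after_naive_proxy = {"Host": "localhost:8080", "Connection": "close"}

print(is_websocket_upgrade(browser_request))    # True
print(is_websocket_upgrade(after_naive_proxy))  # False
```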

The canonical fix for that one piece comes from the official nginx WebSocket proxying guide. Commit at 609e61b adds it.

map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

server {
    listen       80;
    server_name  _;

    location / {
        proxy_pass http://app:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        proxy_http_version 1.1;
        proxy_set_header Upgrade    $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
    }
}

Three things are happening. proxy_http_version 1.1 switches nginx from HTTP/1.0 to HTTP/1.1 for upstream calls, because the Upgrade mechanism does not exist in HTTP/1.0. The map block derives the Connection value from the incoming Upgrade header: any non-empty value (Upgrade: websocket in practice) yields Connection: upgrade, and an absent header yields Connection: close so plain HTTP requests keep their usual semantics. The two proxy_set_header lines forward both values to the upstream so FastAPI can complete the handshake.
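If the map syntax is unfamiliar, the same logic reads like this in Python. This is only a model of the config above, with function names of my own choosing. It relies on one documented nginx behaviour: a proxy_set_header whose value is an empty string is not passed to the upstream at all.

```python
def connection_upgrade(http_upgrade: str) -> str:
    """Mirror of the map block: '' maps to 'close', anything else to 'upgrade'."""
    return "close" if http_upgrade == "" else "upgrade"


def upstream_headers(client_headers: dict) -> dict:
    """Headers the upstream would see under the lesson 2 location block."""
    http_upgrade = client_headers.get("Upgrade", "")
    forwarded = {
        "Host": client_headers.get("Host", ""),
        "Connection": connection_upgrade(http_upgrade),
    }
    if http_upgrade:  # an empty value means the header is not sent
        forwarded["Upgrade"] = http_upgrade
    return forwarded


print(upstream_headers({"Host": "localhost:8080", "Upgrade": "websocket"}))
print(upstream_headers({"Host": "localhost:8080"}))
```

The first call keeps Upgrade and sets Connection to upgrade; the second, a plain HTTP request, gets Connection: close and no Upgrade header.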

Run the same 90-second client again. Now the WebSocket actually opens correctly. You can see the 101 Switching Protocols exchange in the nginx access log. But the script still dies at almost exactly the same uptime as before, around 60 seconds. The protocol is right; the timeout is still wrong. That distinction is worth internalizing: the protocol layer fix and the timeout fix are independent. Most articles conflate them into one config blob, which is why teams who copy and paste a working block from somewhere else cannot diagnose what to remove if half of it conflicts with their TLS config or their Host header rules.
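To check the handshake on its own, separately from any timeout, a raw-socket probe is enough. The sketch below uses only the standard library; probe, build_request and expected_accept are hypothetical helper names, the host and port are the ones used in this walkthrough, and a single recv call is a simplification that assumes the whole response header arrives in one read.

```python
import base64
import hashlib
import os
import socket

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed value from RFC 6455


def expected_accept(key: str) -> str:
    """Sec-WebSocket-Accept value the server must return for a given key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_request(host: str, path: str, key: str) -> bytes:
    """Minimal client handshake request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def probe(host: str = "localhost", port: int = 8080, path: str = "/ws") -> str:
    """Send one handshake through the proxy and summarize the reply."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(build_request(f"{host}:{port}", path, key))
        reply = sock.recv(4096).decode("latin-1")
    status_line = reply.split("\r\n", 1)[0]
    accept_ok = expected_accept(key) in reply
    return f"{status_line} (accept header valid: {accept_ok})"
```

Calling print(probe()) while the stack is running should show a 101 status line on the lesson 2 config and something else on the lesson 1 baseline.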

Lesson 3: raise proxy_read_timeout to 3600 seconds

The handshake is clean now, the protocol is right, and the WebSocket still dies at the same 60-second uptime as in lesson 1, for a completely different reason. This lesson is about that second 60-second wall. Commit at 27751a2 lands the actual fix. Provided the upgrade headers are already correct, it is the smallest diff that turns a broken WebSocket deployment into a working one.

location / {
    proxy_pass http://app:8000;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;

    proxy_http_version 1.1;
    proxy_set_header Upgrade    $http_upgrade;
    proxy_set_header Connection $connection_upgrade;

    proxy_read_timeout    3600s;
    proxy_send_timeout    3600s;
    proxy_connect_timeout 10s;
}

Three directives are now explicit. proxy_read_timeout is the ceiling on how long nginx waits between reads from the upstream. 3600 seconds (one hour) is a defensible default for an interactive WebSocket; chat sessions, dashboards, and Tauri-style desktop clients routinely sit idle for tens of minutes between user actions. proxy_send_timeout is the symmetric ceiling on writes to the upstream, also 60 seconds by default; if the app is slow to drain a backpressure buffer, the connection can die from this direction too, so match it to the read timeout. proxy_connect_timeout is the unrelated bound on the initial TCP handshake to the upstream container. Keep it small (10 seconds is generous) so a crashed container fails the request fast instead of hanging clients for an hour.

Why 3600 seconds and not an effectively unlimited value? Removing the ceiling looks cleaner, but it disables a real safety net. Each nginx worker has a finite connection budget (worker_connections). If a buggy upstream stops responding without closing the socket, an unbounded timeout means those connections accumulate forever and the worker eventually refuses new clients. A one-hour ceiling guarantees runaway connections eventually evict themselves while still sitting above the tens-of-minutes idle gaps an interactive session produces.
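A back-of-the-envelope calculation shows why the ceiling matters. The numbers below are hypothetical: one hung connection per minute and a worker_connections value of 1024 are assumptions for illustration, not measurements. The model is simply arrival rate times holding time, and it counts two slots per proxied socket because nginx's connection limit includes the upstream side as well as the client side.

```python
WORKER_CONNECTIONS = 1024      # hypothetical worker_connections setting
SLOTS_PER_PROXIED_SOCKET = 2   # one client-side and one upstream-side connection


def stuck_connections(leaks_per_minute: float, read_timeout_s: float) -> float:
    """Steady-state count of hung sockets: arrival rate times holding time."""
    return leaks_per_minute * read_timeout_s / 60.0


def fits_budget(leaks_per_minute: float, read_timeout_s: float) -> bool:
    """Whether the hung sockets alone stay within one worker's budget."""
    held = stuck_connections(leaks_per_minute, read_timeout_s)
    return held * SLOTS_PER_PROXIED_SOCKET <= WORKER_CONNECTIONS


for timeout_s in (60, 3600, 86400):
    held = stuck_connections(1, timeout_s)
    print(timeout_s, held, fits_budget(1, timeout_s))
# 60 1.0 True
# 3600 60.0 True
# 86400 1440.0 False
```

Under these assumptions a one-hour ceiling holds about 60 hung sockets at any moment, while a one-day ceiling would already exhaust a single worker.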

Restart nginx (docker compose restart nginx) and rerun the 90-second client. Now the loop ticks forever. The proxy layer is fixed.

Lesson 4: app-side ping as defense-in-depth

Production adds one more wrinkle that nginx alone cannot solve. Between the browser and the VPS sit middleboxes: corporate HTTP proxies, mobile carrier NATs, cloud load balancers, even some home routers. Many of them silently drop idle TCP connections after anywhere from ninety seconds to a few minutes, regardless of what either end has agreed. AWS Network Load Balancer has a 350-second idle ceiling. Cloudflare's free tier closes idle WebSockets after roughly 100 seconds. Verizon LTE NATs have been observed dropping at 90 seconds. A 3600-second nginx timeout does nothing for any of those.

The fix is to send a small frame every 20 seconds so no middlebox ever sees the socket as idle. Commit at 6c8a564 adds the ping loop server-side and a verification client.

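PING_INTERVAL_SECONDS = 20

async def _ping_forever(ws: WebSocket) -> None:
    # Emit an application-level marker so every hop on the path sees traffic.
    try:
        while True:
            await asyncio.sleep(PING_INTERVAL_SECONDS)
            await ws.send_text("__ping__")
    except (WebSocketDisconnect, RuntimeError):
        # The socket closed between pings; let the task end quietly.
        pass

@app.websocket("/ws")
async def echo(ws: WebSocket) -> None:
    await ws.accept()
    ping_task = asyncio.create_task(_ping_forever(ws))
    try:
        while True:
            msg = await ws.receive_text()
            if msg == "__pong__":
                # Optional client reply to a ping; nothing to echo back.
                continue
            ts = datetime.now(timezone.utc).isoformat(timespec="seconds")
            await ws.send_text(f"[{ts}] echo: {msg}")
    except WebSocketDisconnect:
        pass
    finally:
        # Stop the ping loop however the handler exits.
        ping_task.cancel()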
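The task lifecycle in that handler (start the ping loop, do the real work, cancel in finally) can be exercised without a network. The sketch below swaps in a hypothetical FakeWebSocket that only records what would have been sent, and shortens the interval so it finishes quickly. Awaiting the cancelled task is an extra step this sketch adds so the demo exits cleanly.

```python
import asyncio


class FakeWebSocket:
    """Hypothetical stand-in that records what would have been sent."""

    def __init__(self) -> None:
        self.sent = []

    async def send_text(self, data: str) -> None:
        self.sent.append(data)


async def ping_forever(ws: FakeWebSocket, interval: float) -> None:
    while True:
        await asyncio.sleep(interval)
        await ws.send_text("__ping__")


async def demo() -> list:
    ws = FakeWebSocket()
    task = asyncio.create_task(ping_forever(ws, interval=0.01))
    try:
        await asyncio.sleep(0.1)  # stands in for the receive loop
    finally:
        task.cancel()  # same cleanup as the handler's finally block
        try:
            await task
        except asyncio.CancelledError:
            pass
    return ws.sent


sent = asyncio.run(demo())
print(len(sent) > 0, set(sent))
```

Several pings are recorded while the stand-in receive loop runs, and none after the task is cancelled.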

20 seconds is the sweet spot. Shorter intervals (5 to 10 seconds) waste bandwidth and battery on mobile; longer intervals (30 to 45 seconds) leave little margin against the more aggressive carrier NATs once a ping or two is lost. The 20-second cadence sits comfortably under every middlebox idle ceiling listed above while costing about 10 bytes per ping at the WebSocket layer: the 8-byte __ping__ payload plus a 2-byte frame header.
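The interval choice can be sanity-checked against the ceilings mentioned in this walkthrough. The sketch below is arithmetic only; the margin factor of 2 (room to lose one ping and still beat the ceiling) is my assumption, not a standard, and hops_at_risk is a hypothetical helper.

```python
IDLE_CEILINGS_S = {
    "nginx default proxy_read_timeout": 60,
    "carrier NAT (figure cited in the text)": 90,
    "Cloudflare free tier": 100,
    "AWS NLB": 350,
    "nginx after lesson 3": 3600,
}


def hops_at_risk(ping_interval_s: float, margin: float = 2.0) -> list:
    """Hops whose idle ceiling is below margin times the ping interval."""
    return [
        name
        for name, ceiling in IDLE_CEILINGS_S.items()
        if ceiling < ping_interval_s * margin
    ]


for interval in (20, 45, 90):
    print(interval, hops_at_risk(interval))
# 20 []
# 45 ['nginx default proxy_read_timeout']
# 90 ['nginx default proxy_read_timeout', 'carrier NAT (figure cited in the text)', 'Cloudflare free tier']
```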

A pure server-initiated ping is enough because every middlebox that drops idle TCP only looks at the absence of bytes in either direction. A frame from server to client resets the same timer that a frame from client to server would reset. You can also send WebSocket protocol-level pings (opcode 0x9) instead of a text marker, which browsers answer with a pong automatically; the trade-off is that the FastAPI WebSocket wrapper does not expose ping() directly, so an application-level marker like __ping__ is the path of least dependency hassle.
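For a sense of what the two options look like on the wire, here is a small sketch that builds both frames by hand. server_frame is a hypothetical helper that covers only the simplest case in RFC 6455: a single unfragmented, unmasked server-to-client frame with a payload of at most 125 bytes.

```python
def server_frame(opcode: int, payload: bytes = b"") -> bytes:
    """Build one unmasked, unfragmented server-to-client frame."""
    if len(payload) > 125:
        raise ValueError("this sketch only covers payloads up to 125 bytes")
    first_byte = 0x80 | opcode  # FIN bit set, no extension bits
    return bytes([first_byte, len(payload)]) + payload


text_marker = server_frame(0x1, b"__ping__")  # text frame carrying the marker
protocol_ping = server_frame(0x9)             # ping control frame, empty payload

print(len(text_marker), text_marker.hex())      # 10 81085f5f70696e675f5f
print(len(protocol_ping), protocol_ping.hex())  # 2 8900
```

Either frame is enough to reset an idle timer; the protocol-level ping is smaller, but the text marker needs nothing beyond send_text.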

Verifying end to end

The repo ships a client/test_client.py that connects, ticks every 90 seconds, and prints uptime when the connection eventually dies. On lesson 1's config it reports DROPPED after ~60s. On lesson 3's config it runs until you kill it. That difference is the entire point of this walkthrough and exactly the smoke test you should add to your own CI before any future change to nginx config.

If you want a one-liner sanity check without Python, wscat -c ws://localhost:8080/ws from the websockets/wscat project works equally well; type a line, wait two minutes, type another, and watch whether the second message round-trips.

Repository

Full source at https://github.com/vytharion/nginx-websocket-proxy-read-timeout-docker. Each lesson is one commit so you can git checkout <sha> to reproduce any intermediate state.

Lesson 1 → b55c693 — reproduce the naive 60-second drop with a minimal FastAPI app and a default nginx reverse proxy
Lesson 2 → 609e61b — add proxy_http_version 1.1 plus the Upgrade and Connection headers so the handshake survives end-to-end
Lesson 3 → 27751a2 — raise proxy_read_timeout and proxy_send_timeout to 3600 seconds while keeping proxy_connect_timeout short
Lesson 4 → 6c8a564 — add a 20-second server-side keepalive ping plus a verification client that ticks every 90 seconds

Conclusion and next steps

The whole fix is a small map block and six directives in one location block, plus one optional server-side ping loop. The reason the bug feels mysterious the first time is that no layer logs an error, so no single log line points at the timeout. Once you internalize that nginx's defaults are HTTP-shaped and WebSockets are not HTTP-shaped, the diagnosis becomes a two-step checklist: confirm the upgrade headers travel end-to-end, then confirm the read timeout exceeds the expected idle window.

Clone the repo, run docker compose up --build, point the verification client at ws://localhost:8080/ws, and watch it survive past 60 seconds. From there, useful follow-on changes include terminating TLS in nginx with proxy_set_header X-Forwarded-Proto $scheme, adding a dedicated upstream block so you can swap to a unix socket later, and wiring a Prometheus exporter for stub_status so the next time a timeout fires you see it on a dashboard before a user does.