TLDR: If your entrypoint script doesn’t use exec, SIGTERM never reaches your Python app and graceful shutdown silently does nothing. Docker compose masks this entirely.
I use a single /health endpoint for all three Kubernetes probes — startup, liveness, and readiness. The difference in behaviour comes from failureThreshold in the probe config, not from separate code paths.
One endpoint, three probes
The key insight is that failureThreshold controls how tolerant each probe is. All three probes hit the same /health endpoint, but they react differently to failures:
startupProbe:
  httpGet:
    path: /health
    port: http
  periodSeconds: 5
  failureThreshold: 20   # 100s max startup time
  timeoutSeconds: 5
livenessProbe:
  httpGet:
    path: /health
    port: http
  periodSeconds: 10
  failureThreshold: 6    # tolerates ~60s of dependency blips
  timeoutSeconds: 5
readinessProbe:
  httpGet:
    path: /health
    port: http
  periodSeconds: 10
  failureThreshold: 1    # removes pod from traffic immediately
  timeoutSeconds: 5
The startup probe gives the app up to 100 seconds to boot (5s x 20). Once it passes, liveness and readiness kick in. Liveness tolerates 60 seconds of transient failures before restarting the pod. Readiness pulls the pod from traffic on the first failure.
This also means you don’t need initialDelaySeconds — the startup probe handles slow starts without wasting time if the app boots quickly.
What to check
The /health endpoint checks three things in order:
- Shutdown flag — if the app has received SIGTERM, return 503 immediately. This stops new traffic during graceful shutdown.
- Database — run a simple query and grab the current alembic migration revision. This catches connection issues and doubles as a quick deployment sanity check.
- External dependencies — anything else the app can’t function without.
Here’s what the FastAPI implementation looks like:
from fastapi import Response
from sqlalchemy import text
from sqlalchemy.exc import SQLAlchemyError

# `app`, the async `db_engine`, and `shutting_down()` are defined elsewhere
# (shutting_down() is shown in the graceful shutdown section below)

@app.get("/health")
async def health() -> Response:
    if shutting_down():
        return Response(status_code=503)
    try:
        async with db_engine.connect() as conn:
            await conn.execute(text("SELECT 1"))
    except SQLAlchemyError:
        return Response(status_code=503)
    return Response(status_code=200)
Kubernetes only cares about the status code — 200 means healthy, 503 means not. You can add a JSON body with metadata if you find it useful for debugging, but it’s not required.
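If you do want the alembic revision check from the list above surfaced as JSON metadata, a minimal sketch might look like this — the alembic_version table is where alembic stores the current revision, and the response shape is just an example, not what this project returns:

from fastapi.responses import JSONResponse
from sqlalchemy import text
from sqlalchemy.exc import SQLAlchemyError

@app.get("/health")
async def health() -> JSONResponse:
    if shutting_down():
        return JSONResponse({"status": "shutting down"}, status_code=503)
    try:
        async with db_engine.connect() as conn:
            # alembic keeps the current revision in the alembic_version table
            result = await conn.execute(text("SELECT version_num FROM alembic_version"))
            revision = result.scalar()
    except SQLAlchemyError:
        return JSONResponse({"status": "database unreachable"}, status_code=503)
    return JSONResponse({"status": "ok", "revision": revision})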
Graceful shutdown
The shutdown sequence in Kubernetes goes like this:
- Pod is marked for termination
- Pod is removed from Service endpoints (asynchronous)
- preStop hook runs — a short sleep 5 gives time for endpoint removal to propagate (see the snippet after this list)
- SIGTERM is sent to PID 1, which forwards it to the app
- App sets shutdown flag, /health starts returning 503
- App drains in-flight requests and exits
- If the app hasn’t exited after terminationGracePeriodSeconds, Kubernetes sends SIGKILL
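For reference, the preStop hook lives on the container spec. A minimal sketch — the container name and grace period value here are illustrative, not from this project:

spec:
  terminationGracePeriodSeconds: 30
  containers:
    - name: app
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "5"]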
For uvicorn, there’s a subtle gotcha. When uvicorn receives SIGTERM, its handle_exit method sets should_exit = True. The main loop picks this up and calls shutdown(), which immediately closes server sockets before running lifespan shutdown. Health probes get connection refused instead of a clean 503.
The fix I prefer is overriding uvicorn’s signal handlers so the server keeps running and serves 503s until Kubernetes sends SIGKILL:
# shutdown.py
import signal

_is_shutting_down: bool = False

def shutting_down() -> bool:
    return _is_shutting_down

def _on_signal(signum: int, frame) -> None:
    global _is_shutting_down
    _is_shutting_down = True

def install_signal_handlers() -> None:
    signal.signal(signal.SIGTERM, _on_signal)
    signal.signal(signal.SIGINT, _on_signal)

# server.py
from uvicorn import Server

from shutdown import install_signal_handlers

class ServerWrapper(Server):
    async def startup(self, sockets=None) -> None:
        await super().startup(sockets)
        install_signal_handlers()  # overrides uvicorn's handlers
startup() is the right place because it runs inside uvicorn’s capture_signals context — our handlers override the ones uvicorn just installed.
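For completeness, here’s roughly how the wrapper gets used. The module path and app reference are illustrative, not the exact names from this project:

# run_server.py — illustrative entrypoint
import uvicorn

from server import ServerWrapper  # the subclass shown above

def main() -> None:
    config = uvicorn.Config("myapp.main:app", host="0.0.0.0", port=8000)
    ServerWrapper(config=config).run()

if __name__ == "__main__":
    main()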
exec in your entrypoint
This is the number one reason graceful shutdown silently fails, and it’s easy to miss.
If your entrypoint script looks like this:
#!/bin/bash
set -e
./bin/migrate-db
python -m myapp.run_server
The process tree ends up being init (PID 1) → shell → python. When the pod terminates, SIGTERM is sent to PID 1, which forwards it to the shell. But the shell ignores SIGTERM and never passes it on to python. Your app never knows shutdown was requested, /health keeps returning 200, and after terminationGracePeriodSeconds Kubernetes sends SIGKILL.
The fix is one word:
#!/bin/bash
set -e
./bin/migrate-db
exec python -m myapp.run_server
exec replaces the shell process with python, so the process tree becomes init (PID 1) → python. SIGTERM reaches the app directly.
Docker Compose masks this bug. Most compose files use init: true, which injects a minimal init process (tini by default) as PID 1. That init correctly forwards signals to all children, so everything works fine locally. The problem only surfaces in Kubernetes.
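For reference, this is the compose flag in question (the service definition is just a sketch):

# docker-compose.yml (sketch)
services:
  app:
    build: .
    init: true   # injects an init process as PID 1 that forwards signals to children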
You can verify in a running pod:
kubectl exec -n <ns> <pod> -- ps axf
# Good: init → python (no shell in between)
# Bad: init → /bin/sh ./startup.sh → python
Also make sure your Dockerfile uses exec form for CMD:
# Bad — wraps in /bin/sh -c
CMD ./startup.sh
# Good — runs startup.sh directly
CMD ["./startup.sh"]
Common mistakes
A few other things I’ve seen trip people up:
- Using initialDelaySeconds instead of a startup probe. It either wastes time if the app starts fast, or fails if it starts slow. A startup probe adapts to however long the app actually needs.
- Forgetting the shutdown check in /health. Without it, the pod keeps receiving traffic during shutdown.
- p.terminate(); p.wait() in startup tests. If you override uvicorn’s signal handlers (as above), SIGTERM no longer stops the server. Your test hangs forever waiting for the process to exit. Use p.wait(timeout=5) with a p.kill() fallback (see the sketch below).
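A minimal sketch of that fallback, assuming the test boots the server as a subprocess (the module name is illustrative):

import subprocess

proc = subprocess.Popen(["python", "-m", "myapp.run_server"])
try:
    ...  # poll /health, run assertions, etc.
finally:
    proc.terminate()          # SIGTERM: ignored once the custom handlers are installed
    try:
        proc.wait(timeout=5)  # give graceful shutdown a chance
    except subprocess.TimeoutExpired:
        proc.kill()           # fall back to SIGKILL so the test never hangs
        proc.wait()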
Further reading
- Configure liveness, readiness and startup probes — official Kubernetes docs
- Pod lifecycle — covers the termination sequence in detail
- Kubernetes best practices: terminating with grace — Google’s guide to graceful shutdown