TLDR: If your entrypoint script doesn’t use exec, SIGTERM never reaches your Python app and graceful shutdown silently does nothing. Docker compose masks this entirely.
I use a single /health endpoint for all three Kubernetes probes — startup, liveness, and readiness. The difference in behaviour comes from failureThreshold in the probe config, not from separate code paths.
One endpoint, three probes
The key insight is that failureThreshold controls how tolerant each probe is. All three probes hit the same /health endpoint, but they react differently to failures:
startupProbe:
  httpGet:
    path: /health
    port: http
  periodSeconds: 5
  failureThreshold: 20   # 100s max startup time
  timeoutSeconds: 5
livenessProbe:
  httpGet:
    path: /health
    port: http
  periodSeconds: 10
  failureThreshold: 6    # tolerates ~60s of dependency blips
  timeoutSeconds: 5
readinessProbe:
  httpGet:
    path: /health
    port: http
  periodSeconds: 10
  failureThreshold: 1    # removes pod from traffic immediately
  timeoutSeconds: 5
The startup probe gives the app up to 100 seconds to boot (5s x 20). Once it passes, liveness and readiness kick in. Liveness tolerates 60 seconds of transient failures before restarting the pod. Readiness pulls the pod from traffic on the first failure.
This also means you don’t need initialDelaySeconds — the startup probe handles slow starts without wasting time if the app boots quickly.
What to check
The /health endpoint checks three things in order:
- Shutdown flag — if the app has received SIGTERM, return 503 immediately. This stops new traffic during graceful shutdown.
- Database — run a simple query and grab the current alembic migration revision. This catches connection issues and doubles as a quick deployment sanity check.
- External dependencies — anything else the app can’t function without.
Here’s what the FastAPI implementation looks like:
from fastapi import Response
from sqlalchemy import text
from sqlalchemy.exc import SQLAlchemyError

# `app`, the async `db_engine`, and `shutting_down()` are defined elsewhere
# (shutting_down() is shown in the graceful shutdown section below)

@app.get("/health")
async def health() -> Response:
    if shutting_down():
        return Response(status_code=503)
    try:
        async with db_engine.connect() as conn:
            await conn.execute(text("SELECT 1"))
    except SQLAlchemyError:
        return Response(status_code=503)
    return Response(status_code=200)
Kubernetes only cares about the status code — 200 means healthy, 503 means not. You can add a JSON body with metadata if you find it useful for debugging, but it’s not required.
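If you do want the alembic revision check from the list above surfaced as JSON metadata, a minimal sketch might look like this — the alembic_version table is where alembic stores the current revision, and the response shape is just an example, not what this project returns:

from fastapi.responses import JSONResponse
from sqlalchemy import text
from sqlalchemy.exc import SQLAlchemyError

@app.get("/health")
async def health() -> JSONResponse:
    if shutting_down():
        return JSONResponse({"status": "shutting down"}, status_code=503)
    try:
        async with db_engine.connect() as conn:
            # alembic keeps the current revision in the alembic_version table
            result = await conn.execute(text("SELECT version_num FROM alembic_version"))
            revision = result.scalar()
    except SQLAlchemyError:
        return JSONResponse({"status": "database unreachable"}, status_code=503)
    return JSONResponse({"status": "ok", "revision": revision})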
Graceful shutdown
The shutdown sequence in Kubernetes goes like this:
- Pod is marked for termination
- Pod is removed from Service endpoints (asynchronous)
- preStop hook runs — a short sleep 5 gives time for endpoint removal to propagate (see the snippet after this list)
- SIGTERM is sent to PID 1, which forwards it to the app
- App sets shutdown flag, /health starts returning 503
- App drains in-flight requests and exits
- If the app hasn’t exited after terminationGracePeriodSeconds, Kubernetes sends SIGKILL
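For reference, the preStop hook lives on the container spec. A minimal sketch — the container name and grace period value here are illustrative, not from this project:

spec:
  terminationGracePeriodSeconds: 30
  containers:
    - name: app
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "5"]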
For uvicorn, there’s a subtle gotcha. When uvicorn receives SIGTERM, its handle_exit method sets should_exit = True. The main loop picks this up and calls shutdown(), which immediately closes server sockets before running lifespan shutdown. Health probes get connection refused instead of a clean 503.
The fix I prefer is overriding uvicorn’s signal handlers so the server keeps running and serves 503s until Kubernetes sends SIGKILL:
# shutdown.py
import signal

_is_shutting_down: bool = False

def shutting_down() -> bool:
    return _is_shutting_down

def _on_signal(signum: int, frame) -> None:
    global _is_shutting_down
    _is_shutting_down = True

def install_signal_handlers() -> None:
    signal.signal(signal.SIGTERM, _on_signal)
    signal.signal(signal.SIGINT, _on_signal)

# server.py
from uvicorn import Server

from shutdown import install_signal_handlers

class ServerWrapper(Server):
    async def startup(self, sockets=None) -> None:
        await super().startup(sockets)
        install_signal_handlers()  # overrides uvicorn's handlers
startup() is the right place because it runs inside uvicorn’s capture_signals context — our handlers override the ones uvicorn just installed.
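For completeness, here’s roughly how the wrapper gets used. The module path and app reference are illustrative, not the exact names from this project:

# run_server.py — illustrative entrypoint
import uvicorn

from server import ServerWrapper  # the subclass shown above

def main() -> None:
    config = uvicorn.Config("myapp.main:app", host="0.0.0.0", port=8000)
    ServerWrapper(config=config).run()

if __name__ == "__main__":
    main()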
exec in your entrypoint
This is the number one reason graceful shutdown silently fails, and it’s easy to miss.
If your entrypoint script looks like this:
#!/bin/bash
set -e
./bin/migrate-db
python -m myapp.run_server
The process tree ends up being init (PID 1) → shell → python. When the pod terminates, SIGTERM is sent to PID 1, which forwards it to the shell. But the shell ignores SIGTERM and never passes it on to python. Your app never knows shutdown was requested, /health keeps returning 200, and after terminationGracePeriodSeconds Kubernetes sends SIGKILL.
The fix is one word:
#!/bin/bash
set -e
./bin/migrate-db
exec python -m myapp.run_server
exec replaces the shell process with python, so the process tree becomes init (PID 1) → python. SIGTERM reaches the app directly.
Docker Compose masks this bug. Most compose files use init: true, which injects a minimal init process (tini by default) as PID 1. That init correctly forwards signals to all children, so everything works fine locally. The problem only surfaces in Kubernetes.
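For reference, this is the compose flag in question (the service definition is just a sketch):

# docker-compose.yml (sketch)
services:
  app:
    build: .
    init: true   # injects an init process as PID 1 that forwards signals to children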
You can verify in a running pod:
kubectl exec -n <ns> <pod> -- ps axf
# Good: init → python (no shell in between)
# Bad: init → /bin/sh ./startup.sh → python
Also make sure your Dockerfile uses exec form for CMD:
# Bad — wraps in /bin/sh -c
CMD ./startup.sh
# Good — runs startup.sh directly
CMD ["./startup.sh"]
Common mistakes
A few other things I’ve seen trip people up:
- Using initialDelaySeconds instead of a startup probe. It either wastes time if the app starts fast, or fails if it starts slow. A startup probe adapts to however long the app actually needs.
- Forgetting the shutdown check in /health. Without it, the pod keeps receiving traffic during shutdown.
- p.terminate(); p.wait() in startup tests. If you override uvicorn’s signal handlers (as above), SIGTERM no longer stops the server. Your test hangs forever waiting for the process to exit. Use p.wait(timeout=5) with a p.kill() fallback (see the sketch below).
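A minimal sketch of that fallback, assuming the test boots the server as a subprocess (the module name is illustrative):

import subprocess

proc = subprocess.Popen(["python", "-m", "myapp.run_server"])
try:
    ...  # poll /health, run assertions, etc.
finally:
    proc.terminate()          # SIGTERM: ignored once the custom handlers are installed
    try:
        proc.wait(timeout=5)  # give graceful shutdown a chance
    except subprocess.TimeoutExpired:
        proc.kill()           # fall back to SIGKILL so the test never hangs
        proc.wait()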
Further reading
- Configure liveness, readiness and startup probes — official Kubernetes docs
- Pod lifecycle — covers the termination sequence in detail
- Kubernetes best practices: terminating with grace — Google’s guide to graceful shutdown