TLDR: Mutating admission webhooks that inject scheduling rules (nodeSelector, tolerations, affinity) persist on StatefulSet pods even after you clean the StatefulSet template. The webhook re-fires on every pod CREATE and can read stale metadata to re-inject what you removed. The fix is to scale to 0 and back up — rollout restart doesn’t work.
I was decommissioning a GKE ComputeClass. ComputeClasses let you define scheduling profiles — node selectors, tolerations, affinity rules — and apply them to namespaces via labels. Under the hood, GKE enforces them with a mutating admission webhook (warden-mutating.config.common-webhooks.networking.gke.io).
After removing the ComputeClass, the namespace labels were clean, and I’d patched the StatefulSet template to strip out every trace of the old scheduling rules. But every time a pod was recreated, it came back with cloud.google.com/compute-class: system in its nodeSelector and matching tolerations. The pod would sit in Pending forever because no nodes matched the removed class.
How mutating webhooks interact with StatefulSets
Mutating admission webhooks can intercept multiple resource types. A webhook that manages pod scheduling typically fires on:
- StatefulSets on UPDATE — patches the pod template inside the StatefulSet spec
- Pods on CREATE — patches the pod spec directly at admission time
This dual interception is the root of the problem. When the policy is active, the webhook mutates both the StatefulSet (so the template reflects the scheduling rules) and each pod (so the live spec matches). When you remove the policy, you need to undo both layers — and that’s harder than it sounds.
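You can confirm the dual interception on your own cluster by listing the webhook's rules. A sketch, assuming the GKE webhook name mentioned above (adjust for whatever webhook you're debugging):

```shell
# List each webhook's name and the resources/operations it intercepts.
# Look for rules covering both statefulsets (UPDATE) and pods (CREATE).
kubectl get mutatingwebhookconfiguration \
  warden-mutating.config.common-webhooks.networking.gke.io \
  -o jsonpath='{range .webhooks[*]}{.name}{": "}{.rules}{"\n"}{end}'
```

If you don't know the webhook's name, `kubectl get mutatingwebhookconfigurations` lists all of them.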
The trap
Here’s the sequence that looks correct but doesn’t work:
- Remove the scheduling policy (clean namespace labels, delete the ComputeClass)
- Patch the StatefulSet template to remove all injected scheduling rules
- Delete the stuck pod, expecting the StatefulSet controller to recreate it cleanly
Step 3 fails. The StatefulSet controller creates a new pod, the webhook fires on that CREATE event, and — despite the policy being gone — it re-injects the old scheduling rules.
The culprit is the kubectl.kubernetes.io/last-applied-configuration annotation on the StatefulSet. The webhook reads scheduling intent from this annotation rather than (or in addition to) the live spec or namespace labels. Even after you’ve patched the template, the annotation can still carry references to the old ComputeClass configuration, and the webhook dutifully re-applies them.
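You can check whether this is happening to you by inspecting the annotation directly, and clear it if it still references the removed class. A sketch using the example StatefulSet name from this post:

```shell
# Check whether the last-applied annotation still references the old class
kubectl get statefulset my-stateful-app -n my-namespace \
  -o jsonpath='{.metadata.annotations.kubectl\.kubernetes\.io/last-applied-configuration}' \
  | grep -o 'compute-class[^"]*'

# Remove the stale annotation entirely (a trailing '-' deletes an annotation)
kubectl annotate statefulset my-stateful-app -n my-namespace \
  kubectl.kubernetes.io/last-applied-configuration-
```

Note the `\.` escapes in the jsonpath expression — dots inside an annotation key must be escaped so kubectl doesn't treat them as path separators.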
A rollout restart (kubectl rollout restart) doesn’t help either. It performs a rolling update — deleting and recreating pods one at a time — but the webhook still fires on each new pod CREATE.
The fix
Scale the StatefulSet to 0, then back to the desired replica count:
```shell
kubectl scale statefulset my-stateful-app -n my-namespace --replicas=0
kubectl scale statefulset my-stateful-app -n my-namespace --replicas=3
```
This forces a full teardown and recreation. The scale-to-zero clears the stale pod state, and when the StatefulSet controller creates fresh pods, the annotation context is clean enough that the webhook no longer injects the removed scheduling rules.
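In practice you'll want to wait for the teardown to finish before scaling back up. A sketch — the `app=my-stateful-app` label selector is an assumption; match it to your StatefulSet's actual pod labels:

```shell
kubectl scale statefulset my-stateful-app -n my-namespace --replicas=0

# Wait until every pod is actually gone before scaling back up
kubectl wait --for=delete pod -l app=my-stateful-app \
  -n my-namespace --timeout=120s

kubectl scale statefulset my-stateful-app -n my-namespace --replicas=3
kubectl rollout status statefulset my-stateful-app -n my-namespace
```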
Verifying the fix
After scaling back up, check that the new pods are clean:
```shell
# Should return empty or just default selectors like kubernetes.io/os
kubectl get pod my-stateful-app-0 -n my-namespace \
  -o jsonpath='{.spec.nodeSelector}'

# Should only show default tolerations (not-ready, unreachable)
kubectl get pod my-stateful-app-0 -n my-namespace \
  -o jsonpath='{.spec.tolerations}' | python3 -m json.tool
```
A useful diagnostic is to create a throwaway pod in the same namespace:
```shell
kubectl run webhook-test --image=busybox -n my-namespace \
  --restart=Never -- sleep 10

kubectl get pod webhook-test -n my-namespace \
  -o jsonpath='{.spec.nodeSelector}'

kubectl delete pod webhook-test -n my-namespace
```
If the test pod comes back clean but your StatefulSet pods don’t, the webhook is reading from stale StatefulSet metadata — and the scale-to-zero fix applies.
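A quick way to see the mismatch is to compare the StatefulSet's template (which you patched clean) against the live pod (which the webhook mutated) — using the example names from this post:

```shell
# The template's nodeSelector: should be clean after your patch
kubectl get statefulset my-stateful-app -n my-namespace \
  -o jsonpath='{.spec.template.spec.nodeSelector}'

# The live pod's nodeSelector: if it differs, something mutated it at admission
kubectl get pod my-stateful-app-0 -n my-namespace \
  -o jsonpath='{.spec.nodeSelector}'
```

If the two disagree, the extra fields didn't come from the controller copying the template — they were injected at pod admission.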
Operator-managed StatefulSets
If your StatefulSet is managed by an operator (like prometheus-operator managing Prometheus or Alertmanager StatefulSets), you can’t patch the StatefulSet directly — the operator will overwrite your changes on the next reconciliation loop.
Instead, patch the custom resource that the operator reads from:
```shell
kubectl patch prometheus my-prometheus -n monitoring --type=merge \
  -p '{"spec":{"affinity":{"nodeAffinity":null},"tolerations":null}}'
```
Then delete the pending pods so the operator recreates them with the clean spec from the CR.
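A sketch of that last step — the label selector and pod name follow prometheus-operator's usual conventions for a Prometheus CR named `my-prometheus`, but are assumptions; verify them against your own cluster:

```shell
# Delete the stuck pods; the operator-managed StatefulSet recreates them
kubectl delete pod -l app.kubernetes.io/name=prometheus -n monitoring

# Confirm a recreated pod no longer carries the injected selector
kubectl get pod prometheus-my-prometheus-0 -n monitoring \
  -o jsonpath='{.spec.nodeSelector}'
```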