GKE has four layers of autoscaling that work together: HPA, VPA, cluster autoscaler, and NAP. Understanding how they interact saves a lot of debugging time.

The four layers

  1. HPA - scales pods horizontally (more replicas) based on CPU/memory/custom metrics
  2. VPA - scales pods vertically (bigger requests/limits) based on actual usage
  3. Cluster Autoscaler - scales existing node pools when pods are pending
  4. NAP (Node Auto Provisioning) - creates new node pools when no existing pool fits

HPA and VPA solve different problems: HPA for handling more traffic, VPA for right-sizing resource requests. Don’t use both on CPU/memory for the same workload - they’ll fight, because HPA adds replicas when utilisation rises while VPA resizes the requests that utilisation is measured against, so each invalidates the other’s signal. VPA is great for workloads where you don’t know the right resource requests upfront.
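As a sketch of the HPA side, a minimal autoscaling/v2 manifest targeting average CPU utilisation might look like this (the Deployment name and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web          # illustrative target workload
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when average CPU > 70% of requests
```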

Without NAP, you need to pre-create node pools for every machine type you might need. With NAP, GKE creates pools automatically based on workload requirements.
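Assuming a cluster named my-cluster (as in the examples below), NAP can be enabled with cluster-wide resource limits that cap what it may provision - adjust the limits to your own budget:

```shell
# Enable Node Auto Provisioning with ceilings on total CPU and memory
gcloud container clusters update my-cluster \
  --region=europe-north1 \
  --enable-autoprovisioning \
  --min-cpu=1 --max-cpu=64 \
  --min-memory=1 --max-memory=256
```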

How long does scale-up take?

In my experience with GKE:

  • Scale-up: 2-5 minutes from pending pod to running
  • Scale-down: 10+ minutes (configurable, conservative by default)

The scale-up time depends on node pool configuration. Preemptible/spot nodes can be slightly faster. If you need faster scale-up, consider keeping a small buffer of spare capacity.
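One common way to keep that buffer - my own suggestion, not something GKE provides - is a low-priority “balloon” deployment of pause pods: they hold nodes warm, and the scheduler evicts them the moment real workloads need the space, triggering scale-up in the background instead of on the critical path:

```yaml
# Negative-priority class so balloon pods are always evicted first
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: balloon
value: -10
preemptionPolicy: Never
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: balloon
spec:
  replicas: 2                      # illustrative buffer size
  selector:
    matchLabels: {app: balloon}
  template:
    metadata:
      labels: {app: balloon}
    spec:
      priorityClassName: balloon
      terminationGracePeriodSeconds: 0
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
        resources:
          requests: {cpu: "1", memory: 1Gi}   # capacity each balloon reserves
```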

View configuration

View autoscaling config:

gcloud container clusters describe my-cluster \
  --region=europe-north1 \
  --format="yaml(autoscaling)"

View node pool autoprovisioning defaults:

gcloud container clusters describe my-cluster \
  --region=europe-north1 \
  --format="yaml(autoscaling.autoprovisioningNodePoolDefaults)"

Check autoscaler status in the cluster:

kubectl get cm/cluster-autoscaler-status -n kube-system -o yaml

View node allocatable resources:

kubectl describe nodes | grep -A5 "Allocatable"

Check scaling activity

Check which nodes can be scaled down:

kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

Nodes with ToBeDeletedByClusterAutoscaler taint are being removed. For more on taints and tolerations, see Kubernetes tolerations and node selectors.
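Scale-up decisions also surface as Kubernetes events; filtering on the TriggeredScaleUp reason (the reason the cluster autoscaler emits, to my knowledge) shows which pending pods caused a node to be added:

```shell
# List autoscaler scale-up events across all namespaces
kubectl get events --all-namespaces \
  --field-selector reason=TriggeredScaleUp
```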

NAP and ComputeClass

With NAP enabled, GKE picks machine types automatically. But you can guide its choices with ComputeClass - a priority-ordered list of machine families.

apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: cost-optimised
spec:
  priorities:
  - machineFamily: t2d
    spot: true
  - machineFamily: n2d
    spot: true
  - machineFamily: e2
    spot: false

The autoscaler tries each option in order until it finds available capacity. This matters because spot availability varies and pricing differences between families are significant.
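Workloads opt in to a ComputeClass by selecting it by name with the cloud.google.com/compute-class nodeSelector key; the Deployment name and image below are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker          # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels: {app: batch-worker}
  template:
    metadata:
      labels: {app: batch-worker}
    spec:
      nodeSelector:
        cloud.google.com/compute-class: cost-optimised   # matches the ComputeClass above
      containers:
      - name: worker
        image: europe-docker.pkg.dev/my-project/workers/batch:latest  # illustrative image
        resources:
          requests: {cpu: "2", memory: 4Gi}
```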

I built a tool to generate cost-optimised ComputeClass specs from live pricing data - see Cost-optimising GKE with ComputeClass.

Cost considerations

  • Min nodes too high - Paying for idle capacity
  • Min nodes too low - Cold starts during traffic spikes
  • Scale-down too aggressive - Nodes churning up and down
  • Scale-down too conservative - Paying for unused nodes

I typically set min to handle baseline traffic, max to handle peak + 20%, and leave scale-down delay at the default (10 minutes). For cost savings, use spot/preemptible nodes for workloads that can handle interruption.
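Those min/max choices translate directly into node pool autoscaling flags; as a sketch for a hypothetical default-pool:

```shell
# Min covers baseline traffic; max covers peak + ~20% headroom
gcloud container clusters update my-cluster \
  --region=europe-north1 \
  --node-pool=default-pool \
  --enable-autoscaling \
  --min-nodes=3 --max-nodes=12
```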

Further reading