TLDR: I built gkecc to generate cost-optimised GKE ComputeClass specs from live pricing data. It sorts by total cost (CPU + RAM), not just CPU, and intelligently interleaves spot and on-demand instances.
I was manually maintaining ComputeClass specs for our GKE clusters. Every few months I’d check GCP pricing docs, update the priority order, and hope I got it right. It was tedious and error-prone.
The breaking point: I discovered our “cost-optimised” spec was recommending instances that were actually 30% more expensive than alternatives. The problem? I’d been sorting by CPU price alone, ignoring RAM pricing entirely.
The problem with CPU-only pricing Link to heading
Most guides tell you to sort instance families by core price. That’s misleading:
| Family | Core price (per vCPU) | RAM price (per GB) | 4 vCPU + 16 GB, per day |
|---|---|---|---|
| e2 | $0.00527/hr | $0.00071/hr | $0.78 |
| c2d | $0.00670/hr | $0.00090/hr | $0.99 |
Looking at core pricing alone, c2d seems ~27% more expensive — and for this particular workload the totals happen to agree, but only because c2d’s RAM is also ~27% pricier. Core and RAM premiums rarely line up that neatly across families; when they diverge, sorting by core price alone puts instances in the wrong order. You have to price the whole workload: vCPUs and RAM together.
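The table’s daily figures fall out of a simple calculation — a minimal sketch for illustration, not gkecc’s actual code; the rates are the table’s own:

```python
def daily_cost(vcpus, gb_ram, core_price_hr, ram_price_hr):
    """Total daily cost: per-vCPU and per-GB hourly rates, times 24 hours."""
    return (vcpus * core_price_hr + gb_ram * ram_price_hr) * 24

# (core $/vCPU-hr, RAM $/GB-hr) from the table above
families = {"e2": (0.00527, 0.00071), "c2d": (0.00670, 0.00090)}

for name, (core, ram) in families.items():
    print(f"{name}: ${daily_cost(4, 16, core, ram):.2f}/day for 4 vCPU + 16 GB")
```

Ranking families by this number, rather than by the core rate, is the whole trick.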
The solution Link to heading
I wrote gkecc - a CLI that fetches live GCP pricing and generates ComputeClass manifests sorted by actual total cost.
```shell
# Install
uv tool install git+https://github.com/brtkwr/gkecc.git

# Generate for your region
gkecc europe-north1 > compute-class.yaml

# Cap at $5/day per instance
gkecc europe-north1 --max-cost 5

# Apply directly
gkecc europe-north1 | kubectl apply -f -
```
Why interleaving matters Link to heading
The naive approach is “all spot first, then all on-demand”. But some spot instances cost more than some on-demand options:
```yaml
priorities:
  - machineFamily: t2d # $0.46/day (spot) ✓
    spot: true
  - machineFamily: n2d # $2.62/day (on-demand) ✓
    spot: false
  - machineFamily: z3 # $2.74/day (spot) ← more expensive than n2d on-demand!
    spot: true
```
gkecc sorts everything by total cost regardless of spot/on-demand, so you get optimal price-per-availability.
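The interleaving reduces to a single sort over the combined candidate list — a hypothetical sketch of the idea, not gkecc’s implementation; families and prices are taken from the snippet above:

```python
# Candidates: (family, daily cost in USD, is_spot) — spot and on-demand mixed.
candidates = [
    ("t2d", 0.46, True),
    ("z3", 2.74, True),
    ("n2d", 2.62, False),
]

# Sort purely by total daily cost, ignoring spot/on-demand status.
priorities = sorted(candidates, key=lambda c: c[1])

for family, cost, spot in priorities:
    print(f"- machineFamily: {family}  # ${cost:.2f}/day, spot: {str(spot).lower()}")
```

The on-demand n2d correctly lands above the pricier z3 spot instance, which a “spot first” ordering would never produce.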
When to use this Link to heading
This approach prioritises spot instances, which can be preempted with 30 seconds notice. That sounds scary, but if your database is off-cluster (Cloud SQL, RDS, etc.), most workloads can be made spot-tolerant:
- Run 2+ replicas behind a load balancer
- Use PodDisruptionBudgets to control rollout
- Let Kubernetes reschedule - it’s what it’s good at
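For the PodDisruptionBudget point, a minimal example that keeps at least one replica of a two-replica service up during node churn — the name and label are placeholders:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb        # placeholder name
spec:
  minAvailable: 1         # never voluntarily evict below one running pod
  selector:
    matchLabels:
      app: my-app         # placeholder label matching your Deployment's pods
```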
The 60-70% cost savings are significant. I run almost everything on spot now. The main exceptions are single-replica services that can’t have any downtime, which are rare if you design for it.
For workloads that genuinely can’t tolerate preemption, use a separate ComputeClass with `spot: false` entries only.
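A sketch of what that on-demand-only class might look like — the field names follow GKE’s ComputeClass API, but treat the class name and family choice as placeholders:

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: on-demand-only    # placeholder name
spec:
  priorities:
    - machineFamily: n2d  # cheapest on-demand family for your region, per gkecc
      spot: false
```

Pods opt in to it with a `nodeSelector` on `cloud.google.com/compute-class: on-demand-only`.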
Built with Claude Code Link to heading
The entire tool was written in Claude Code sessions. I’d describe what I wanted, test it, refine. The pricing fetch logic, cost calculation, and YAML generation all emerged from this back-and-forth.