
The Problem with CPU-Based Autoscaling
Default autoscaling in GKE relies on CPU and memory usage — but in real-world event-driven systems, those metrics rarely reflect actual workload pressure.
At WALT Labs, we ran into this firsthand. Our services process data from Pub/Sub queues, and during traffic spikes, CPU usage often stayed flat while messages piled up. That meant autoscaling lagged behind real demand — or worse, didn’t trigger at all.
To fix this, we built a smarter solution: autoscaling based on Pub/Sub queue depth, using Cloud Monitoring metrics and the custom-metrics-adapter. This kind of architectural change is something we guide clients through every week — from GKE tuning to cost-aware scaling strategies.
Now our GKE workloads scale based on what really matters — incoming traffic.
💡 WALT Labs Can Help
We guide teams through container-based scaling strategies like this during our Infrastructure Modernization Workshops — including queue-based autoscaling, GKE tuning, and HPA optimization.
→ Request a Workshop
Scaling Based on Real Demand, Not CPU
We set up our deployment in GKE to scale horizontally using Pub/Sub's num_undelivered_messages metric. This let us respond to real-time load without relying on lagging indicators like CPU.
This approach is ideal for batch pipelines, async APIs, or any workload where throughput — not CPU — should drive scaling. It’s also one of the techniques we bake into WALT Labs' managed Kubernetes environments as part of our Cloud Modernization services.
How We Made It Work (in 3 Steps)
1. Deploy the Custom Metrics Adapter
Use the official Helm chart or apply manifests manually.
Give the adapter’s service account the following roles:
- roles/monitoring.viewer
- roles/stackdriver.resourceMetadata.writer
Bind it via Workload Identity (preferred) or, if Workload Identity isn't enabled on your cluster, grant the roles to the node service account.
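The binding steps above can be sketched with gcloud and kubectl. This is a minimal sketch, assuming Workload Identity is enabled; the project ID, Google service account name, and the adapter's Kubernetes namespace and service account name are placeholders — substitute your own.

```shell
# Placeholders — replace with your own project and service account names
PROJECT_ID=my-gcp-project
GSA=metrics-adapter@${PROJECT_ID}.iam.gserviceaccount.com

# Grant the monitoring roles to the Google service account
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member "serviceAccount:$GSA" \
  --role roles/monitoring.viewer

gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member "serviceAccount:$GSA" \
  --role roles/stackdriver.resourceMetadata.writer

# Allow the adapter's Kubernetes service account to impersonate the Google SA
# (namespace/name shown are assumptions — match your adapter deployment)
gcloud iam service-accounts add-iam-policy-binding "$GSA" \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:${PROJECT_ID}.svc.id.goog[custom-metrics/custom-metrics-stackdriver-adapter]"

# Annotate the Kubernetes service account so GKE links the two identities
kubectl annotate serviceaccount custom-metrics-stackdriver-adapter \
  --namespace custom-metrics \
  iam.gke.io/gcp-service-account="$GSA"
```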
💡 WALT Labs Can Help
This adapter setup is part of our Kubernetes Optimization Playbook, deployed in managed environments and client-led MVP builds.
→ Talk to an Engineer
2. Confirm Metric Visibility
Run:
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"
To debug all available metrics, use this temporary config:
adapter:
  command:
    - /adapter
    - --list-all-custom-metrics
⚠️ This increases memory usage. Remove it after confirming visibility.
Once set up, you’ll see metrics like:
pubsub.googleapis.com|subscription|num_undelivered_messages
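Once the metric appears, you can query it directly through the external metrics API. A sketch, assuming the "default" namespace and a subscription label of dev-worker-sub — note the pipe characters in the metric name must be URL-escaped as %7C:

```shell
# Fetch the current backlog value the HPA will see for this metric
kubectl get --raw \
  "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/pubsub.googleapis.com%7Csubscription%7Cnum_undelivered_messages" \
  | jq .
```

If this returns an empty items list or an error, fix the adapter's IAM bindings before moving on to the HPA.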
3. Configure the HPA
Here’s a working example:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker-service
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: pubsub.googleapis.com|subscription|num_undelivered_messages
          selector:
            matchLabels:
              resource.labels.subscription_id: dev-worker-sub
              resource.labels.project_id: my-gcp-project
        target:
          type: Value
          value: 10
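To roll this out, apply the manifest and confirm the HPA is reading the metric. A sketch — the filename is a placeholder for wherever you saved the manifest above:

```shell
# Apply the HPA and check that it resolves the external metric
kubectl apply -f worker-service-hpa.yaml
kubectl get hpa worker-service
```

The TARGETS column should show the live backlog against the target of 10; an "unknown" value there usually means the adapter can't see the metric or the label selector doesn't match the subscription.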
What Changed After We Deployed It
- Our workers now scale in direct response to queue depth
- We reduced idle compute and latency under load
- Scaling feels immediate rather than delayed
This model has become our default for queue-driven microservices — and a common pattern we implement during client POCs and post-workshop MVP builds.
Lessons Learned and Optimization Tips
- Use kubectl describe hpa to monitor scaling decisions in real time
- Set metric resolution to 1-minute intervals for fast feedback
- External metrics work best in event-driven or bursty workloads like ETL, stream processing, or notification systems
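The monitoring tip above looks like this in practice, assuming the HPA is named worker-service as in the earlier example:

```shell
# Show the HPA's current metric reading, desired replicas, and recent
# scaling events (why it scaled up or down, and when)
kubectl describe hpa worker-service

# Watch replica counts react to queue depth as load arrives
kubectl get hpa worker-service --watch
```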
💡 WALT Labs Can Help
If you’re running into the same limitations with default autoscaling, we can help — either through a funded Google Cloud workshop or direct architecture support.
→ Book a Strategy Call
Want Help Scaling Your GKE Environment?
WALT Labs is a Google Cloud Premier Partner. We architect production-ready Kubernetes environments that scale with precision, backed by 24/7 support and deep platform expertise.
We offer:
- GKE deep dives through our Infrastructure Modernization Workshop
- Full setup of custom metrics adapters and HPA tuning
- Managed Cloud support for scaling, reliability, and cost control
- Insights into how this fits into your FinOps picture via WALT Carbon