Scaling GKE with Custom Metrics

Nathan Barrett

The Problem with CPU-Based Autoscaling

Default autoscaling in GKE relies on CPU and memory usage — but in real-world event-driven systems, those metrics rarely reflect actual workload pressure.

At WALT Labs, we ran into this firsthand. Our services process data from Pub/Sub queues, and during traffic spikes, CPU usage often stayed flat while messages piled up. That meant autoscaling lagged behind real demand — or worse, didn’t trigger at all.

To fix this, we built a smarter solution: autoscaling based on Pub/Sub queue depth, using Cloud Monitoring metrics and the Custom Metrics Stackdriver Adapter. This kind of architectural change is something we guide clients through every week, from GKE tuning to cost-aware scaling strategies.

Now our GKE workloads scale based on what really matters — incoming traffic.

💡 WALT Labs Can Help
We guide teams through container-based scaling strategies like this during our Infrastructure Modernization Workshops — including queue-based autoscaling, GKE tuning, and HPA optimization.
Request a Workshop

Scaling Based on Real Demand, Not CPU

We set up our deployment in GKE to scale horizontally using Pub/Sub’s num_undelivered_messages metric. This let us respond to real-time load without relying on lagging indicators like CPU.

This approach is ideal for batch pipelines, async APIs, or any workload where throughput — not CPU — should drive scaling. It’s also one of the techniques we bake into WALT Labs' managed Kubernetes environments as part of our Cloud Modernization services.

How We Made It Work (in 3 Steps)

1. Deploy the Custom Metrics Adapter

Install the Custom Metrics Stackdriver Adapter via Helm or by applying the upstream manifests manually.
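If you go the manifest route, the GoogleCloudPlatform/k8s-stackdriver repo is the usual source. A minimal install sketch, assuming the repo's current layout (the exact path may change upstream):

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml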

Give the adapter’s service account the following roles:

  • roles/monitoring.viewer
  • roles/stackdriver.resourceMetadata.writer

Bind the roles via Workload Identity (preferred), or fall back to the node service account's IAM roles if you're on GKE Standard without Workload Identity.
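For reference, here's roughly what the Workload Identity binding looks like. This is a sketch, assuming the adapter runs with the upstream defaults (namespace custom-metrics, Kubernetes service account custom-metrics-stackdriver-adapter) and that adapter-gsa@MY_PROJECT.iam.gserviceaccount.com is a Google service account you've already created and granted the roles above:

# Let the Kubernetes service account impersonate the Google service account
gcloud iam service-accounts add-iam-policy-binding \
  adapter-gsa@MY_PROJECT.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:MY_PROJECT.svc.id.goog[custom-metrics/custom-metrics-stackdriver-adapter]"

# Annotate the Kubernetes service account so GKE knows which Google identity to use
kubectl annotate serviceaccount custom-metrics-stackdriver-adapter \
  --namespace custom-metrics \
  iam.gke.io/gcp-service-account=adapter-gsa@MY_PROJECT.iam.gserviceaccount.com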

💡 WALT Labs Can Help
This adapter setup is part of our Kubernetes Optimization Playbook, deployed in managed environments and client-led MVP builds.
Talk to an Engineer

2. Confirm Metric Visibility

Verify that the adapter is serving the external metrics API:

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"

To list every available metric while debugging, add this temporary config to the adapter:

adapter:
  command:
    - /adapter
    - --list-all-custom-metrics

⚠️ This increases memory usage. Remove it after confirming visibility.

Once set up, you’ll see metrics like:

pubsub.googleapis.com|subscription|num_undelivered_messages
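You can also query a single metric through the raw API to confirm the adapter returns data for it. A quick check, assuming you query via the default namespace (the external metrics API path requires one, even though the metric itself lives in Cloud Monitoring) and that jq is installed:

kubectl get --raw \
  "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/pubsub.googleapis.com|subscription|num_undelivered_messages" | jq .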

3. Configure the HPA

Here’s a working example:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker-service
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: pubsub.googleapis.com|subscription|num_undelivered_messages
          selector:
            matchLabels:
              resource.labels.subscription_id: dev-worker-sub
              resource.labels.project_id: my-gcp-project
        target:
          type: Value
          value: 10
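To roll it out and watch it react (hpa.yaml is just a placeholder filename for the manifest above):

kubectl apply -f hpa.yaml
kubectl get hpa worker-service --watch

One design note: with target type Value, the HPA compares the raw backlog against the target. If you'd rather target a per-replica backlog (say, roughly 10 undelivered messages per worker), switch to type AverageValue, which divides the metric by the current replica count before comparing.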

What Changed After We Deployed It

  • Our workers now scale in direct response to queue depth
  • We reduced idle compute and latency under load
  • Scaling feels immediate instead of lagging behind demand

This model has become our default for queue-driven microservices — and a common pattern we implement during client POCs and post-workshop MVP builds.

Lessons Learned and Optimization Tips

  • Use kubectl describe hpa to monitor scaling decisions in real time (see the example after this list)
  • Set metric resolution to 1-minute intervals for fast feedback

  • External metrics work best in event-driven or bursty workloads like ETL, stream processing, or notification systems
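For example, against the HPA from step 3:

kubectl describe hpa worker-service

The Metrics section shows the current backlog value versus the target, and the Events section records each scale-up and scale-down decision.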

💡 WALT Labs Can Help
If you’re running into the same limitations with default autoscaling, we can help — either through a funded Google Cloud workshop or direct architecture support.
Book a Strategy Call

Want Help Scaling Your GKE Environment?

WALT Labs is a Google Cloud Premier Partner. We architect production-ready Kubernetes environments that scale with precision, backed by 24/7 support and deep platform expertise.

We offer:

  • GKE deep dives through our Infrastructure Modernization Workshop
  • Full setup of custom metrics adapters and HPA tuning
  • Managed Cloud support for scaling, reliability, and cost control
  • Insights into how this fits into your FinOps picture via WALT Carbon

👉 Talk to us

