
The Problem with CPU-Based Autoscaling
Default autoscaling in GKE relies on CPU and memory usage — but in real-world event-driven systems, those metrics rarely reflect actual workload pressure.
At WALT Labs, we ran into this firsthand. Our services process data from Pub/Sub queues, and during traffic spikes, CPU usage often stayed flat while messages piled up. That meant autoscaling lagged behind real demand — or worse, didn’t trigger at all.
To fix this, we built a smarter solution: autoscaling based on Pub/Sub queue depth, using Cloud Monitoring metrics and the custom-metrics-adapter. This kind of architectural change is something we guide clients through every week — from GKE tuning to cost-aware scaling strategies.
Now our GKE workloads scale based on what really matters — incoming traffic.
💡 WALT Labs Can Help
We guide teams through container-based scaling strategies like this during our Infrastructure Modernization Workshops — including queue-based autoscaling, GKE tuning, and HPA optimization.
→ Request a Workshop
Scaling Based on Real Demand, Not CPU
We set up our deployment in GKE to scale horizontally using Pub/Sub's num_undelivered_messages metric. This let us respond to real-time load without relying on lagging indicators like CPU.
This approach is ideal for batch pipelines, async APIs, or any workload where throughput — not CPU — should drive scaling. It’s also one of the techniques we bake into WALT Labs' managed Kubernetes environments as part of our Cloud Modernization services.
How We Made It Work (in 3 Steps)
1. Deploy the Custom Metrics Adapter
Use the official Helm chart or apply manifests manually.
Give the adapter’s service account the following roles:
- roles/monitoring.viewer
- roles/stackdriver.resourceMetadata.writer
Bind it via Workload Identity (preferred) or, if Workload Identity isn't enabled on your cluster, grant the roles to the node service account.
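The binding steps above can be sketched with gcloud and kubectl. This is a minimal sketch, assuming Workload Identity is enabled; the project ID, Google service account name, and the adapter's Kubernetes namespace and service account name are placeholders — substitute your own.

```shell
# Placeholders — replace with your own project and service account names
PROJECT_ID=my-gcp-project
GSA=metrics-adapter@${PROJECT_ID}.iam.gserviceaccount.com

# Grant the monitoring roles to the Google service account
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member "serviceAccount:$GSA" \
  --role roles/monitoring.viewer

gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member "serviceAccount:$GSA" \
  --role roles/stackdriver.resourceMetadata.writer

# Allow the adapter's Kubernetes service account to impersonate the Google SA
# (namespace/name shown are assumptions — match your adapter deployment)
gcloud iam service-accounts add-iam-policy-binding "$GSA" \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:${PROJECT_ID}.svc.id.goog[custom-metrics/custom-metrics-stackdriver-adapter]"

# Annotate the Kubernetes service account so GKE links the two identities
kubectl annotate serviceaccount custom-metrics-stackdriver-adapter \
  --namespace custom-metrics \
  iam.gke.io/gcp-service-account="$GSA"
```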
💡 WALT Labs Can Help
This adapter setup is part of our Kubernetes Optimization Playbook, deployed in managed environments and client-led MVP builds.
→ Talk to an Engineer
2. Confirm Metric Visibility
Run:
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"
To debug all available metrics, use this temporary config:
adapter:
  command:
    - /adapter
    - --list-all-custom-metrics
⚠️ This increases memory usage. Remove it after confirming visibility.
Once set up, you’ll see metrics like:
pubsub.googleapis.com|subscription|num_undelivered_messages
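Once the metric appears, you can query it directly through the external metrics API. A sketch, assuming the "default" namespace and a subscription label of dev-worker-sub — note the pipe characters in the metric name must be URL-escaped as %7C:

```shell
# Fetch the current backlog value the HPA will see for this metric
kubectl get --raw \
  "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/pubsub.googleapis.com%7Csubscription%7Cnum_undelivered_messages" \
  | jq .
```

If this returns an empty items list or an error, fix the adapter's IAM bindings before moving on to the HPA.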
3. Configure the HPA
Here’s a working example:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker-service
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: pubsub.googleapis.com|subscription|num_undelivered_messages
          selector:
            matchLabels:
              resource.labels.subscription_id: dev-worker-sub
              resource.labels.project_id: my-gcp-project
        target:
          type: Value
          value: 10
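To roll this out, apply the manifest and confirm the HPA is reading the metric. A sketch — the filename is a placeholder for wherever you saved the manifest above:

```shell
# Apply the HPA and check that it resolves the external metric
kubectl apply -f worker-service-hpa.yaml
kubectl get hpa worker-service
```

The TARGETS column should show the live backlog against the target of 10; an "unknown" value there usually means the adapter can't see the metric or the label selector doesn't match the subscription.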
What Changed After We Deployed It
- Our workers now scale in direct response to queue depth
- We reduced idle compute and latency under load
- Scaling feels immediate rather than delayed
This model has become our default for queue-driven microservices — and a common pattern we implement during client POCs and post-workshop MVP builds.
Lessons Learned and Optimization Tips
- Use kubectl describe hpa to monitor scaling decisions in real time
- Set metric resolution to 1-minute intervals for fast feedback
- External metrics work best in event-driven or bursty workloads like ETL, stream processing, or notification systems
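The monitoring tip above looks like this in practice, assuming the HPA is named worker-service as in the earlier example:

```shell
# Show the HPA's current metric reading, desired replicas, and recent
# scaling events (why it scaled up or down, and when)
kubectl describe hpa worker-service

# Watch replica counts react to queue depth as load arrives
kubectl get hpa worker-service --watch
```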
💡 WALT Labs Can Help
If you’re running into the same limitations with default autoscaling, we can help — either through a funded Google Cloud workshop or direct architecture support.
→ Book a Strategy Call
Want Help Scaling Your GKE Environment?
WALT Labs is a Google Cloud Premier Partner. We architect production-ready Kubernetes environments that scale with precision, backed by 24/7 support and deep platform expertise.
We offer:
- GKE deep dives through our Infrastructure Modernization Workshop
- Full setup of custom metrics adapters and HPA tuning
- Managed Cloud support for scaling, reliability, and cost control
- Insights into how this fits into your FinOps picture via WALT Carbon