Chaos on GKE Autopilot

This guide explains how to set up and run chaos engineering experiments using Harness Chaos Engineering on Google Kubernetes Engine (GKE) Autopilot clusters.

Overview

GKE Autopilot is a mode of operation in GKE in which Google manages the cluster's nodes, scaling, and security configuration. Compared with Standard GKE clusters, Autopilot imposes specific restrictions, including limited permissions and no direct access to nodes.

For additional information about running privileged workloads on GKE Autopilot, see Google Partner Docs and Run privileged workloads from GKE Autopilot partners.

Prerequisites

Before you begin, ensure you have:

  • A running GKE Autopilot cluster
  • kubectl access to the cluster with appropriate permissions
  • A Harness account with the Chaos Engineering module enabled
  • Cluster admin permissions to create allowlist synchronizers
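
The permission prerequisite can be checked from the command line before you start. The following is a minimal sketch, not an official Harness or Google tool: the function name can_create_synchronizer is hypothetical, and the plural resource name allowlistsynchronizers.auto.gke.io is an assumption inferred from the AllowlistSynchronizer kind in the auto.gke.io API group.

```shell
# Hypothetical pre-flight helper (illustrative name, not part of any official tooling).
# Returns 0 if the current kubectl identity may create AllowlistSynchronizer objects.
# Assumption: the resource is exposed as "allowlistsynchronizers" in the auto.gke.io group.
can_create_synchronizer() {
    # "kubectl auth can-i" prints "yes" or "no"; treat any failure as "no".
    answer=$(kubectl auth can-i create allowlistsynchronizers.auto.gke.io 2>/dev/null) || answer=no
    [ "$answer" = "yes" ]
}
```

For example, `can_create_synchronizer && echo "permissions OK"` prints the message only when the check passes. The check needs a working kubectl context that points at the Autopilot cluster.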

Step-by-Step Setup Guide

Step 1: Configure GKE Autopilot Allowlist

GKE Autopilot requires an allowlist that defines exemptions from security restrictions for specific workloads. Harness maintains an allowlist for chaos engineering operations that you need to apply to your cluster.

Required permissions: You need cluster admin permissions and kubectl access to apply the allowlist synchronizer.

Apply the allowlist synchronizer to your GKE Autopilot cluster:

kubectl apply -f - <<'EOF'
apiVersion: auto.gke.io/v1
kind: AllowlistSynchronizer
metadata:
  name: harness-chaos-allowlist-synchronizer
spec:
  allowlistPaths:
  - Harness/allowlists/chaos/v1.62/*
  - Harness/allowlists/service-discovery/v0.42/*
EOF

Wait for the allowlist synchronizer to be ready:

kubectl wait --for=condition=Ready allowlistsynchronizer/harness-chaos-allowlist-synchronizer --timeout=60s
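
If the wait times out, it helps to see the synchronizer's reported status. The sketch below wraps the same kubectl wait command and dumps the object on failure. The function name wait_for_synchronizer and its default arguments are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: waits for the synchronizer to become Ready and,
# if the wait fails, prints the object's YAML to stderr for debugging.
# Arguments: $1 = synchronizer name (optional), $2 = timeout (optional).
wait_for_synchronizer() {
    name=${1:-harness-chaos-allowlist-synchronizer}
    timeout=${2:-60s}
    if kubectl wait --for=condition=Ready "allowlistsynchronizer/$name" --timeout="$timeout"; then
        echo "synchronizer $name is Ready"
    else
        echo "synchronizer $name not Ready after $timeout; current status:" >&2
        kubectl get allowlistsynchronizer "$name" -o yaml >&2
        return 1
    fi
}
```

Calling `wait_for_synchronizer` with no arguments uses the synchronizer name from this guide and a 60-second timeout. Pass a longer timeout as the second argument if the cluster is slow to sync.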

Version Updates

The allowlist paths include version numbers (e.g., v1.62, v0.42) that may change with Harness updates. If you encounter issues:

  1. Check the Harness release notes for the latest supported versions
  2. Update the allowlist paths accordingly
  3. Contact Harness support for the most current allowlist versions
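
Because the versions appear inside the allowlist paths, one way to make updates less error-prone is to generate the manifest from two version arguments. This is a minimal sketch under that assumption. The function name render_synchronizer is hypothetical, and the versions in the usage example are the ones from this guide, which may not be current.

```shell
# Hypothetical helper: prints an AllowlistSynchronizer manifest for the given versions.
# Arguments: $1 = chaos allowlist version, $2 = service-discovery allowlist version.
render_synchronizer() {
    chaos_version=$1
    discovery_version=$2
    # Unquoted heredoc delimiter so the version variables are expanded.
    cat <<EOF
apiVersion: auto.gke.io/v1
kind: AllowlistSynchronizer
metadata:
  name: harness-chaos-allowlist-synchronizer
spec:
  allowlistPaths:
  - Harness/allowlists/chaos/${chaos_version}/*
  - Harness/allowlists/service-discovery/${discovery_version}/*
EOF
}
```

To apply the result, pipe it to kubectl, for example `render_synchronizer v1.62 v0.42 | kubectl apply -f -`. Running the function on its own prints the manifest so you can review it first.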

Step 2: Enable GKE Autopilot Compatibility

After applying the allowlist synchronizer, enable GKE Autopilot compatibility for your existing Harness infrastructure and discovery agent, as described in the procedures that follow.

Alternative Setup Options

You can also configure the "Use static name for configmap and secret" option for GKE Autopilot compatibility during:

Configure Infrastructure for GKE Autopilot

  1. Navigate to Chaos Engineering → Environments and select your environment.

  2. Click the options menu (⋮) next to your infrastructure and select Edit Infrastructure

  3. Toggle on "Use static name for configmap and secret" and click Save

Configure Service Discovery

  1. Navigate to Project Settings → Discovery

  2. Click the options menu (⋮) next to your discovery agent and select Edit

  3. Toggle on "Use static name for configmap and secret" and click Update Discovery Agent

Step 3: Start Running Chaos Experiments

Your GKE Autopilot cluster is now ready for chaos engineering. To create and run your first experiment, follow the Create Experiments guide and choose from any of the supported experiments listed below.

Supported Chaos Experiments

Harness Chaos Engineering provides comprehensive Kubernetes fault coverage. On GKE Autopilot, experiments are categorized based on compatibility with Autopilot's security model.

Supported Pod-Level Experiments

These experiments run on GKE Autopilot because they operate within container boundaries:

Container Resource Stress

Container Storage Operations

  • Disk Fill: Fills the pod's ephemeral storage
  • FS Fill: Applies filesystem stress by filling the pod's ephemeral storage

Container Lifecycle Management

  • Container Kill: Causes container failure on specific or random replicas
  • Pod Delete: Causes specific or random replicas to fail forcibly or gracefully
  • Pod Autoscaler: Tests whether nodes can accommodate multiple replicas

Network Chaos (Container-Level)

DNS Manipulation

HTTP/API Fault Injection

API Gateway/Service Mesh Faults

File System I/O Manipulation

JVM-Specific Chaos (Java Applications)

Database Integration Chaos

Cache and Data Store Chaos

System Time Manipulation

  • Time Chaos: Introduces controlled time offsets to disrupt system time

Node-Level Chaos Experiments

Harness Chaos now supports select node-level chaos experiments on GKE Autopilot that operate within the security constraints of the managed environment:

Node Network Chaos

Node Service Management

Next Steps

Now that you have Harness Chaos Engineering set up on your GKE Autopilot cluster:

  1. Create Your First Experiment: Start with a simple Pod CPU Hog experiment with low intensity to test your setup

  2. Set Up Application Discovery: Enable Service Discovery in your infrastructure settings and explore Application Maps to visualize your services

  3. Add Monitoring: Configure probes to validate your application's resilience during experiments

  4. Explore More Experiments: Try network chaos like Pod Network Latency or JVM faults for Java applications