Building a Resilient Multicloud EKS Infrastructure: AWS + Azure Hybrid Nodes

In today's cloud-native world, resilience and flexibility aren't just nice-to-haves; they're essentials. What if you could combine Amazon EKS with the regional reach or compliance capabilities of Azure? That's exactly what EKS Hybrid Nodes let you do.

By extending your EKS worker node pool to include Azure-based VMs, you can run Kubernetes workloads across both AWS and Azure — without spinning up a second control plane or managing two clusters. In this article, we'll break down the architecture, infrastructure code (with Terraform), and practical tips to get you started with a production-ready multicloud EKS deployment.

Why go hybrid?

Here's why hybrid cloud Kubernetes setups are worth considering:

  • Business continuity: Workloads on Azure nodes remain online even if AWS compute hits issues, although control-plane outages will pause scheduling and cluster-level operations.
  • Cost optimization: Keep critical workloads on AWS and push less-sensitive tasks to lower-cost Azure resources.
  • Regional flexibility: Reach users in locations where AWS doesn't operate, or meet compliance needs that Azure serves better.
  • Unified management: Run all workloads under a single Amazon EKS control plane, even though your worker nodes span multiple clouds.

At the heart of this setup is a fully managed EKS control plane running in AWS. What makes it unique? The worker nodes live in Azure. By securely linking the two clouds with a Site-to-Site VPN (attached to a Transit Gateway where several networks need routing), we turn Azure VMs into first-class EKS nodes using nodeadm and AWS Systems Manager. The result: a single hybrid Kubernetes cluster that spans clouds without a second control plane.

Architecture overview

The core of this setup is a fully managed Amazon EKS control plane running inside an AWS VPC. However, instead of hosting all the worker nodes within AWS, this hybrid configuration extends the EKS cluster to Azure, leveraging Linux virtual machines (such as Ubuntu 24.04) as remote Kubernetes worker nodes.

These Azure VMs join the EKS cluster over a Site-to-Site VPN connection, optionally terminated on an AWS Transit Gateway, which lets them function as first-class EKS nodes. This is made possible by nodeadm, the EKS Hybrid Nodes CLI, which installs the node components on each host, registers it with the cluster, and handles later upgrades and removal.
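As a rough illustration of how an Azure VM could bootstrap itself with nodeadm, the sketch below passes a cloud-init script to an Ubuntu 24.04 VM. Every variable name, the VM size, and the image reference are assumptions made for this example, and the nodeadm download URL and flags follow the AWS hybrid nodes documentation as we understand it, so verify them before use. The SSM activation ID and code are expected to be supplied as inputs.

```hcl
# Sketch only: variable names, VM size and image reference are assumptions.
variable "cluster_name" { type = string }
variable "aws_region" { type = string }
variable "resource_group_name" { type = string }
variable "location" { type = string }
variable "nic_id" { type = string }
variable "admin_public_key" { type = string }

variable "k8s_version" {
  type    = string
  default = "1.31"
}

variable "ssm_activation_id" {
  type      = string
  sensitive = true
}

variable "ssm_activation_code" {
  type      = string
  sensitive = true
}

locals {
  # cloud-init: write the NodeConfig, fetch nodeadm, install components, join the cluster.
  hybrid_node_cloud_init = <<-EOT
    #cloud-config
    write_files:
      - path: /etc/eks/nodeConfig.yaml
        permissions: "0600"
        content: |
          apiVersion: node.eks.aws/v1alpha1
          kind: NodeConfig
          spec:
            cluster:
              name: ${var.cluster_name}
              region: ${var.aws_region}
            hybrid:
              ssm:
                activationId: ${var.ssm_activation_id}
                activationCode: ${var.ssm_activation_code}
    runcmd:
      - curl -fsSL -o /usr/local/bin/nodeadm https://hybrid-assets.eks.amazonaws.com/releases/latest/bin/linux/amd64/nodeadm
      - chmod +x /usr/local/bin/nodeadm
      - nodeadm install ${var.k8s_version} --credential-provider ssm
      - nodeadm init -c file:///etc/eks/nodeConfig.yaml
  EOT
}

resource "azurerm_linux_virtual_machine" "hybrid_node" {
  name                  = "eks-hybrid-node-1"
  resource_group_name   = var.resource_group_name
  location              = var.location
  size                  = "Standard_D4s_v5"
  admin_username        = "azureuser"
  network_interface_ids = [var.nic_id]
  custom_data           = base64encode(local.hybrid_node_cloud_init)

  # Azure requires a login credential even if nobody ever uses SSH.
  admin_ssh_key {
    username   = "azureuser"
    public_key = var.admin_public_key
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Premium_LRS"
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "ubuntu-24_04-lts"
    sku       = "server"
    version   = "latest"
  }
}
```

Because the activation code ends up in the VM's custom data, it is worth treating that value as a secret and keeping activations short-lived.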

To keep access secure and automated, AWS Systems Manager (SSM) registers each Azure VM through a hybrid activation. The activation ties the VM to an IAM role, so the node receives temporary AWS credentials to authenticate to the cluster even though it runs outside AWS. SSM also gives you centralized remote management of the nodes, which removes the need for SSH access or manual intervention.
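A minimal sketch of the SSM side might look like the following. The role name, activation name, and registration limit are assumptions chosen for illustration. The role also needs further permissions (for example to pull container images and describe the cluster) that are listed in the AWS hybrid nodes documentation and left out here.

```hcl
# IAM role that SSM hands to registered hybrid nodes (name is an assumption).
resource "aws_iam_role" "hybrid_node" {
  name = "eks-hybrid-node-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "ssm.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# Baseline permissions so the SSM agent on the VM can talk to Systems Manager.
resource "aws_iam_role_policy_attachment" "ssm_core" {
  role       = aws_iam_role.hybrid_node.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

# Hybrid activation: yields the ID/code pair a VM uses to register itself.
resource "aws_ssm_activation" "hybrid_nodes" {
  name               = "eks-hybrid-nodes"
  description        = "Registers Azure VMs as SSM managed instances"
  iam_role           = aws_iam_role.hybrid_node.id
  registration_limit = 5 # assumed node count; size to your fleet
  depends_on         = [aws_iam_role_policy_attachment.ssm_core]
}

output "ssm_activation_id" {
  value = aws_ssm_activation.hybrid_nodes.id
}

output "ssm_activation_code" {
  value     = aws_ssm_activation.hybrid_nodes.activation_code
  sensitive = true
}
```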

For cross-cloud pod networking, Cilium is deployed as the CNI (Container Network Interface). Cilium provides robust features such as BGP peering for routing, encryption of inter-node traffic, and deep visibility and control over network policies and flow — making it a powerful choice for hybrid and multi-cloud Kubernetes networking.

Key networking tips

Networking is often the most complex aspect of hybrid Kubernetes setups — especially when connecting AWS and Azure. Here are some best practices to ensure a smooth, secure deployment:

  • Use AWS Site-to-Site VPN for the quick path. For dedicated private connectivity, pair AWS Direct Connect with Azure ExpressRoute through a provider (e.g. Megaport) or a co-location cross-connect.
  • Deploy a VPN gateway on each side (a virtual private gateway or Transit Gateway on AWS, a virtual network gateway on Azure) and keep the VPC, VNet, and pod CIDR blocks non-overlapping to prevent routing conflicts.
  • Configure firewall rules and Azure NSGs to allow required traffic flows — including access to the Kubernetes API server, pod CIDRs, and other critical components.
  • Cilium plays a vital role in bridging the clouds: encrypted pod-to-pod communication, BGP-powered routing, and high-performance CNI functionality.
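For the quick-path VPN option, the AWS half of the tunnel could be sketched as below. It assumes an existing Transit Gateway and an Azure virtual network gateway whose public IP is already known. All variable names are hypothetical, and static routing is used only to keep the example short.

```hcl
# Inputs are assumptions for this sketch; wire them to your own resources.
variable "azure_vpn_gateway_public_ip" {
  type        = string
  description = "Public IP of the Azure virtual network gateway"
}

variable "azure_vnet_cidr" {
  type        = string
  description = "Azure VNet range hosting the hybrid nodes; must not overlap the VPC"
}

variable "transit_gateway_id" {
  type = string
}

variable "transit_gateway_route_table_id" {
  type = string
}

# AWS-side representation of the Azure VPN gateway.
resource "aws_customer_gateway" "azure" {
  bgp_asn    = 65515 # Azure VPN gateway default ASN; confirm for your gateway
  ip_address = var.azure_vpn_gateway_public_ip
  type       = "ipsec.1"
}

# IPsec tunnels from the Transit Gateway to Azure.
resource "aws_vpn_connection" "to_azure" {
  customer_gateway_id = aws_customer_gateway.azure.id
  transit_gateway_id  = var.transit_gateway_id
  type                = "ipsec.1"
  static_routes_only  = true
}

# Send traffic for the Azure node network down the VPN attachment.
resource "aws_ec2_transit_gateway_route" "azure_nodes" {
  destination_cidr_block         = var.azure_vnet_cidr
  transit_gateway_attachment_id  = aws_vpn_connection.to_azure.transit_gateway_attachment_id
  transit_gateway_route_table_id = var.transit_gateway_route_table_id
}

# Azure's local network gateway needs this address as its peer.
output "aws_tunnel1_address" {
  value = aws_vpn_connection.to_azure.tunnel1_address
}
```

A matching route for the pod CIDR would be needed as well if pods are routed natively rather than encapsulated.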

Terraform: your automation backbone

All infrastructure components are defined using Terraform, enabling reproducibility, modularity, and consistent deployment across both AWS and Azure. On AWS, that means the aws_eks_cluster control plane, IAM roles, VPC, subnets and route tables, and an aws_ec2_transit_gateway for cross-cloud routing, plus aws_ssm_activation so Systems Manager can securely control the Azure-based nodes. On Azure, it provisions the Linux VMs, virtual network gateway, NICs, IPs and NSGs, with cloud-init installing nodeadm and registering each VM with the cluster.

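terraform {
  required_providers {
    # azapi is published by Azure, not HashiCorp, so each provider declares its source explicitly.
    aws        = { source = "hashicorp/aws", version = ">= 5.47.0" }
    azurerm    = { source = "hashicorp/azurerm", version = "~> 3.0" }
    azapi      = { source = "Azure/azapi", version = "~> 1.5" }
    helm       = { source = "hashicorp/helm", version = "~> 2.0" }
    kubernetes = { source = "hashicorp/kubernetes", version = "~> 2.0" }
  }
}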
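On the AWS side, the cluster has to be told which remote networks its hybrid nodes and pods live in. The following is a minimal sketch under assumed names and CIDR ranges. The remote_network_config block shipped in a later 5.x release of the AWS provider than the 5.47.0 minimum, so check the provider changelog for the version you resolve.

```hcl
# All variable names and default CIDRs below are illustrative assumptions.
variable "cluster_role_arn" { type = string }
variable "hybrid_node_role_arn" { type = string }
variable "subnet_ids" { type = list(string) }

variable "remote_node_cidr" {
  type    = string
  default = "10.80.0.0/16" # Azure VNet range for the VMs
}

variable "remote_pod_cidr" {
  type    = string
  default = "10.85.0.0/16" # pod range handed out by the CNI on hybrid nodes
}

resource "aws_eks_cluster" "hybrid" {
  name     = "hybrid-eks"
  role_arn = var.cluster_role_arn

  vpc_config {
    subnet_ids = var.subnet_ids
  }

  # Access entries require API-based authentication to be enabled.
  access_config {
    authentication_mode = "API_AND_CONFIG_MAP"
  }

  # Networks outside AWS that the control plane must be able to reach.
  remote_network_config {
    remote_node_networks {
      cidrs = [var.remote_node_cidr]
    }
    remote_pod_networks {
      cidrs = [var.remote_pod_cidr]
    }
  }
}

# Lets nodes presenting the hybrid node IAM role join the cluster.
resource "aws_eks_access_entry" "hybrid_nodes" {
  cluster_name  = aws_eks_cluster.hybrid.name
  principal_arn = var.hybrid_node_role_arn
  type          = "HYBRID_LINUX"
}
```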

It's important to note that only certain operating systems work with hybrid nodes:

OS                    Notes
Amazon Linux 2023     Use in virtualized environments
Ubuntu 20.04–24.04    Fully supported for hybrid usage
RHEL 8 & 9            Enterprise-ready with hybrid capabilities

Billing is also a key factor when creating hybrid node clusters:

  • AWS charges for the EKS control plane and associated services (e.g. SSM, VPN).
  • Azure VMs are billed separately under your Azure subscription.
  • AWS also bills hybrid nodes themselves, per vCPU-hour. The meter starts once a VM joins the EKS cluster and stops when it is de-registered.
  • Be proactive with resource cleanup to avoid unnecessary costs.

Advanced: Cilium & BGP peering

Cilium is an essential part of the hybrid networking layer, enabling high-performance networking and transparent security policies. Install it in EKS using Helm with a templated BGP config file, define BGP peering rules via YAML templates to control traffic between cloud zones and VMs, and validate hybrid pod-to-pod networking by testing connectivity between workloads on AWS and Azure nodes.

helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium --namespace kube-system --values cilium-bgp.yaml
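If you would rather keep the Cilium install in Terraform alongside everything else, a rough equivalent using the helm and kubernetes providers might look like this. The ASNs, peer address, resource names, and pod CIDR variable are assumptions for illustration. The BGP custom resource's API version differs between Cilium releases, so match it to the chart version you deploy.

```hcl
# Hypothetical inputs; replace with values from your own network design.
variable "remote_pod_cidr" { type = string }
variable "peer_address" { type = string }

variable "local_asn" {
  type    = number
  default = 65010
}

variable "peer_asn" {
  type    = number
  default = 65515
}

resource "helm_release" "cilium" {
  name       = "cilium"
  repository = "https://helm.cilium.io/"
  chart      = "cilium"
  namespace  = "kube-system"

  values = [yamlencode({
    # Run the Cilium agent only on hybrid nodes, identified by the EKS compute-type label.
    affinity = {
      nodeAffinity = {
        requiredDuringSchedulingIgnoredDuringExecution = {
          nodeSelectorTerms = [{
            matchExpressions = [{
              key      = "eks.amazonaws.com/compute-type"
              operator = "In"
              values   = ["hybrid"]
            }]
          }]
        }
      }
    }
    # Allocate pod IPs from the remote pod network registered with the cluster.
    ipam = {
      mode = "cluster-pool"
      operator = {
        clusterPoolIPv4PodCIDRList = [var.remote_pod_cidr]
        clusterPoolIPv4MaskSize    = 25
      }
    }
    bgpControlPlane = { enabled = true }
  })]
}

# BGP session from each hybrid node to a router in the Azure network.
resource "kubernetes_manifest" "bgp_cluster_config" {
  manifest = {
    apiVersion = "cilium.io/v2alpha1" # assumption; newer Cilium releases may use a different version
    kind       = "CiliumBGPClusterConfig"
    metadata   = { name = "hybrid-bgp" }
    spec = {
      nodeSelector = {
        matchLabels = { "eks.amazonaws.com/compute-type" = "hybrid" }
      }
      bgpInstances = [{
        name     = "azure"
        localASN = var.local_asn
        peers = [{
          name        = "azure-router"
          peerASN     = var.peer_asn
          peerAddress = var.peer_address
        }]
      }]
    }
  }

  depends_on = [helm_release.cilium]
}
```

Two caveats apply to this sketch: kubernetes_manifest needs the Cilium CRDs to exist at plan time, so the Helm release usually has to be applied first, and no pod CIDRs are announced until a BGP advertisement resource is defined as well.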

For detailed guidance, refer to AWS' official hybrid nodes documentation, which outlines the lifecycle of hybrid nodes and control-plane interactions.

Final thoughts

Hybrid Kubernetes clusters are no longer niche. Whether for regulatory compliance, cloud independence, or disaster resilience, this architecture offers real operational value. By combining Amazon EKS for a fully managed control plane, Azure VMs for elastic compute, Cilium for secure cross-cloud networking, and Terraform for unified infrastructure as code, you can build a multicloud-native Kubernetes environment that feels like a single, cohesive cluster — with no duplicate control planes or manual sync headaches.

Our team has developed this hybrid EKS deployment using production-grade Terraform modules and cloud-init scripts. While this article focuses on the architecture and approach, we're happy to share the code for the right use case — just reach out.
