RKE2 Architecture & Design

This guide details the architectural decisions and patterns I use for production RKE2 deployments, with a focus on high availability, security, and edge computing requirements.

High-Level Architecture

I deploy RKE2 in two main architectural patterns depending on the use case:

High Availability Architecture (Production/Datacenter)

For production environments, I use a 3-server, 3-agent pattern with external load balancing:

[Diagram: RKE2 HA Architecture]

Edge Computing Architecture (Resource-Constrained)

For edge deployments, I use a 1-server, 2-agent pattern optimized for resource constraints:

[Diagram: RKE2 Edge Architecture]

Core Components

High Availability (HA) Pattern:

  • Server Nodes (3x): Control plane with embedded etcd for quorum
  • Agent Nodes (3+): Worker nodes for application workloads
  • Load Balancers (2x): Layer 4 TCP load balancing for HA
  • Cilium CNI: eBPF-powered networking and security (a minimal server config sketch follows these lists)

Edge Computing Pattern:

  • Server Node (1x): Single control plane for resource efficiency
  • Agent Nodes (2+): Minimal worker nodes for edge workloads
  • No External LB: Direct access via VPN overlay (Tailscale)
  • Cilium CNI: Same advanced networking in a resource-constrained environment
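
To make the mapping concrete, here is a minimal sketch of how a second or third HA server might join through the load balancer. The file path and the server, token, cni, and tls-san keys are standard RKE2 configuration; the token value is a placeholder, and lb.example.com is the load balancer FQDN used elsewhere in this guide:

yaml
# /etc/rancher/rke2/config.yaml -- additional HA server node (sketch)
server: https://lb.example.com:9345   # join via the LB on the supervisor port
token: <cluster-join-token>           # placeholder: shared cluster secret
cni: cilium                           # select Cilium instead of the default CNI
tls-san:
  - "lb.example.com"                  # LB FQDN must appear in the API certificate

The first server omits the server: line and generates the join token; in the edge pattern, agents point directly at the single server instead of a load balancer.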

Network Architecture

Port Matrix

| Port | Protocol | Source | Destination | Description |
|---|---|---|---|---|
| 6443 | TCP | Agents + LB | Servers | Kubernetes API Server |
| 9345 | TCP | Agents | Servers | RKE2 Supervisor API |
| 10250 | TCP | All Nodes | All Nodes | Kubelet API |
| 2379 | TCP | Servers | Servers | etcd Client Port |
| 2380 | TCP | Servers | Servers | etcd Peer Communication |
| 2381 | TCP | Servers | Servers | etcd Metrics |
| 30000-32767 | TCP | All Nodes | All Nodes | NodePort Service Range |

Cilium-Specific Networking

| Port | Protocol | Source | Destination | Description |
|---|---|---|---|---|
| 4240 | TCP | All Nodes | All Nodes | Cilium Health Check |
| 4244 | TCP | All Nodes | All Nodes | Cilium Hubble gRPC |
| 4245 | TCP | All Nodes | All Nodes | Cilium Hubble Relay |
| 8472 | UDP | All Nodes | All Nodes | VXLAN Overlay (when enabled) |
| 51871 | UDP | All Nodes | All Nodes | WireGuard (when enabled) |
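
The WireGuard port only matters once Cilium's transparent encryption is enabled. A minimal sketch using the standard Cilium Helm values (on RKE2 these would be merged into the HelmChartConfig shown in the Cilium Architecture section below):

yaml
# Cilium Helm values: transparent node-to-node WireGuard encryption
encryption:
  enabled: true
  type: wireguard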

Load Balancer Configuration

Layer 4 TCP Load Balancing

For high availability, I deploy two load balancers in an active-passive configuration:

nginx
# Primary load balancer configuration
# These blocks belong inside the nginx `stream {}` context (requires the
# stream module), not `http {}`, since this is raw TCP load balancing
upstream rke2-servers {
    server 192.168.10.10:6443 max_fails=3 fail_timeout=5s;
    server 192.168.10.11:6443 max_fails=3 fail_timeout=5s;
    server 192.168.10.12:6443 max_fails=3 fail_timeout=5s;
}

server {
    listen 6443;
    proxy_pass rke2-servers;
    proxy_timeout 10s;
    proxy_connect_timeout 5s;
}

upstream rke2-supervisor {
    server 192.168.10.10:9345;
    server 192.168.10.11:9345;
    server 192.168.10.12:9345;
}

server {
    listen 9345;
    proxy_pass rke2-supervisor;
}

Health Check Configuration

bash
# Health check endpoint
curl -k https://lb.example.com:6443/healthz

Node Specifications

Production Hardware Requirements

| Component | Server Nodes | Agent Nodes | Notes |
|---|---|---|---|
| CPU | 4+ cores | 2+ cores | ARM64 or x86_64 |
| Memory | 8GB+ RAM | 4GB+ RAM | More for large clusters |
| Storage | 50GB+ SSD | 20GB+ SSD | etcd requires fast I/O |
| Network | 1Gbps+ | 1Gbps+ | Low latency preferred |

Example Node Layout

| Hostname | Internal IP | External IP | Role | Arch | OS |
|---|---|---|---|---|---|
| s1 | 192.168.10.10 | 100.64.1.10 | Server | x86_64 | Ubuntu 24.04 |
| s2 | 192.168.10.11 | 100.64.1.11 | Server | x86_64 | Ubuntu 24.04 |
| s3 | 192.168.10.12 | 100.64.1.12 | Server | x86_64 | Ubuntu 24.04 |
| a1 | 192.168.10.13 | 100.64.1.13 | Agent | ARM64 | Ubuntu 24.04 |
| a2 | 192.168.10.14 | 100.64.1.14 | Agent | ARM64 | Ubuntu 24.04 |
| a3 | 192.168.10.15 | 100.64.1.15 | Agent | ARM64 | Ubuntu 24.04 |
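
Connecting the layout to configuration, agent a1 from the table might join like this (standard RKE2 agent config keys; the token value is a placeholder):

yaml
# /etc/rancher/rke2/config.yaml -- agent a1 (sketch)
server: https://lb.example.com:9345   # supervisor API via the load balancer
token: <cluster-join-token>           # placeholder: shared cluster secret
node-ip: 192.168.10.13                # internal LAN IP from the table above
node-external-ip: 100.64.1.13         # Tailscale overlay IP from the table above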

Cilium Architecture

eBPF-Native Networking

Cilium replaces kube-proxy entirely, handling service traffic in eBPF rather than iptables:

yaml
# Cilium configuration for RKE2, delivered as a HelmChartConfig
# (e.g. /var/lib/rancher/rke2/server/manifests/rke2-cilium-config.yaml)
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    # requires disable-kube-proxy: true in the RKE2 server config
    kubeProxyReplacement: true
    bpf:
      masquerade: true
    hubble:
      enabled: true
      relay:
        enabled: true
      ui:
        enabled: true

Service Mesh Integration

Cilium provides L7 policies and observability without sidecars (an example L7 policy follows this list):

  • Network Policies: eBPF-enforced microsegmentation
  • Load Balancing: Direct packet steering without NAT
  • Observability: Deep packet inspection via Hubble
  • Encryption: Transparent WireGuard mesh
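
As an illustration of the L7 enforcement mentioned above, here is a hedged sketch of a CiliumNetworkPolicy that only admits read-only HTTP traffic between production endpoints; the app: api label and port 8080 are hypothetical:

yaml
# L7 policy sketch: production peers may only issue GET requests
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-readonly
spec:
  endpointSelector:
    matchLabels:
      app: api            # hypothetical workload label
  ingress:
  - fromEndpoints:
    - matchLabels:
        env: production
    toPorts:
    - ports:
      - port: "8080"      # hypothetical service port
        protocol: TCP
      rules:
        http:
        - method: "GET"   # L7 rule enforced by Cilium's eBPF/Envoy datapath
          path: "/api/.*"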

Edge Computing Adaptations

Bandwidth Optimization

  • Image Caching: Local registry mirrors for reduced bandwidth (see the registries.yaml sketch below)
  • Compression: Enable gzip compression for API communication
  • Traffic Shaping: Prioritize control plane traffic
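
A minimal sketch of the registry mirroring, using RKE2's standard registries.yaml mechanism; the mirror hostname is hypothetical:

yaml
# /etc/rancher/rke2/registries.yaml -- pull docker.io images via a local mirror
mirrors:
  docker.io:
    endpoint:
      - "https://registry.edge.example.com:5000"   # hypothetical local mirror

RKE2 passes this through to containerd, which tries the local mirror first and falls back to the upstream registry if it is unavailable.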

Intermittent Connectivity

  • Leader Election: Increased timeout values for unstable networks
  • Heartbeat Tuning: Adjusted kubelet and etcd intervals (a tuning sketch follows this list)
  • Local Storage: Persistent volumes on local SSDs
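
One hedged example of what that tuning can look like in the RKE2 server config. kubelet-arg and etcd-arg are standard RKE2 pass-through keys; the values shown are illustrative starting points, not recommendations:

yaml
# /etc/rancher/rke2/config.yaml -- relaxed timings for unstable links (sketch)
kubelet-arg:
  - "node-status-update-frequency=20s"   # default 10s; fewer status writes
etcd-arg:
  - "heartbeat-interval=500"             # ms; default 100
  - "election-timeout=5000"              # ms; default 1000, keep ~10x heartbeat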

Remote Access Pattern

For edge deployments, I integrate Tailscale for secure remote management:

yaml
# /etc/rancher/rke2/config.yaml -- server node with Tailscale overlay
node-ip: 192.168.10.10         # Internal LAN IP
node-external-ip: 100.64.1.10  # Tailscale overlay IP
tls-san:
  - "192.168.10.10"            # Internal access
  - "100.64.1.10"              # Remote access via Tailscale
  - "lb.edge.example.com"      # Load balancer FQDN

Security Architecture

Certificate Management

  • Automatic Rotation: Certificates are issued for 12 months and rotated automatically on restart once within 90 days of expiry
  • SAN Configuration: Multi-domain certificate support
  • CA Distribution: Secure CA certificate distribution

Network Segmentation

yaml
# Cilium Network Policy Example
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: edge-cluster-isolation
spec:
  endpointSelector:
    matchLabels:
      env: production
  ingress:
  - fromEndpoints:
    - matchLabels:
        env: production
  egress:
  - toEndpoints:
    - matchLabels:
        env: production

Monitoring & Observability

Hubble Integration

Cilium's Hubble provides comprehensive network observability (a metrics-enablement sketch follows this list):

  • Flow Monitoring: Real-time network flow visualization
  • Security Events: Policy violation alerts
  • Performance Metrics: Latency and throughput analytics
  • Troubleshooting: Deep packet inspection capabilities
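
To surface those metrics to a scraping stack, Hubble's per-flow metrics can be switched on through the same Helm values as before. A hedged sketch; the metric set shown is a common choice, not a requirement:

yaml
# Cilium Helm values: expose Hubble metrics for scraping (sketch)
hubble:
  metrics:
    enabled:
      - dns
      - drop
      - tcp
      - flow
      - http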

This architecture provides a robust foundation for production RKE2 deployments with enterprise-grade security, performance, and observability.