RKE2 Deployment Guide

This guide covers two approaches to deploying RKE2 clusters: High Availability (HA) production environments and edge computing deployments.

Deployment Patterns

High Availability Deployments

  • Use Case: Production datacenters, enterprise environments
  • Architecture: 3+ server nodes with external load balancers
  • Automation: Terraform for infrastructure + Argo CD for GitOps
  • Characteristics: Full redundancy, external dependencies

Edge Computing Deployments

  • Use Case: Remote locations, resource-constrained environments
  • Architecture: 1 server node with 2+ agents
  • Automation: edgectl for automated bootstrap
  • Characteristics: Minimal footprint, self-contained, intermittent connectivity

Edge Deployment with edgectl

For edge computing scenarios, I use my custom edgectl tool that automates the entire RKE2 lifecycle with secure token management via HashiCorp Vault.

Prerequisites

Before using edgectl, ensure you have:

  • HashiCorp Vault deployed and accessible
  • Vault credentials configured on the bootstrap host (see the sketch after this list)
  • Root access on target nodes
  • Network connectivity between nodes
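
Since edgectl talks to Vault during bootstrap, the client environment needs to point at your Vault instance. A minimal sketch, assuming edgectl honours the standard Vault client environment variables (VAULT_ADDR/VAULT_TOKEN) and that the vault CLI is installed for a quick connectivity check:

bash
# Hypothetical Vault endpoint and token; use whatever auth method your Vault setup provides
export VAULT_ADDR="https://vault.example.com:8200"
export VAULT_TOKEN="$(cat ~/.vault-token)"

# Sanity-check connectivity before bootstrapping any nodes
vault status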

Install edgectl

bash
# Install the latest version
go install github.com/michielvha/edgectl@latest

# Verify installation
edgectl version

Edge Cluster Bootstrap

Step 1: Bootstrap First Server Node

bash
# On the first server node, run:
edgectl rke2 server

# This will:
# 1. Generate a unique cluster-id (e.g., rke2-abc12345)
# 2. Install and configure RKE2 server
# 3. Store the join token in Vault at kv/data/rke2/<cluster-id>
# 4. Save cluster-id to /etc/edgectl/cluster-id

Step 2: Retrieve Cluster ID

bash
# Check the generated cluster-id
cat /etc/edgectl/cluster-id

# Example output: rke2-abc12345

Step 3: Join Agent Nodes

bash
# On each agent node, run:
edgectl rke2 agent --cluster-id rke2-abc12345

# This will:
# 1. Retrieve the join token from Vault using the cluster-id
# 2. Install and configure RKE2 agent
# 3. Join the agent to the control plane
# 4. No manual token handling required!

Step 4: Add Additional Server Nodes (Optional HA)

bash
# For high availability with multiple servers:
edgectl rke2 server --token <token-from-first-server>

# Note: secondary servers still require a manually supplied token for now
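
The <token-from-first-server> value comes from the standard RKE2 token file on the first server (assuming edgectl performs a default RKE2 install):

bash
# On the first server: read the generated join token from the standard RKE2 path
sudo cat /var/lib/rancher/rke2/server/node-token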

Complete Edge Deployment Example

bash
# Server Node (192.168.10.10)
ssh <user>@192.168.10.10
edgectl rke2 server
# Note the cluster-id from /etc/edgectl/cluster-id

# Agent Node 1 (192.168.10.11)
ssh <user>@192.168.10.11
edgectl rke2 agent --cluster-id rke2-abc12345

# Agent Node 2 (192.168.10.12)
ssh <user>@192.168.10.12
edgectl rke2 agent --cluster-id rke2-abc12345

# Verify the cluster (back on the first server node)
ssh <user>@192.168.10.10
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
kubectl get nodes

edgectl Features

  • Automated RKE2 Installation: Complete server and agent bootstrap
  • Cluster ID Management: Unique cluster identification for multi-cluster environments
  • HashiCorp Vault Integration: Secure token storage and retrieval
  • Token-less Agent Join: Agents join using cluster-id, not raw tokens
  • Embedded Scripts: Modular bash scripts for system-level operations
  • Load Balancer Support: HAProxy + Keepalived configuration

How edgectl Works

  1. Server Bootstrap: Generates a unique cluster-id (e.g., rke2-abc12345)
  2. Token Storage: Automatically stores the join token in Vault at kv/data/rke2/<cluster-id> (inspectable with the Vault CLI, as sketched after this list)
  3. Cluster ID Persistence: Saves cluster-id to /etc/edgectl/cluster-id
  4. Agent Join: Agents retrieve token from Vault using cluster-id
  5. No Manual Token Handling: All token operations are automated and secure
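
For reference, the stored secret can be inspected with the Vault CLI. A sketch, assuming a KV v2 engine mounted at kv/ (the kv/data/... prefix above is the API path; the CLI drops the data/ segment); the secret's field names are edgectl internals, so check the command's output for the actual keys:

bash
# Look up the join token edgectl stored for this cluster
CLUSTER_ID=$(cat /etc/edgectl/cluster-id)
vault kv get kv/rke2/${CLUSTER_ID}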

Token Lifecycle with edgectl

| Stage | Action | Location |
|-------|--------|----------|
| Server Bootstrap | Token generated and stored | Vault: kv/data/rke2/<cluster-id> |
| Cluster ID Created | Unique ID persisted | Node: /etc/edgectl/cluster-id |
| Agent Installation | Token retrieved automatically | Vault: kv/data/rke2/<cluster-id> |
| Additional Servers | Manual token for HA setup | Provided via --token flag |

File Layout

# Local Node Files
/etc/edgectl/cluster-id              # Stores the generated cluster-id

# Vault Paths
kv/data/rke2/<cluster-id>            # Join token + metadata for the cluster

High Availability Deployment

For production environments requiring maximum uptime and redundancy, use the traditional Terraform + manual approach.

Pre-Deployment Planning

Infrastructure Requirements

Before deployment, ensure your infrastructure meets these requirements:

| Component | Requirement | Notes |
|-----------|-------------|-------|
| Load Balancers | 2x Layer 4 TCP LB | Active-passive configuration (see the HAProxy sketch below) |
| Server Nodes | 3x (odd number) | Embedded etcd requires an odd count |
| Agent Nodes | 3+ | Scale based on workload requirements |
| Operating System | Ubuntu 22.04+ | SELinux/AppArmor supported |
| Network | Low-latency LAN | <10 ms between nodes preferred |
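
For the load-balancer pair, a minimal Layer 4 HAProxy sketch for the API and supervisor ports is shown below. Hostnames, IPs, and timeouts are illustrative, and Keepalived would float the VIP between the two LB hosts:

bash
# Illustrative /etc/haproxy/haproxy.cfg for the active-passive LB pair (TCP passthrough only)
cat <<EOF | sudo tee /etc/haproxy/haproxy.cfg
defaults
    mode tcp
    timeout connect 10s
    timeout client  1m
    timeout server  1m

frontend kube_apiserver
    bind *:6443
    default_backend rke2_apiserver

frontend rke2_supervisor
    bind *:9345
    default_backend rke2_supervisor_be

backend rke2_apiserver
    balance roundrobin
    server srv1 192.168.10.10:6443 check
    server srv2 192.168.10.11:6443 check
    server srv3 192.168.10.12:6443 check

backend rke2_supervisor_be
    balance roundrobin
    server srv1 192.168.10.10:9345 check
    server srv2 192.168.10.11:9345 check
    server srv3 192.168.10.12:9345 check
EOF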

Network Preparation

Ensure the following ports are accessible between nodes:

bash
# Server-to-Server Communication
6443/tcp   # Kubernetes API
9345/tcp   # RKE2 Supervisor API  
2379/tcp   # etcd client
2380/tcp   # etcd peer
2381/tcp   # etcd metrics

# All Nodes
10250/tcp  # Kubelet API
30000-32767/tcp # NodePort range

# Cilium CNI
4240/tcp   # Health checks
4244/tcp   # Hubble gRPC
8472/udp   # VXLAN (if enabled)
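
On Ubuntu hosts the same list can be applied with UFW in one pass. A sketch, assuming the nodes share the 192.168.10.0/24 subnet used in the examples in this guide; restrict the source to your actual node network:

bash
# Open node-to-node ports only for the cluster subnet (assumed 192.168.10.0/24)
for port in 6443 9345 2379 2380 2381 10250 4240 4244; do
  sudo ufw allow from 192.168.10.0/24 to any port "$port" proto tcp
done
sudo ufw allow from 192.168.10.0/24 to any port 8472 proto udp   # VXLAN, if enabled
sudo ufw allow 30000:32767/tcp                                   # NodePort range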

Automated Deployment Module

I've developed a deployment module that automates the entire RKE2 bootstrap process. The module is available at:

bash
# Source the RKE2 deployment module
source <(curl -fsSL https://raw.githubusercontent.com/michielvha/PDS/main/bash/module/rke2.sh)

Server Node Bootstrap

The automated server bootstrap includes:

  • ✅ Operating system hardening and preparation
  • ✅ Firewall configuration (UFW/iptables)
  • ✅ RKE2 installation with CIS profile
  • ✅ Cilium CNI configuration
  • ✅ Load balancer integration
  • ✅ Certificate SAN configuration
bash
# Bootstrap first server node
configure_rke2_server_primary

# Bootstrap additional server nodes  
configure_rke2_server_additional

Agent Node Bootstrap

Agent node automation includes:

  • ✅ Operating system preparation
  • ✅ Firewall configuration for agent role
  • ✅ RKE2 agent installation and configuration
  • ✅ Automatic cluster joining
bash
# Bootstrap agent nodes
configure_rke2_agent

Manual Deployment Steps

For environments requiring manual deployment, follow these steps:

Step 1: System Preparation

Update the system and install prerequisites:

bash
# Update system packages
sudo apt update && sudo apt upgrade -y

# Install required packages
sudo apt install -y curl wget unzip

# Configure firewall (UFW example)
sudo ufw enable
sudo ufw allow 22/tcp    # SSH
sudo ufw allow 6443/tcp  # Kubernetes API
sudo ufw allow 9345/tcp  # RKE2 Supervisor
sudo ufw allow 10250/tcp # Kubelet

Step 2: RKE2 Installation

Download and install RKE2:

bash
# Download and run the RKE2 installation script (installs the server role by default)
curl -sfL https://get.rke2.io | sudo sh -

# On agent nodes, install the agent type instead:
# curl -sfL https://get.rke2.io | sudo INSTALL_RKE2_TYPE="agent" sh -

# Enable the service matching the node's role (services are started later, in Step 5)
sudo systemctl enable rke2-server.service    # server nodes
# sudo systemctl enable rke2-agent.service   # agent nodes

Step 3: Server Configuration

Primary Server Node

Create the initial server configuration:

bash
sudo mkdir -p /etc/rancher/rke2

cat <<EOF | sudo tee /etc/rancher/rke2/config.yaml
# Basic Configuration
write-kubeconfig-mode: "0644"
profile: "cis"

# Load Balancer Configuration
tls-san:
  - "lb.edge.example.com"
  - "192.168.10.100"  # LB VIP
  - "192.168.10.10"   # Local node IP

# Cilium CNI Configuration
cni: "cilium"
disable-kube-proxy: true  # Cilium's eBPF kube-proxy replacement takes over

# Security Configuration
selinux: true             # intended for SELinux-enabled hosts (e.g. RHEL); omit on Ubuntu/AppArmor
secrets-encryption: true
EOF

Additional Server Nodes

For the second and third server nodes:

bash
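# NODE_TOKEN must be exported before this heredoc is expanded, e.g.:
#   export NODE_TOKEN=$(sudo cat /var/lib/rancher/rke2/server/node-token)   # run on the first server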
cat <<EOF | sudo tee /etc/rancher/rke2/config.yaml
# Cluster Join Configuration
server: https://lb.edge.example.com:9345
token: ${NODE_TOKEN}  # From first server

# Basic Configuration  
write-kubeconfig-mode: "0644"
profile: "cis"

# Load Balancer Configuration
tls-san:
  - "lb.edge.example.com"
  - "192.168.10.100"

# CNI Configuration
cni: "cilium"
disable-kube-proxy: true

# Security Configuration
selinux: true
secrets-encryption: true
EOF

Agent Node Configuration

For worker nodes:

bash
cat <<EOF | sudo tee /etc/rancher/rke2/config.yaml
# Cluster Join Configuration
server: https://lb.edge.example.com:9345
token: ${NODE_TOKEN}

# Profile Configuration
profile: "cis"

# Security Configuration
selinux: true
EOF

Step 4: CIS Compliance Preparation

For CIS-compliant deployments, create the required etcd user:

bash
# Create etcd group and user
sudo groupadd --system etcd
sudo useradd --system --no-create-home --shell /sbin/nologin --gid etcd etcd
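
The cis profile also expects the kernel parameters from the RKE2 hardening guide to be in place before the service starts. A sketch, assuming the sysctl file ships at the path below in your RKE2 release; verify it exists after installation:

bash
# Apply the CIS kernel parameters shipped with RKE2 (path assumed from the hardening guide)
sudo cp -f /usr/share/rke2/rke2-cis-sysctl.conf /etc/sysctl.d/60-rke2-cis.conf
sudo systemctl restart systemd-sysctl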

Step 5: Service Management

Start RKE2 services:

bash
# Start server nodes first
sudo systemctl start rke2-server.service

# Retrieve node token (from first server)
sudo cat /var/lib/rancher/rke2/server/node-token

# Start agent nodes after servers are running
sudo systemctl start rke2-agent.service

Post-Deployment Configuration

Configure kubectl Access

Set up kubectl for cluster administration:

bash
# Export kubeconfig
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml

# Add to shell profile
echo 'export KUBECONFIG=/etc/rancher/rke2/rke2.yaml' >> ~/.bashrc
echo 'export PATH=$PATH:/var/lib/rancher/rke2/bin' >> ~/.bashrc

# Source the profile
source ~/.bashrc

Add Useful Aliases

bash
cat <<EOF >> ~/.bashrc
# Kubernetes aliases
alias k='kubectl'
alias kga='kubectl get all'
alias kgp='kubectl get pods'
alias kgn='kubectl get nodes'
alias kd='kubectl describe'
alias kl='kubectl logs'
alias ke='kubectl edit'

# Enable kubectl completion
source <(kubectl completion bash)
complete -F __start_kubectl k
EOF

Verify Cluster Health

bash
# Check node status
kubectl get nodes -o wide

# Verify all system pods are running
kubectl get pods -A

# Check cluster info
kubectl cluster-info

# Remove completed pods
kubectl delete pod -n kube-system --field-selector=status.phase=Succeeded

Cilium Configuration

Custom Cilium Values

RKE2 allows customization of Cilium via HelmChartConfig. Create the following configuration:

bash
cat <<EOF | sudo tee /var/lib/rancher/rke2/server/manifests/cilium-config.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    # Enable eBPF mode and disable kube-proxy
    kubeProxyReplacement: true
    
    # Enable Hubble observability
    hubble:
      enabled: true
      relay:
        enabled: true
      ui:
        enabled: true
    
    # Performance optimizations
    bpf:
      masquerade: true
      hostServices:
        enabled: true
    
    # Security enhancements
    encryption:
      enabled: true
      type: wireguard
EOF
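
Once the chart reconciles, the Cilium agent's own status output is a quick way to confirm kube-proxy replacement and WireGuard encryption took effect. The Hubble UI service name and port below reflect chart defaults and may differ in your install:

bash
# Check agent health, kube-proxy replacement, and encryption from a Cilium pod
# (the in-pod CLI is named cilium-dbg in newer Cilium releases)
kubectl -n kube-system exec ds/cilium -- cilium status

# Reach the Hubble UI locally (service name/port assume chart defaults)
kubectl -n kube-system port-forward svc/hubble-ui 12000:80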

Troubleshooting Common Issues

Service Won't Start

bash
# Check service status
sudo systemctl status rke2-server.service

# View detailed logs
sudo journalctl -u rke2-server.service -f

# Review the configuration file for syntax or indentation mistakes
sudo cat /etc/rancher/rke2/config.yaml

# Inspect the kubelet log if the service keeps restarting
sudo tail -n 50 /var/lib/rancher/rke2/agent/logs/kubelet.log

Node Join Failures

bash
# Verify token
sudo cat /var/lib/rancher/rke2/server/node-token

# Check network connectivity
curl -k https://lb.edge.example.com:9345/ping

# Verify DNS resolution
nslookup lb.edge.example.com

Certificate Issues

bash
# Rotate the cluster's leaf certificates (deleting the whole tls/ directory would also wipe the cluster CA)
sudo systemctl stop rke2-server.service
sudo rke2 certificate rotate
sudo systemctl start rke2-server.service

# Check the API server certificate SANs
sudo openssl x509 -in /var/lib/rancher/rke2/server/tls/serving-kube-apiserver.crt -noout -text | grep -A1 "Subject Alternative Name"

GitOps Integration

Once the cluster is deployed, integrate with your GitOps workflow:

  1. Cluster Secret Creation: Generate Argo CD cluster secrets (see the sketch after this list)
  2. Policy Application: Apply security and network policies
  3. Workload Deployment: Deploy applications via GitOps
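
Step 1 typically boils down to registering the new cluster with Argo CD declaratively. A sketch of a cluster secret, applied on the management cluster where Argo CD runs (namespace argocd assumed); the bearer token and CA data placeholders come from a service account you create on the new RKE2 cluster:

bash
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: rke2-edge
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: rke2-edge
  server: https://lb.edge.example.com:6443
  config: |
    {
      "bearerToken": "<service-account-token>",
      "tlsClientConfig": {
        "insecure": false,
        "caData": "<base64-encoded-cluster-ca>"
      }
    }
EOF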

This deployment approach ensures consistent, secure, and manageable RKE2 clusters that integrate seamlessly with modern platform engineering practices.