RKE2 Deployment Guide
This guide provides comprehensive approaches to deploying RKE2 clusters for different scenarios: High Availability (HA) production environments and Edge computing deployments.
Deployment Patterns
High Availability Deployments
- Use Case: Production datacenters, enterprise environments
- Architecture: 3+ server nodes with external load balancers
- Automation: Terraform for infrastructure + Argo CD for GitOps
- Characteristics: Full redundancy, external dependencies
Edge Computing Deployments
- Use Case: Remote locations, resource-constrained environments
- Architecture: 1 server node with 2+ agents
- Automation: edgectl for automated bootstrap
- Characteristics: Minimal footprint, self-contained, intermittent connectivity
Edge Deployment with edgectl
For edge computing scenarios, I use my custom edgectl tool that automates the entire RKE2 lifecycle with secure token management via HashiCorp Vault.
Prerequisites
Before using edgectl, ensure you have:
- HashiCorp Vault deployed and accessible
- Vault credentials configured (see the sketch after this list)
- Root access on target nodes
- Network connectivity between nodes
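edgectl talks to Vault through the standard Vault client environment; a minimal sketch, assuming edgectl honours the standard VAULT_ADDR/VAULT_TOKEN variables (the address is a placeholder):
# Point the Vault client at your Vault instance (placeholder address)
export VAULT_ADDR="https://vault.example.com:8200"
# Any auth method that yields a token works; a static token is the simplest
export VAULT_TOKEN="<your-vault-token>"
# Sanity check: Vault is reachable and unsealed
vault status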
Install edgectl
# Install the latest version
go install github.com/michielvha/edgectl@latest
# Verify installation
edgectl version
Edge Cluster Bootstrap
Step 1: Bootstrap First Server Node
# On the first server node, run:
edgectl rke2 server
# This will:
# 1. Generate a unique cluster-id (e.g., rke2-abc12345)
# 2. Install and configure RKE2 server
# 3. Store the join token in Vault at kv/data/rke2/<cluster-id>
# 4. Save cluster-id to /etc/edgectl/cluster-id
Step 2: Retrieve Cluster ID
# Check the generated cluster-id
cat /etc/edgectl/cluster-id
# Example output: rke2-abc12345
Step 3: Join Agent Nodes
# On each agent node, run:
edgectl rke2 agent --cluster-id rke2-abc12345
# This will:
# 1. Retrieve the join token from Vault using the cluster-id
# 2. Install and configure RKE2 agent
# 3. Join the agent to the control plane
# 4. No manual token handling required!
Step 4: Add Additional Server Nodes (Optional HA)
# For high availability with multiple servers:
edgectl rke2 server --token <token-from-first-server>
# Note: Secondary servers still use a manual token for now
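The token referenced above lives in the standard RKE2 token file on the first server:
# On the first server node
sudo cat /var/lib/rancher/rke2/server/node-token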
Complete Edge Deployment Example
# Server Node (192.168.10.10)
ssh <user>@192.168.10.10
edgectl rke2 server
# Note the cluster-id from /etc/edgectl/cluster-id
# Agent Node 1 (192.168.10.11)
ssh <user>@192.168.10.11
edgectl rke2 agent --cluster-id rke2-abc12345
# Agent Node 2 (192.168.10.12)
ssh <user>@192.168.10.12
edgectl rke2 agent --cluster-id rke2-abc12345
# Verify cluster
ssh <user>@192.168.10.10
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
kubectl get nodes
edgectl Features
- ✅ Automated RKE2 Installation: Complete server and agent bootstrap
- ✅ Cluster ID Management: Unique cluster identification for multi-cluster environments
- ✅ HashiCorp Vault Integration: Secure token storage and retrieval
- ✅ Token-less Agent Join: Agents join using cluster-id, not raw tokens
- ✅ Embedded Scripts: Modular bash scripts for system-level operations
- ✅ Load Balancer Support: HAProxy + Keepalived configuration
How edgectl Works
- Server Bootstrap: Generates a unique cluster-id (e.g., rke2-abc12345)
- Token Storage: Automatically stores the join token in Vault at kv/data/rke2/<cluster-id> (see the example after this list)
- Cluster ID Persistence: Saves the cluster-id to /etc/edgectl/cluster-id
- Agent Join: Agents retrieve the token from Vault using the cluster-id
- No Manual Token Handling: All token operations are automated and secure
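For reference, the stored token can be inspected with the Vault CLI. A sketch assuming a KV v2 mount named kv (the kv/data/... form above is the API path; the CLI drops the data/ segment):
# Read the secret written during server bootstrap (cluster-id taken from /etc/edgectl/cluster-id)
vault kv get kv/rke2/rke2-abc12345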
Token Lifecycle with edgectl
| Stage | Action | Location |
|---|---|---|
| Server Bootstrap | Token generated and stored | Vault: kv/data/rke2/<cluster-id> |
| Cluster ID Created | Unique ID persisted | Node: /etc/edgectl/cluster-id |
| Agent Installation | Token retrieved automatically | Vault: kv/data/rke2/<cluster-id> |
| Additional Servers | Manual token for HA setup | Provided via --token flag |
File Layout
# Local Node Files
/etc/edgectl/cluster-id # Stores the generated cluster-id
# Vault Paths
kv/data/rke2/<cluster-id> # Join token + metadata for the cluster
High Availability Deployment
For production environments requiring maximum uptime and redundancy, use the traditional Terraform + manual approach.
Pre-Deployment Planning
Infrastructure Requirements
Before deployment, ensure your infrastructure meets these requirements:
| Component | Requirement | Notes |
|---|---|---|
| Load Balancers | 2x Layer 4 TCP LB | Active-passive configuration (see sketch below) |
| Server Nodes | 3x (odd number) | Embedded etcd requires odd count |
| Agent Nodes | 3+ | Scale based on workload requirements |
| Operating System | Ubuntu 22.04+ | SELinux/AppArmor supported |
| Network | Low latency LAN | <10ms between nodes preferred |
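The load balancer row above is worth expanding: both 6443 (Kubernetes API) and 9345 (RKE2 supervisor) must be passed through as plain TCP. A minimal HAProxy sketch for one LB node (server names and IPs are placeholders; Keepalived would float the 192.168.10.100 VIP between the two LB nodes):
cat <<EOF | sudo tee /etc/haproxy/haproxy.cfg
defaults
    mode tcp
    timeout connect 5s
    timeout client 1h
    timeout server 1h

# Kubernetes API passthrough
listen kube_api
    bind *:6443
    balance roundrobin
    server rke2-server-1 192.168.10.10:6443 check
    server rke2-server-2 192.168.10.11:6443 check
    server rke2-server-3 192.168.10.12:6443 check

# RKE2 supervisor API passthrough
listen rke2_supervisor
    bind *:9345
    balance roundrobin
    server rke2-server-1 192.168.10.10:9345 check
    server rke2-server-2 192.168.10.11:9345 check
    server rke2-server-3 192.168.10.12:9345 check
EOF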
Network Preparation
Ensure the following ports are accessible between nodes:
# Server-to-Server Communication
6443/tcp # Kubernetes API
9345/tcp # RKE2 Supervisor API
2379/tcp # etcd client
2380/tcp # etcd peer
2381/tcp # etcd metrics
# All Nodes
10250/tcp # Kubelet API
30000-32767/tcp # NodePort range
# Cilium CNI
4240/tcp # Health checks
4244/tcp # Hubble gRPC
8472/udp # VXLAN (if enabled)
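Before installing anything, reachability of the key ports can be confirmed from a peer node; a quick sketch using netcat (the IP is a placeholder for one of your server nodes):
# From an agent or secondary server, confirm the API and supervisor ports are reachable
nc -zv 192.168.10.10 6443
nc -zv 192.168.10.10 9345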
Automated Deployment Module
I've developed a deployment module that automates the entire RKE2 bootstrap process. It can be sourced directly from the repository:
# Source the RKE2 deployment module
source <(curl -fsSL https://raw.githubusercontent.com/michielvha/PDS/main/bash/module/rke2.sh)
Server Node Bootstrap
The automated server bootstrap includes:
- ✅ Operating system hardening and preparation
- ✅ Firewall configuration (UFW/iptables)
- ✅ RKE2 installation with CIS profile
- ✅ Cilium CNI configuration
- ✅ Load balancer integration
- ✅ Certificate SAN configuration
# Bootstrap first server node
configure_rke2_server_primary
# Bootstrap additional server nodes
configure_rke2_server_additional
Agent Node Bootstrap
Agent node automation includes:
- ✅ Operating system preparation
- ✅ Firewall configuration for agent role
- ✅ RKE2 agent installation and configuration
- ✅ Automatic cluster joining
# Bootstrap agent nodes
configure_rke2_agent
Manual Deployment Steps
For environments requiring manual deployment, follow these steps:
Step 1: System Preparation
Update the system and install prerequisites:
# Update system packages
sudo apt update && sudo apt upgrade -y
# Install required packages
sudo apt install -y curl wget unzip
# Configure firewall (UFW example)
sudo ufw enable
sudo ufw allow 22/tcp # SSH
sudo ufw allow 6443/tcp # Kubernetes API
sudo ufw allow 9345/tcp # RKE2 Supervisor
sudo ufw allow 10250/tcp # Kubelet
Step 2: RKE2 Installation
Download and install RKE2:
# Download RKE2 installation script
curl -sfL https://get.rke2.io | sudo sh -
# Enable the RKE2 server service (it is started later, in Step 5)
sudo systemctl enable rke2-server.service
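The same installer is used on agent nodes; a sketch assuming the installer's documented INSTALL_RKE2_TYPE variable:
# On agent nodes, install the agent variant instead
curl -sfL https://get.rke2.io | sudo INSTALL_RKE2_TYPE="agent" sh -
# Enable the agent service
sudo systemctl enable rke2-agent.service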
Step 3: Server Configuration
Primary Server Node
Create the initial server configuration:
sudo mkdir -p /etc/rancher/rke2
cat <<EOF | sudo tee /etc/rancher/rke2/config.yaml
# Basic Configuration
write-kubeconfig-mode: "0644"
profile: "cis"
# Load Balancer Configuration
tls-san:
- "lb.edge.example.com"
- "192.168.10.100" # LB VIP
- "192.168.10.10" # Local node IP
# Cilium CNI Configuration
cni: "cilium"
disable-kube-proxy: true  # Cilium replaces kube-proxy
# Security Configuration
selinux: true
secrets-encryption: true
EOF
Additional Server Nodes
For the second and third server nodes:
cat <<EOF | sudo tee /etc/rancher/rke2/config.yaml
# Cluster Join Configuration
server: https://lb.edge.example.com:9345
token: ${NODE_TOKEN} # From first server
# Basic Configuration
write-kubeconfig-mode: "0644"
profile: "cis"
# Load Balancer Configuration
tls-san:
- "lb.edge.example.com"
- "192.168.10.100"
# CNI Configuration
cni: "cilium"
disable-kube-proxy: true  # Cilium replaces kube-proxy
# Security Configuration
selinux: true
secrets-encryption: true
EOF
Agent Node Configuration
For worker nodes:
cat <<EOF | sudo tee /etc/rancher/rke2/config.yaml
# Cluster Join Configuration
server: https://lb.edge.example.com:9345
token: ${NODE_TOKEN}
# Profile Configuration
profile: "cis"
# Security Configuration
selinux: true
EOF
Step 4: CIS Compliance Preparation
For CIS-compliant deployments, create the required etcd user:
# Create etcd group and user
sudo groupadd --system etcd
sudo useradd --system --no-create-home --shell /sbin/nologin --gid etcd etcd
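Depending on the RKE2 version, the CIS profile also requires host kernel parameters; RKE2 ships a sysctl file for this (path per the RKE2 hardening docs; verify it exists on your install):
# Apply the RKE2-provided CIS sysctl settings
sudo cp -f /usr/share/rke2/rke2-cis-sysctl.conf /etc/sysctl.d/60-rke2-cis.conf
sudo systemctl restart systemd-sysctl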
Step 5: Service Management
Start RKE2 services:
# Start server nodes first
sudo systemctl start rke2-server.service
# Retrieve node token (from first server)
sudo cat /var/lib/rancher/rke2/server/node-token
# Start agent nodes after servers are running
sudo systemctl start rke2-agent.service
Post-Deployment Configuration
Configure kubectl Access
Set up kubectl for cluster administration:
# Export kubeconfig
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
# Add to shell profile
echo 'export KUBECONFIG=/etc/rancher/rke2/rke2.yaml' >> ~/.bashrc
echo 'export PATH=$PATH:/var/lib/rancher/rke2/bin' >> ~/.bashrc
# Source the profile
source ~/.bashrc
Add Useful Aliases
cat <<EOF >> ~/.bashrc
# Kubernetes aliases
alias k='kubectl'
alias kga='kubectl get all'
alias kgp='kubectl get pods'
alias kgn='kubectl get nodes'
alias kd='kubectl describe'
alias kl='kubectl logs'
alias ke='kubectl edit'
# Enable kubectl completion
source <(kubectl completion bash)
complete -F __start_kubectl k
EOF
Verify Cluster Health
# Check node status
kubectl get nodes -o wide
# Verify all system pods are running
kubectl get pods -A
# Check cluster info
kubectl cluster-info
# Remove completed pods
kubectl delete pod -n kube-system --field-selector=status.phase=Succeeded
Cilium Configuration
Custom Cilium Values
RKE2 allows customization of Cilium via HelmChartConfig. Create the following configuration:
cat <<EOF | sudo tee /var/lib/rancher/rke2/server/manifests/cilium-config.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    # Enable eBPF mode and disable kube-proxy
    kubeProxyReplacement: true
    # Enable Hubble observability
    hubble:
      enabled: true
      relay:
        enabled: true
      ui:
        enabled: true
    # Performance optimizations
    bpf:
      masquerade: true
    hostServices:
      enabled: true
    # Security enhancements
    encryption:
      enabled: true
      type: wireguard
EOF
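Once RKE2 picks up the manifest, the rollout can be verified. A quick sketch (resource names follow the rke2-cilium packaging; adjust to your chart version):
# Confirm the Cilium agent, Hubble relay, and Hubble UI pods are running
kubectl -n kube-system get pods | grep -E 'cilium|hubble'
# Inspect the merged Helm values that were applied
kubectl -n kube-system get helmchartconfig rke2-cilium -o yaml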
Troubleshooting Common Issues
Service Won't Start
# Check service status
sudo systemctl status rke2-server.service
# View detailed logs
sudo journalctl -u rke2-server.service -f
# Check configuration
sudo rke2 server --config /etc/rancher/rke2/config.yaml --dry-run
Node Join Failures
# Verify token
sudo cat /var/lib/rancher/rke2/server/node-token
# Check network connectivity
curl -k https://lb.edge.example.com:9345/ping
# Verify DNS resolution
nslookup lb.edge.example.com
Certificate Issues
# Regenerate certificates
sudo rm -rf /var/lib/rancher/rke2/server/tls/
sudo systemctl restart rke2-server.service
# Check certificate SANs
openssl x509 -in /var/lib/rancher/rke2/server/tls/serving-kube-apiserver.crt -text -noout
GitOps Integration
Once the cluster is deployed, integrate with your GitOps workflow:
- Cluster Secret Creation: Generate Argo CD cluster secrets (see the sketch after this list)
- Policy Application: Apply security and network policies
- Workload Deployment: Deploy applications via GitOps
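For the cluster secret step, a sketch of the declarative form Argo CD expects (the label and field names follow Argo CD's cluster secret convention; namespace, cluster name, API endpoint, token, and CA values are placeholders for your environment):
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: edge-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: edge-cluster
  server: https://lb.edge.example.com:6443
  config: |
    {
      "bearerToken": "<service-account-token>",
      "tlsClientConfig": {
        "caData": "<base64-encoded-ca>"
      }
    }
EOF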
This deployment approach ensures consistent, secure, and manageable RKE2 clusters that integrate seamlessly with modern platform engineering practices.
