
Centralised Platform

A Centralised platform architecture is currently my personal favorite for production-ready Kubernetes platforms. It provides a unified control plane that simplifies management, scaling, and observability, making it easier to enforce policies and maintain compliance at scale. This approach serves as a solid foundation for organizations seeking consistency and strong governance across multiple teams and workloads.

Key Characteristics

  • Single management cluster or account hosting platform components and control plane.
  • Unified Argo CD instance managing root applications and platform services.
  • Centralized Crossplane deployment with provider credentials assuming roles in downstream accounts.
  • Core cloud guardrails managed centrally via Terraform, XRDs, or native cloud IaC.
  • Developers request infrastructure resources through Kubernetes claims backed by published Compositions, which bake in the organization's security and compliance requirements for each cloud resource (a minimal XRD sketch follows this list).
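
To give a rough idea of what such a published API can look like, here is a minimal XRD sketch that would back the PlatformBucket claim used later in this article. The group platform.custom, the kind names, and the field list are illustrative, not a fixed contract:

yaml
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  # XRD names must be <plural>.<group>
  name: xplatformbuckets.platform.custom
spec:
  group: platform.custom
  names:
    kind: XPlatformBucket
    plural: xplatformbuckets
  claimNames:
    kind: PlatformBucket
    plural: platformbuckets
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                bucketName:
                  type: string
                environment:
                  type: string
                  enum: [dev, staging, prd]
              required:
                - bucketName
                - environment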

Security Posture and Access Model

The management cluster typically isn't exposed to the internet, which enhances security by default. Workload clusters are often customer-facing, making them the primary attack surface. By running privileged service accounts only in the platform account, you can create resources in downstream accounts without exposing control plane components to potential workload cluster compromise. There are no Crossplane or Argo CD components in downstream clusters for an attacker to exploit.

This does create a concentration risk. If the platform account is compromised, it could impact all downstream workload accounts. The trade-off is worth it in most cases. The platform team can enforce compliance centrally without relying on policies running in each workload cluster. Centralized enforcement is simpler to audit and harder to bypass.

Pros and Cons

| Pros | Cons |
| --- | --- |
| Single control plane to secure, scale, and observe | Larger blast radius if Argo CD or Crossplane goes down or is misconfigured |
| Unified Argo CD RBAC and policies; easier compliance evidence | Upgrades or maintenance windows impact all teams |
| Consistent Crossplane provider config and compositions | Risk of the platform becoming a bottleneck if change processes are slow |
| Lower total cost of ownership (shared infra, fewer duplicated controllers, clear inventory) | Need strong multi-tenancy and RBAC discipline |
| Easier disaster recovery: rehydrate the management cluster, then reconcile workload clusters | Potential perception of central-team gatekeeping if processes are heavy |

When to Choose This Pattern

Best for: Centralized governance with self-service capabilities

Choose this pattern when you want teams to have autonomy without the burden of managing the GitOps control plane. It's ideal when you need consistent policy enforcement and simplified compliance auditing while reducing operational overhead. Prioritize the centralized pattern if you value operational efficiency over distributed infrastructure ownership.

I personally like to combine the centralised platform with dedicated clusters for each team, because this pattern scales well. It gives the platform team full control over cloud resource provisioning while giving development teams self-service and full ownership of their own clusters and workloads. This reduces friction and speeds up delivery without sacrificing governance.

Crossplane Pattern (Self‑Service with Guardrails)


The centralized Crossplane approach differs from running Crossplane in each workload cluster. Instead of distributing provider credentials and compositions across teams, everything runs in the management cluster. Teams submit claims, and the platform automatically provisions resources in the correct downstream account.

Cross-Account Provisioning

Crossplane runs only in the management cluster with provider credentials configured to assume minimal roles into workload accounts. Each cloud provider gets a ProviderConfig pointing to a specific account or project. The assumed roles need just enough permissions to create the resources defined in your compositions - nothing more.
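
As a minimal sketch, assuming the Upbound AWS provider family (field names differ between providers and versions, and the account ID and role name are placeholders), a ProviderConfig for a production workload account could look like this:

yaml
apiVersion: aws.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: workload-account-prd
spec:
  credentials:
    # How the provider pod in the management cluster authenticates (IRSA, Secret, WebIdentity, ...)
    source: IRSA
  assumeRoleChain:
    # Narrow role in the downstream workload account, scoped to what the compositions create
    - roleARN: arn:aws:iam::111111111111:role/crossplane-provisioner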

TIP

The logic for creating all of the IAM resources required for this pattern can be included in a Terraform platform module, as shown here.

Claims-Only RBAC

Here's the interesting part: we can use Argo CD RBAC to restrict what can be applied to the management cluster. Teams can only submit XRD claims (like PlatformRDSInstance or PlatformBucket), not raw managed resources or provider configs. This eliminates the need for complex admission policies in the platform cluster.

Workload clusters simply don't need these restrictions since there's no Crossplane running there. Teams can have full access to their own clusters without compromising the platform's security model.
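
The AppProjects below control which resource kinds an Application may deploy; Argo CD RBAC then controls who may create and sync Applications in those projects. Here is a hedged sketch, assuming an SSO group called app-team-a and the project names used in the examples that follow:

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  policy.default: role:readonly
  policy.csv: |
    # Team members may only act on Applications inside their own projects
    p, role:app-team, applications, *, application-team-platform/*, allow
    p, role:app-team, applications, *, application-team-workload/*, allow
    g, app-team-a, role:app-team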

The Project Separation Pattern

Because ArgoCD's clusterResourceWhitelist applies at the project level (not per destination), you need separate AppProjects when deploying to both the management cluster and workload clusters.

If you tried to use a single project with both destinations, you'd face a problem:

  • A permissive whitelist (group: '*', kind: '*') works for workload clusters but allows teams to deploy anything to the management cluster, including raw Crossplane managed resources or provider configs.
  • A restrictive whitelist (only platform XRDs) secures the management cluster but prevents deploying standard Kubernetes resources to workload clusters.

The solution is two separate projects with different security models:

yaml
# For infrastructure claims to management cluster
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: application-team-platform
  namespace: argocd
spec:
  clusterResourceWhitelist:
    # Only allow platform-defined XRDs
    - group: 'platform.custom'
      kind: 'PlatformBucket'
    - group: 'platform.custom'
      kind: 'PlatformRDSInstance'
  destinations:
    - namespace: 'crossplane-system'
      name: in-cluster
  sourceRepos:
    - '*'
...
yaml
# For workloads to team's cluster
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: application-team-workload
  namespace: argocd
spec:
  clusterResourceWhitelist:
    # Permissive - full autonomy in cluster
    - group: '*'
      kind: '*'
  destinations:
    - namespace: '*'
      name: workload-cluster-prd
  sourceRepos:
    - '*'
...

This means your application team maintains two separate ArgoCD Applications in their repository:

yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-infrastructure
  namespace: argocd
spec:
  project: application-team-platform
  source:
    repoURL: https://github.com/myteam/myapp
    path: infrastructure-app/overlays/prd
    targetRevision: main
  destination:
    name: in-cluster
    namespace: crossplane-system
yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-workloads
  namespace: argocd
spec:
  project: application-team-workload
  source:
    repoURL: https://github.com/myteam/myapp
    path: workload-app/overlays/prd
    targetRevision: main
  destination:
    name: workload-cluster-prd
    namespace: myapp

This separation creates a clear security boundary. Infrastructure requests go through the restrictive platform project, application deployments go through the permissive workload project. It also makes your audit trail cleaner since infrastructure provisioning and application deployments are tracked separately.

Alternative: Single Project with Admission Policy Enforcement

If you have a strong requirement to keep infrastructure and application resources together in the same manifests, you can use a single permissive AppProject combined with central admission policies.

Instead of using ArgoCD's clusterResourceWhitelist to enforce what can be deployed to the management cluster, run a centralized Kyverno or OPA Gatekeeper instance that blocks native cloud provider CRDs (like s3.aws.upbound.io/Bucket or rds.aws.upbound.io/Instance) while allowing your platform XRDs.

yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: application-team-unified
  namespace: argocd
spec:
  clusterResourceWhitelist:
    # Permissive - admission policies handle enforcement
    - group: '*'
      kind: '*'
  destinations:
    - namespace: '*'
    - namespace: '*'
      name: workload-cluster-prd
    - namespace: 'crossplane-system'
      name: in-cluster
  sourceRepos:
    - '*'

Then create a Kyverno ClusterPolicy in the management cluster:

yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: block-native-cloud-resources
spec:
  validationFailureAction: Enforce
  background: false
  rules:
  - name: block-aws-managed-resources
    match:
      any:
      - resources:
          kinds:
          - "*.aws.upbound.io/*"
          - "*.azure.upbound.io/*"
          - "*.gcp.upbound.io/*"
    exclude:
      any:
      - subjects:
        - kind: ServiceAccount
          name: crossplane-*
          namespace: crossplane-system
    validate:
      message: "Direct use of cloud provider managed resources is not allowed. Use platform XRDs instead (e.g., PlatformBucket, PlatformRDSInstance)."
      deny: {}

Trade-offs:

| Pros | Cons |
| --- | --- |
| Teams can keep infrastructure claims alongside application manifests in the same repository structure | Adds runtime admission overhead to every resource deployment |
| Single Application per team instead of two | Policy failures only surface during sync, not at the ArgoCD RBAC layer |
| Simpler repository organization for teams | Another component to maintain and monitor |
| | Requires tuning policy exceptions for legitimate platform components |

I generally prefer the separate projects approach because it shifts enforcement left to the Argo CD layer and maintains a clear separation of concerns, but this alternative works if your teams strongly prefer unified manifests.

Environment-Driven Routing

Include an environment field in your XRDs (dev, staging, prod) that drives ProviderConfig selection. The composition uses this field to route resources into the correct cloud account. Teams declare what they need and where it should go, the platform handles the rest.
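
As a sketch of what this can look like, here is a hypothetical platformbucket-aws Composition in the classic resources mode, assuming the Upbound S3 provider and the ProviderConfig names used earlier; it routes on the claim's environment field and also bakes in a public access block the claim cannot override (kinds, regions, and field names depend on your provider version):

yaml
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: platformbucket-aws
spec:
  compositeTypeRef:
    apiVersion: platform.custom/v1alpha1
    kind: XPlatformBucket
  resources:
    - name: bucket
      base:
        apiVersion: s3.aws.upbound.io/v1beta1
        kind: Bucket
        spec:
          forProvider:
            region: eu-west-1   # illustrative default
      patches:
        # Route the bucket into the account that matches the claim's environment
        - type: FromCompositeFieldPath
          fromFieldPath: spec.environment
          toFieldPath: spec.providerConfigRef.name
          transforms:
            - type: map
              map:
                dev: workload-account-dev
                staging: workload-account-stg
                prd: workload-account-prd
        # Use the requested bucket name as the external name of the S3 bucket
        - type: FromCompositeFieldPath
          fromFieldPath: spec.bucketName
          toFieldPath: metadata.annotations[crossplane.io/external-name]
    # A guardrail the claim cannot override: every bucket gets a public access block
    - name: public-access-block
      base:
        apiVersion: s3.aws.upbound.io/v1beta1
        kind: BucketPublicAccessBlock
        spec:
          forProvider:
            region: eu-west-1
            blockPublicAcls: true
            blockPublicPolicy: true
            ignorePublicAcls: true
            restrictPublicBuckets: true
            bucketSelector:
              matchControllerRef: true
      patches:
        - type: FromCompositeFieldPath
          fromFieldPath: spec.environment
          toFieldPath: spec.providerConfigRef.name
          transforms:
            - type: map
              map:
                dev: workload-account-dev
                staging: workload-account-stg
                prd: workload-account-prd

In a real platform you would extend this with encryption, logging, and tagging resources, but the shape stays the same: routing and guardrails live in the composition, owned by the platform team.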

The following is an example of the resource your developer has to include in their repository to create a bucket named compliant-bucket in the production account:

NOTE

When using Argo CD, add the sync-wave annotation to ensure Argo CD syncs the resource after Crossplane is up and running. This is crucial to avoid sync errors during deployment.

yaml
apiVersion: platform.custom/v1alpha1
kind: PlatformBucket
metadata:
  name: compliant-bucket
  namespace: crossplane-system
  annotations:
    argocd.argoproj.io/sync-wave: "8"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  bucketName: compliant-bucket
  environment: prd

If you're familiar with Kubernetes manifests, this looks simple and straightforward. This is exactly what we're trying to achieve with this approach. Behind this claim sits a composition that enforces all your organization's security requirements and compliance checks. This composition is maintained by the platform team and lives in the platform cluster. Developers get a simple interface, while the platform ensures every bucket meets production standards.

The Result

App teams get infrastructure on-demand without waiting on tickets or learning Terraform. The platform team maintains full control over what can be provisioned and how it's configured. Every resource request lives as a claim manifest in the management cluster, giving you a complete audit trail and inventory. No surprises, no drift, no shadow IT.

Argo CD Platform Cluster Pattern

(Diagram: central Argo CD platform pattern)

A single Argo CD instance in the management cluster manages all downstream workload clusters. This differs from running Argo CD in each cluster, where you'd need to maintain multiple instances with duplicated configurations and credentials.

High Availability Setup

Since we're running a single Argo CD instance for all teams and clusters, strong fault tolerance is critical. Run Argo CD in HA mode with multiple replicas of the application controller, repo server, and API server. This ensures the platform stays available during node failures or upgrades.
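
A hedged example of what that looks like with the community argo-cd Helm chart (value keys vary between chart versions):

yaml
# values.yaml for the argo-cd Helm chart
redis-ha:
  enabled: true          # HA Redis instead of the single-replica default
controller:
  replicas: 2            # application controller shards clusters across replicas
server:
  replicas: 3
repoServer:
  replicas: 3
applicationSet:
  replicas: 2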

The management cluster holds all root ApplicationSet and Application manifests for platform components. All cluster secrets and RBAC policies live in one place, making auditing and updates straightforward.
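
A typical root manifest is an ApplicationSet driven by the cluster generator, so every cluster secret registered in Argo CD automatically receives the platform add-ons. The repository URL, label, and paths below are placeholders:

yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: platform-addons
  namespace: argocd
spec:
  generators:
    # One Application per registered cluster secret carrying this label
    - clusters:
        selector:
          matchLabels:
            env: prd
  template:
    metadata:
      name: 'platform-addons-{{name}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/platform-addons
        path: overlays/prd
        targetRevision: main
      destination:
        name: '{{name}}'
        namespace: platform-addons
      syncPolicy:
        automated:
          prune: true
          selfHeal: true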

Control Plane Resilience

An important characteristic of this architecture is that Argo CD control plane downtime does not impact running workloads. If the Argo CD instance becomes unavailable, applications already deployed to workload clusters continue running normally. Only new deployments or synchronization of changes are blocked until Argo CD recovers.

To maximize this resilience, avoid finalizers on Application resources where possible. The resources-finalizer.argocd.argoproj.io finalizer tells Argo CD to cascade-delete everything an Application deployed when the Application itself is deleted. It is added when you delete an app with cascading enabled through the UI or CLI, and it is often included explicitly in app-of-apps templates. While it is present, deleting an Application requires Argo CD to be available, which can block cleanup operations during an outage.

In practice this means leaving the finalizer out of declaratively managed Application manifests and relying on annotation- or label-based resource tracking instead, configured in argocd-cm:

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  application.resourceTrackingMethod: annotation+label
  # Label key Argo CD uses to track the resources it manages
  application.instanceLabelKey: argocd.argoproj.io/instance

Without the finalizer, you lose automatic cascade deletion of resources when an Application is removed, but you gain the ability to remove or recreate Applications even when Argo CD is down; the tracking label also lets you clean up an app's resources manually with a label selector if you ever need to. For production platforms this is usually the right trade-off, since you rarely want to delete all deployed resources alongside an Application anyway.
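
For reference, this is the finalizer in question; in declaratively managed Applications I only include it when cascade deletion is genuinely wanted:

yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-workloads
  namespace: argocd
  # Only add this if Argo CD should cascade-delete the deployed resources when
  # the Application is deleted; it ties deletion to Argo CD availability.
  finalizers:
    - resources-finalizer.argocd.argoproj.io
# spec omitted for brevity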

Cluster Onboarding Approaches

I've built custom integrations for both Azure and AWS that automate cluster registration with Argo CD. There are two approaches: Automated Onboarding (Terraform applies the cluster secret directly) and Declarative Onboarding (External Secrets Operator reconciles cluster secrets from a secret manager).

Pick automated for speed; pick declarative for true GitOps reconciliation and automatic disaster recovery.
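
Whichever route you pick, the end result is the same artifact: a cluster secret in the argocd namespace. Here is a hedged EKS-style sketch (endpoint, CA, account ID, and role are placeholders; the env label is what cluster generators like the one shown earlier select on):

yaml
apiVersion: v1
kind: Secret
metadata:
  name: workload-cluster-prd
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    env: prd
type: Opaque
stringData:
  name: workload-cluster-prd
  server: https://<eks-endpoint>
  config: |
    {
      "awsAuthConfig": {
        "clusterName": "workload-cluster-prd",
        "roleARN": "arn:aws:iam::111111111111:role/argocd-deployer"
      },
      "tlsClientConfig": {
        "caData": "<base64-encoded-ca>"
      }
    }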

For a detailed breakdown of both patterns and implementation guides, see GitOps Cluster Onboarding Patterns.

Disaster Recovery

Both approaches make cluster recreation simple. The management cluster holds the complete inventory of downstream clusters and their configurations. Recreate workload clusters in a new region or account, and they'll automatically sync their workloads from the same ApplicationSets. No manual reconfiguration needed.