Working with AWS ECR: Push Commands, Pull-Through Cache, and Troubleshooting

Amazon Elastic Container Registry (ECR) is my go-to for managing container images in AWS environments. Basic push and pull operations are straightforward, but a few advanced features and debugging techniques can save you real time. In this post, I'll cover the essential push workflow, pull-through cache with repository creation templates, and how I debug image pulls when things go wrong.

Basic ECR Workflow

When you create a new ECR repository, AWS provides push commands directly in the console (click View Push Commands). Here's the standard workflow for pushing images to ECR.

Authenticate with ECR

First, authenticate your Docker client with the ECR registry using AWS credentials:

bash
aws ecr get-login-password --region eu-west-1 --profile your-profile \
  | docker login --username AWS --password-stdin 123456789.dkr.ecr.eu-west-1.amazonaws.com

The authentication token is valid for 12 hours. If you're working across multiple regions or accounts, you'll need to authenticate with each registry separately.
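If you log in to several registries regularly, a small wrapper can help. This is a minimal sketch: the function names `ecr_registry_host` and `ecr_login` are hypothetical, and it assumes the AWS CLI picks up credentials from your default profile or environment.

```bash
# Hypothetical helpers; the names are illustrative, not part of any AWS tooling.

# Build the registry hostname for an account and region.
ecr_registry_host() {
  local account="$1" region="$2"
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$account" "$region"
}

# Log Docker in to one registry. Tokens expire after 12 hours, so re-run as needed.
ecr_login() {
  local account="$1" region="$2"
  aws ecr get-login-password --region "$region" \
    | docker login --username AWS --password-stdin \
        "$(ecr_registry_host "$account" "$region")"
}

# Example: authenticate against the same account in two regions.
# for region in eu-west-1 eu-central-1; do
#   ecr_login 123456789 "$region"
# done
```

Keeping the hostname logic in its own function means the same helper can be reused for the tag and push steps.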

Build and Tag Your Image

Build your Docker image locally:

bash
docker build -t my-app .

Then tag it with the ECR repository URL:

bash
docker tag my-app:latest 123456789.dkr.ecr.eu-west-1.amazonaws.com/my-app:latest

Push to ECR

Finally, push the tagged image to your ECR repository:

bash
docker push 123456789.dkr.ecr.eu-west-1.amazonaws.com/my-app:latest

Repository Creation Templates for Pull-Through Cache

One of the more powerful ECR features is pull-through cache. This lets ECR act as a caching proxy for public registries like Docker Hub, Quay, or GitHub Container Registry. Instead of hitting public registry rate limits, your cluster pulls through ECR, which caches the images.
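To make the proxying concrete, here is a rough sketch of how an upstream reference maps to its ECR path. It assumes a pull-through cache rule with the prefix `quay` pointing at `quay.io` already exists; the helper name `ptc_image_uri` is hypothetical, and the account and region are placeholders.

```bash
# A rule like the one assumed here could be created with:
#   aws ecr create-pull-through-cache-rule \
#     --ecr-repository-prefix quay \
#     --upstream-registry-url quay.io \
#     --region eu-central-1

# Map an upstream image reference to its pull-through cache URI.
# Usage: ptc_image_uri ACCOUNT REGION PREFIX UPSTREAM_REF
# Assumes UPSTREAM_REF starts with the registry host (e.g. quay.io/...).
ptc_image_uri() {
  local account="$1" region="$2" prefix="$3" upstream="$4"
  # Drop the upstream registry host (everything up to the first slash).
  local path="${upstream#*/}"
  printf '%s.dkr.ecr.%s.amazonaws.com/%s/%s\n' \
    "$account" "$region" "$prefix" "$path"
}

ptc_image_uri 123456789 eu-central-1 quay quay.io/argoproj/argocd:v3.0.0
# prints 123456789.dkr.ecr.eu-central-1.amazonaws.com/quay/argoproj/argocd:v3.0.0
```

In practice this means swapping the registry host in your manifests for the ECR host plus the rule's prefix.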

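The key to making this work at scale is pairing pull-through cache rules with Repository Creation Templates. The cache rule makes ECR create a repository the first time an image is pulled through it, and the template defines the settings (tag mutability, encryption, lifecycle policy) that each auto-created repository gets, so you don't have to configure each one manually.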

Here's a Terraform example that configures a template for pull-through cache repositories:

hcl
resource "aws_ecr_repository_creation_template" "pull_through_cache" {
  prefix               = "ROOT"
  description          = "Main pull-through cache template"
  image_tag_mutability = "IMMUTABLE"

  applied_for = [
    "PULL_THROUGH_CACHE",
  ]

  encryption_configuration {
    encryption_type = "AES256"
  }

  lifecycle_policy = <<EOT
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Expire untagged images older than 14 days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 14
      },
      "action": {
        "type": "expire"
      }
    }
  ]
}
EOT
}

With the ROOT prefix, this template acts as the catch-all for every pull-through cache repository that no more specific template matches. It enforces immutable tags and expires untagged images after 14 days to keep storage costs down.

Troubleshooting ECR Pull Issues

When Kubernetes nodes fail to pull images from ECR, it's usually one of three things: network connectivity, authentication, or missing images. Here's how I debug each scenario.

Verify Network Connectivity

First, check if the node can reach the ECR registry at all:

bash
REG=eu-central-1
ACCT=123456789
REGISTRY="$ACCT.dkr.ecr.$REG.amazonaws.com"

# Expect "401 Unauthorized" if reachable (that's a good sign)
curl -sI "https://$REGISTRY/v2/" | head -n1

If you get a 401 Unauthorized, the registry is reachable. If you get timeouts or connection errors, it's a network problem (security groups, VPC endpoints, etc.).
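The same check can be scripted so that it prints a diagnosis instead of a raw status line. This is a sketch only: `probe_registry` and `diagnose_status` are hypothetical helper names, and the five-second timeout is an arbitrary choice.

```bash
# Probe the registry and print only the HTTP status code.
# curl prints "000" when it gets no HTTP response (timeout, DNS failure, refused).
probe_registry() {
  local registry="$1"
  curl -s -o /dev/null --max-time 5 -w '%{http_code}' "https://$registry/v2/"
}

# Translate the status code into a rough diagnosis.
diagnose_status() {
  case "$1" in
    401) echo "reachable (authentication required, as expected)" ;;
    000) echo "no response: check DNS, security groups, routes, or VPC endpoints" ;;
    *)   echo "unexpected status $1: inspect any proxy between the node and ECR" ;;
  esac
}

# Example (run on the node itself):
# diagnose_status "$(probe_registry "123456789.dkr.ecr.eu-central-1.amazonaws.com")"
```

Run it from the affected node rather than your workstation, since the point is to test that node's network path.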

Test Pull with Containerd

To test image pulling directly using containerd (which is what Kubernetes uses), run:

bash
REG=eu-central-1
ACCT=123456789
IMG="quay/argoproj/argocd:v3.0.0"
PASS="$(aws ecr get-login-password --region "$REG")"

sudo ctr -n k8s.io images pull --user "AWS:${PASS}" \
  "$ACCT.dkr.ecr.$REG.amazonaws.com/$IMG"

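This bypasses Kubernetes and Docker entirely, testing the pull at the containerd level. Keep in mind that the token comes from whatever AWS identity runs the aws command, so run it with the node's own credentials if you want to test the node's permissions. If this works but Kubernetes pods still fail, look at the credentials the kubelet uses: on EKS that is normally the node's IAM role, or an image pull secret if the pod spec references one. IRSA (IAM Roles for Service Accounts) covers AWS API calls made from inside pods, not image pulls, so it is rarely the culprit here.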

Check Pull-Through Cache Configuration

If you're using pull-through cache, verify the rules are configured correctly:

bash
aws ecr describe-pull-through-cache-rules --region eu-central-1 \
  --query 'pullThroughCacheRules[].{prefix:ecrRepositoryPrefix,upstream:upstreamRegistryUrl,registryId:registryId}'

This shows all configured upstream registries and their prefixes. Make sure your image path matches the prefix exactly (e.g., quay/argoproj/argocd for a quay prefix).
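A quick way to sanity-check a path against a prefix is a small shell test. The helper name `matches_cache_prefix` is hypothetical, and the sketch only checks the string shape; it does not query ECR.

```bash
# Succeeds if REPO_PATH starts with PREFIX followed by a slash.
# Usage: matches_cache_prefix REPO_PATH PREFIX
matches_cache_prefix() {
  case "$1" in
    "$2"/*) return 0 ;;
    *)      return 1 ;;
  esac
}

if matches_cache_prefix "quay/argoproj/argocd" "quay"; then
  echo "path matches the quay prefix"
fi

# A path that still carries the upstream host does not match.
if ! matches_cache_prefix "quay.io/argoproj/argocd" "quay"; then
  echo "quay.io/... does not match the quay prefix"
fi
```

A common slip this catches is leaving the upstream hostname (quay.io) in the image path instead of the rule's prefix.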

Verify Tag Exists in ECR

Sometimes the image exists upstream but hasn't been cached yet, or the tag doesn't exist at all. Check if a specific tag exists in ECR:

bash
REG=eu-central-1
ACCT=123456789
PREFIX=quay                             # must match your pull-through cache rule
REPO=argoproj/argocd                    # case-sensitive
TAG=v2.10.13                            # use a tag you know exists
PASS="$(aws ecr get-login-password --region "$REG")"

# Registry API paths start with /v2/
curl -sI -u "AWS:$PASS" \
  -H 'Accept: application/vnd.docker.distribution.manifest.v2+json' \
  "https://$ACCT.dkr.ecr.$REG.amazonaws.com/v2/$PREFIX/$REPO/manifests/$TAG" | head -1

Expect HTTP/2 200 if the tag exists in ECR. If you get HTTP/2 404, the image hasn't been pulled through yet or the tag doesn't exist upstream.
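As an alternative to calling the registry API directly, the ECR API can answer the same question. This is a minimal sketch: `tag_in_ecr` is a hypothetical wrapper around `aws ecr describe-images`, and it assumes the caller has the `ecr:DescribeImages` permission.

```bash
# Succeeds (exit 0) and prints the image digest if the tag is present in ECR.
# Fails if the repository or the tag does not exist.
# Usage: tag_in_ecr REGION REPOSITORY TAG
tag_in_ecr() {
  local region="$1" repository="$2" tag="$3"
  aws ecr describe-images \
    --region "$region" \
    --repository-name "$repository" \
    --image-ids "imageTag=$tag" \
    --query 'imageDetails[0].imageDigest' \
    --output text
}

# Example:
# if tag_in_ecr eu-central-1 quay/argoproj/argocd v2.10.13; then
#   echo "tag is cached in ECR"
# else
#   echo "tag not in ECR yet (or repository missing)"
# fi
```

Unlike the curl check, this only reports what ECR already holds and never reaches out to the upstream registry.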

Key Takeaways

ECR is more than just a Docker registry. The pull-through cache feature with repository creation templates eliminates manual repository management and protects you from public registry rate limits. When things break, the troubleshooting commands above let you quickly isolate whether the issue is network, authentication, or a missing image.

If you're running EKS at scale, using ECR with pull-through cache and proper lifecycle policies is a no-brainer. It reduces external dependencies and keeps your image storage costs predictable.