Working with AWS ECR: Push Commands, Pull-Through Cache, and Troubleshooting
AWS Elastic Container Registry (ECR) is my go-to for managing container images in AWS environments. While it's straightforward for basic push/pull operations, there are some advanced features and troubleshooting techniques that can save you time. In this post, I'll cover the essential workflows and some debugging techniques I use when things go wrong.
Basic ECR Workflow
When you create a new ECR repository, AWS provides push commands directly in the console (click View Push Commands). Here's the standard workflow for pushing images to ECR.
Authenticate with ECR
First, authenticate your Docker client with the ECR registry using AWS credentials:
aws ecr get-login-password --region eu-west-1 --profile your-profile \
  | docker login --username AWS --password-stdin 123456789.dkr.ecr.eu-west-1.amazonaws.com
The authentication token is valid for 12 hours. If you're working across multiple regions or accounts, you'll need to authenticate with each registry separately.
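Since every region has its own registry hostname, logging in to several of them is easy to script. The sketch below is one way to do it, not an official AWS helper: the function names ecr_registry_host and ecr_login_all are made up for illustration, and the account ID is the same placeholder used above.

```shell
# Build the registry hostname for an account/region pair.
ecr_registry_host() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Hypothetical helper: log Docker in to the same account in several regions.
# Usage: ecr_login_all ACCOUNT_ID REGION [REGION...]
ecr_login_all() (
  account="$1"
  shift
  for region in "$@"; do
    host="$(ecr_registry_host "$account" "$region")"
    # Each registry needs its own token, so fetch one per region.
    aws ecr get-login-password --region "$region" \
      | docker login --username AWS --password-stdin "$host" || exit 1
  done
)

# Example (needs AWS credentials and Docker, so it is not run here):
#   ecr_login_all 123456789 eu-west-1 eu-central-1
```

The body of ecr_login_all runs in a subshell, so its variables don't leak into your session and a failed login stops the loop without closing your terminal.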
Build and Tag Your Image
Build your Docker image locally:
docker build -t my-app .
Then tag it with the ECR repository URL:
docker tag my-app:latest 123456789.dkr.ecr.eu-west-1.amazonaws.com/my-app:latest
Push to ECR
Finally, push the tagged image to your ECR repository:
docker push 123456789.dkr.ecr.eu-west-1.amazonaws.com/my-app:latest
Repository Creation Templates for Pull-Through Cache
One of the more powerful ECR features is pull-through cache. This lets ECR act as a caching proxy for public registries like Docker Hub, Quay, or GitHub Container Registry. Instead of hitting public registry rate limits, your cluster pulls through ECR, which caches the images.
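In practice, pulling through ECR means rewriting image references: the upstream hostname is swapped for your ECR registry plus the prefix of the matching cache rule. Here's a small sketch of that rewrite. The function name ptc_image_ref is hypothetical, and it assumes a rule with the prefix quay pointing at quay.io.

```shell
# Map an upstream image reference to its pull-through cache path in ECR.
# Args: ECR registry host, rule prefix, upstream registry host, upstream image
ptc_image_ref() (
  registry="$1"
  prefix="$2"
  upstream_host="$3"
  image="$4"
  # Strip the upstream hostname (e.g. "quay.io/") from the front of the image.
  path=${image#"$upstream_host"/}
  printf '%s/%s/%s\n' "$registry" "$prefix" "$path"
)

# Example: where does quay.io/argoproj/argocd:v3.0.0 live in ECR?
ptc_image_ref 123456789.dkr.ecr.eu-central-1.amazonaws.com quay quay.io \
  quay.io/argoproj/argocd:v3.0.0
```

One thing to watch with Docker Hub: official images live under library/ upstream, so nginx becomes library/nginx after the prefix.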
The key to making this work at scale is using Repository Creation Templates. These templates automatically create ECR repositories when images are pulled for the first time, so you don't have to manually set up each one.
Here's a Terraform example that configures a template for pull-through cache repositories:
resource "aws_ecr_repository_creation_template" "pull_through_cache" {
  prefix               = "ROOT"
  description          = "Main pull-through cache template"
  image_tag_mutability = "IMMUTABLE"

  applied_for = [
    "PULL_THROUGH_CACHE",
  ]

  encryption_configuration {
    encryption_type = "AES256"
  }

  lifecycle_policy = <<EOT
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Expire untagged images older than 14 days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 14
      },
      "action": {
        "type": "expire"
      }
    }
  ]
}
EOT
}
With the ROOT prefix, this template acts as the catch-all for pull-through cache repositories that no more specific template matches. It enforces immutable tags and automatically cleans up untagged images after 14 days to keep storage costs down.
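The template only shapes the repositories. The pull-through cache rule itself still has to exist. Here's a hedged sketch of creating one from the CLI: the wrapper create_ptc_rule and its DRY_RUN switch are my own invention for illustration, and the quay prefix and quay.io upstream are assumptions chosen to match the examples in this post. By default it only prints the command so you can review it first.

```shell
# Print (default) or run the CLI call that creates a pull-through cache rule.
# Args: ECR repository prefix, upstream registry URL, region
create_ptc_rule() (
  prefix="$1"
  upstream="$2"
  region="$3"
  # Store the full command as positional parameters so it can be printed or run.
  set -- aws ecr create-pull-through-cache-rule \
    --ecr-repository-prefix "$prefix" \
    --upstream-registry-url "$upstream" \
    --region "$region"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
)

# Dry run: shows the command without touching AWS.
create_ptc_rule quay quay.io eu-central-1
```

Set DRY_RUN=0 to actually run it. Upstreams that require authentication (Docker Hub, for example) also need a Secrets Manager secret passed via --credential-arn, which this sketch leaves out.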
Troubleshooting ECR Pull Issues
When Kubernetes nodes fail to pull images from ECR, it's usually one of three things: network connectivity, authentication, or missing images. Here's how I debug each scenario.
Verify Network Connectivity
First, check if the node can reach the ECR registry at all:
REG=eu-central-1
ACCT=123456789
REGISTRY="$ACCT.dkr.ecr.$REG.amazonaws.com"
# Expect "401 Unauthorized" if reachable (that's a good sign)
curl -sI "https://$REGISTRY/v2/" | head -n1
If you get a 401 Unauthorized, the registry is reachable. If you get timeouts or connection errors, it's a network problem (security groups, VPC endpoints, etc.).
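If you run this check across many nodes, it helps to turn the status line into a verdict. The function below is a minimal sketch of that idea. The name classify_registry_probe and the three labels are hypothetical, and it treats anything other than a 401 or an empty response as something to look at by hand.

```shell
# Classify the first line of `curl -sI https://$REGISTRY/v2/` output.
# Arg: the status line (empty when curl timed out or could not connect)
classify_registry_probe() {
  case "$1" in
    *" 401"*) echo "reachable" ;;   # registry answered and asked for auth
    "")       echo "network" ;;     # no response at all
    *)        echo "unexpected" ;;  # e.g. a proxy or endpoint policy replied
  esac
}

# Example with a canned status line:
classify_registry_probe "HTTP/2 401"
```

On a node you would feed it the live result, for example: classify_registry_probe "$(curl -sI --max-time 5 "https://$REGISTRY/v2/" | head -n1)".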
Test Pull with Containerd
To test image pulling directly using containerd (which is what Kubernetes uses), run:
REG=eu-central-1
ACCT=123456789
IMG="quay/argoproj/argocd:v3.0.0"
PASS="$(aws ecr get-login-password --region "$REG")"
sudo ctr -n k8s.io images pull --user "AWS:${PASS}" \
  "$ACCT.dkr.ecr.$REG.amazonaws.com/$IMG"
This bypasses Kubernetes and Docker entirely, testing the pull at the containerd level. If this works but Kubernetes pods still fail, the issue is more likely an image pull secret, the kubelet's ECR credential setup, or the image reference in the pod spec. Note that IRSA (IAM Roles for Service Accounts) only affects what a running pod can do; the kubelet pulls images with the node's credentials.
Check Pull-Through Cache Configuration
If you're using pull-through cache, verify the rules are configured correctly:
aws ecr describe-pull-through-cache-rules --region eu-central-1 \
  --query 'pullThroughCacheRules[].{prefix:ecrRepositoryPrefix,upstream:upstreamRegistryUrl,registryId:registryId}'
This shows all configured upstream registries, their prefixes, and the ID of the registry (account) that owns each rule. Make sure your image path starts with the prefix exactly (e.g., quay/argoproj/argocd for a quay prefix).
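Prefix mismatches are easy to miss by eye, especially with similar names like quay and quay-io. A tiny check like the one below can catch them. It's only a sketch, and the function name image_matches_prefix is hypothetical; it tests that the image path starts with the prefix followed by a slash.

```shell
# Succeed if the image path sits under the given pull-through cache prefix.
# Args: image path (without registry host), rule prefix
image_matches_prefix() {
  case "$1" in
    "$2"/*) return 0 ;;  # path starts with "<prefix>/"
    *)      return 1 ;;
  esac
}

# Example: this path belongs to the "quay" rule.
if image_matches_prefix "quay/argoproj/argocd" "quay"; then
  echo "prefix matches"
fi
```

Requiring the trailing slash is deliberate: without it, quay-io/argoproj/argocd would look like a match for the quay prefix.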
Verify Tag Exists in ECR
Sometimes the image exists upstream but hasn't been cached yet, or the tag doesn't exist at all. Check if a specific tag exists in ECR:
REG=eu-central-1
ACCT=123456789
PREFIX=quay # must match your pull-through cache rule
REPO=argoproj/argocd # case-sensitive
TAG=v2.10.13 # use a tag you know exists
PASS=$(aws ecr get-login-password --region "$REG")
curl -sI -u "AWS:$PASS" \
  -H 'Accept: application/vnd.docker.distribution.manifest.v2+json' \
  "https://$ACCT.dkr.ecr.$REG.amazonaws.com/v2/$PREFIX/$REPO/manifests/$TAG" | head -1
Expect HTTP/2 200 if the tag exists in ECR. If you get HTTP/2 404, the image hasn't been pulled through yet or the tag doesn't exist upstream.
Key Takeaways
ECR is more than just a Docker registry. The pull-through cache feature with repository creation templates eliminates manual repository management and protects you from public registry rate limits. When things break, the troubleshooting commands above let you quickly isolate whether the issue is network, authentication, or a missing image.
If you're running EKS at scale, using ECR with pull-through cache and proper lifecycle policies is a no-brainer. It reduces external dependencies and keeps your image storage costs predictable.
