Traefik Proxy in AWS with AWS ECS - Part 2
TL;DR
I believe that ECS Anywhere is the natural evolution of Docker Swarm for running AWS applications on-premise. Using Traefik with ECS Anywhere to deploy applications might just make you feel like you are on AWS. And ECS Compose-X makes the whole process even easier.
Introduction
In the previous lab article, we went over deploying Traefik to AWS ECS and running it in AWS Fargate, behind a Network Load Balancer (NLB). We used AWS Certificate Manager (ACM) to provision certificates and terminate TLS connections with clients.
This time, we are going to deploy Traefik into AWS ECS Anywhere and let Traefik manage TLS itself, using Let’s Encrypt to provision certificates. With a backup!
What is AWS ECS Anywhere?
AWS ECS is the control plane service that allows you to create your Task Definitions, Service definitions, scheduled tasks, etc., and deploy these into clusters.
The clusters rely on a Capacity Provider, such as AWS Fargate or AWS EC2, to provide the IaaS layers (compute, networking, storage, etc.) to run your containers on.
AWS ECS Anywhere is an extension of the ECS Capacity Providers which allows you to register on-premise ECS Instances. These can be bare-metal machines, virtual machines, or AWS Outposts. You only pay per running ECS instance, per month, but can deploy an unlimited number of containers (within your hardware capacity limits).
How does it work?
I won’t go into details, as many have already covered this very well in other blog posts. In a nutshell, your machine (virtual or physical) is registered in AWS SSM, and is then registered into an ECS cluster as an ECS Instance.
Later, we will go over a brief checklist of things to have in place to provide remote access, as Traefik will be the entry point to the other services here.
Attention
As mentioned earlier, the installation of the necessary software and tools is out of the scope of this article. From here on, we assume that you already have one or more working ECS Anywhere instances running.
The architecture
The architecture is rather straightforward. We have an ECS Instance running on-premise (for me, on a Raspberry Pi 4). We installed Docker on it and ran the ECS Anywhere install scripts, and we are ready to go.
If you list your container instances, you can now see yours. Here is mine:
aws ecs list-container-instances --cluster ANewCluster
{
    "containerInstanceArns": [
        "arn:aws:ecs:eu-west-1:373709687835:container-instance/ANewCluster/dfc804e50f7f445f9fbe3fae775997a6"
    ]
}
Note
If you plan to have ECS Instances running on an ARM processor, such as a Raspberry Pi, make sure that the image you use is either a multi-architecture manifest list that includes ARM images, or an image that was built specifically for ARM.
Services deployment
The basics
Using the docker-compose file from part 1, we make a few adjustments. Most importantly, we set the deploy label ecs.compute.platform to EXTERNAL. As a result, ECS Compose-X will set all the right properties and values for this ECS Service to run with ECS Anywhere. In particular, it enforces in the configuration the considerations when using ECS Anywhere.
We also add, to the command list, the Traefik option that enables ECS Anywhere discovery in the ECS provider.
- "--providers.ecs.ecsAnywhere=true"
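Put together, the two adjustments might look like the sketch below. It assumes the service is named traefik (as the task family labels later in this article suggest) and leaves out every other setting carried over from part 1, so treat it as an illustration rather than the full file.

```yaml
# Sketch only: settings carried over from part 1 are omitted.
services:
  traefik:            # assumed service name
    deploy:
      labels:
        # Tells ECS Compose-X to configure this service for ECS Anywhere
        ecs.compute.platform: EXTERNAL
    command:
      # ... options from part 1 stay as they are ...
      - "--providers.ecs.ecsAnywhere=true"
```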
For that to work, we also need to grant Traefik a few IAM permissions so that it can perform service discovery. In our ecs-anywhere.yaml extension file, we add policies to allow for:
- SSM discovery: allows Traefik to find the ECS Instance details (IP address, etc.).
- Route53: allows Traefik to use Route53 for DNS-based domain validation of certificates.
- ECS read access: the read-only permissions Traefik needs to list and describe clusters, tasks, and container instances.
    x-iam:
      Policies:
        - PolicyName: SSMDiscover
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - "ssm:DescribeInstanceInformation"
                Resource:
                  - "*"
        - PolicyName: Route53UpdatesForACME
          PolicyDocument: {
            "Version": "2012-10-17",
            "Statement": [
              {
                "Sid": "",
                "Effect": "Allow",
                "Action": [
                  "route53:GetChange",
                  "route53:ChangeResourceRecordSets",
                  "route53:ListResourceRecordSets"
                ],
                "Resource": [
                  "arn:aws:route53:::hostedzone/*",
                  "arn:aws:route53:::change/*"
                ]
              },
              {
                "Sid": "",
                "Effect": "Allow",
                "Action": "route53:ListHostedZonesByName",
                "Resource": "*"
              }
            ]
          }
        - PolicyName: TraefikRecommended
          PolicyDocument: {
            "Version": "2012-10-17",
            "Statement": [
              {
                "Sid": "TraefikECSReadAccess",
                "Effect": "Allow",
                "Action": [
                  "ecs:ListClusters",
                  "ecs:DescribeClusters",
                  "ecs:ListTasks",
                  "ecs:DescribeTasks",
                  "ecs:DescribeContainerInstances",
                  "ecs:ListContainerInstances",
                  "ecs:DescribeTaskDefinition",
                  "ec2:DescribeInstances"
                ],
                "Resource": [
                  "*"
                ]
              }
            ]
          }
  whoami:
Note
If you would prefer to use HTTP or TLS based certificate validation, change the settings accordingly, but note that you will need to expose your application publicly to the internet.
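For reference, the choice of validation method comes down to a handful of Traefik options in the command list. The sketch below is an assumption about how such a configuration could look, not a copy of the one used here: the resolver name letsencrypt, the email address, and the storage path are placeholders.

```yaml
command:
  # DNS-based validation through Route53 (what the Route53 IAM policy allows)
  - "--certificatesresolvers.letsencrypt.acme.dnschallenge.provider=route53"
  - "--certificatesresolvers.letsencrypt.acme.email=admin@example.com"      # placeholder
  - "--certificatesresolvers.letsencrypt.acme.storage=/traefik_certs/acme.json"  # assumed path
  # Alternative: TLS-ALPN validation, which requires port 443 to be
  # reachable from the internet. Use it instead of the dnschallenge line.
  # - "--certificatesresolvers.letsencrypt.acme.tlschallenge=true"
```

With DNS validation, Traefik never needs to be reachable from the internet to obtain certificates, which is why it suits an on-premise setup.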
And that’s it, you can now deploy Traefik with ECS Anywhere service discovery! Compared to Part 1, and apart from the TLS definitions, that is pretty much the only change we have made. Of course, on-premise, we do not have an NLB either.
# Change the Cluster name to yours
CLUSTER_NAME=ANewCluster ecs-compose-x up -d templates -f docker-compose.yaml -f ecs-anywhere.yaml
Backup and restore the SSL Certificates
Why do we need to do that?
The reason we want to back up the acme.json file, where Traefik stores the certificates it obtains, is so that Traefik finds it on restart and does not request certificates from Let’s Encrypt over and over again. Doing so could exceed the quota/rate limit, and Traefik wouldn’t be able to retrieve the certificates anymore.
This step is not strictly necessary. However, if you are running ECS Anywhere with a local hard drive, or AWS Fargate, the SSL certificates file is not stored on a persistent/shared file system, so it may not survive the task being replaced or moved to another host. We want to avoid losing it as much as possible.
Unless you have configured a distributed/network file system such as NFS or AWS Elastic File System, you will need to implement the logic for a persistent backup.
Implementation Logic
We are going to create 2 sidecars along with our Traefik container:
- the first one will attempt to restore/set the configuration and the SSL certificates file before Traefik starts.
- the second one will run alongside Traefik and watch for changes to the acme.json file.
The very first time, without the acme.json file stored in S3, there won’t be anything to restore, so Traefik will create the acme.json file. The watchdog container will pick the file up when changes occur and store it in S3. In the event that Traefik has to generate a new certificate, the watchdog will pick it up again, create a backup of the previous file, and store the new one.
The next time Traefik restarts (configuration change, failure of any kind, etc.), the traefik-restore container will find the acme.json file in S3 and write it to the docker volume it shares with Traefik. That way, Traefik already has all the certificates it previously generated and won’t request them again, saving you from the Let’s Encrypt rate limiting.
Chronologically, we have:
- The traefik-restore container starts, pulls our acme.json file from S3, and writes it to the shared volume. If successful, it stops and exits with success.
- The traefik container starts, and loads its configuration and the acme.json certificates file.
- The traefik-backup container starts, watches for changes to the acme.json certificates file, and uploads it to S3 whenever it changes.
Implementation with Compose
We create another extension file (but you could also make the changes in the existing ones!). We add two more volumes:
- one shared by all 3 containers, which is where our acme.json file will be stored.
- one shared only between the restore and backup containers.
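In compose terms, the two volumes could be declared as in this sketch. The names traefik_certs and backup_job match the mounts shown in the service definitions that follow; the mount path used for the traefik container is an assumption.

```yaml
volumes:
  traefik_certs: {}   # shared by traefik, traefik-restore and traefik-backup
  backup_job: {}      # shared by traefik-restore and traefik-backup only

services:
  traefik:
    volumes:
      # Assumed path; it must match the ACME storage path given to Traefik
      - traefik_certs:/traefik_certs:rw
```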
The traefik-restore container not only restores the acme.json file, it also gives the backup container its instructions on how to perform the backup.
Now, let’s look at these two containers more closely, starting with traefik-restore.
  traefik-restore:
    image: public.ecr.aws/compose-x/ecs-files-composer:nightly
    volumes:
      - traefik_certs:/traefik_certs:rw
      - backup_job:/config:rw
    deploy:
      labels:
        ecs.task.family: traefik
        ecs.depends.condition: SUCCESS
    environment:
      <<: *ssl-s3-config
    command:
      - --from-ssm
      - x-ssm_parameter::cert-backup-restore::ParameterName
      - --context
      - jinja2
      - --decode-base64
            
To tell it what to do, we have the file restore_backup.yaml, which is stored (base64 encoded) in an SSM parameter used by files-composer. In the command section, we tell that container to use the SSM parameter defined in x-ssm_parameter.
x-ssm_parameter:
  cert-backup-restore:
    Properties:
      DataType: text
      Type: String
      Tier: Intelligent-Tiering
    MacroParameters:
      FromFile: restore_backup.yaml
      EncodeToBase64: true
    Services:
      traefik:
        Access: RO
            
We mount the volume used to restore the acme.json file for Traefik, and the volume that tells the backup container which files to look for and where to store them in S3.
Next, we have traefik-backup, which shares a volume with our traefik container in order to pick up the changes made to acme.json.
  traefik-backup:
    image: public.ecr.aws/compose-x/s3-autosync:nightly
    volumes:
      - traefik_certs:/traefik_certs:ro
      - backup_job:/config:rw
    depends_on:
      - traefik
    deploy:
      labels:
        ecs.task.family: traefik
      resources:
        limits:
          cpus: 0.1
          memory: 64MB
        reservations:
          cpus: 0.1
          memory: 64MB
    command:
      - -f
      - /config/traefik.yaml
      - --debug
    stop_grace_period: 1m
    environment:
      <<: *ssl-s3-config
We’ve added x-s3 to look up the existing S3 bucket that we want to use to store our files. The Traefik task IAM role will be granted read/write access to the objects, and read-only access to the bucket.
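A lookup of that kind might be written as in the sketch below. The bucket's logical name and the tag are hypothetical, and the keys are an assumption about the ECS Compose-X x-s3 syntax, so check them against the ECS Compose-X documentation for the version you run.

```yaml
x-s3:
  traefik-certs-bucket:              # hypothetical logical name
    Lookup:
      Tags:
        - Name: my-traefik-backups   # hypothetical tag identifying the bucket
    Services:
      traefik:
        Access:
          bucket: ListOnly           # read-only on the bucket
          objects: RW                # read/write on the objects
```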
To deploy, we now simply run:
# Change the Cluster name to yours
CLUSTER_NAME=ANewCluster ecs-compose-x up -d templates -f docker-compose.yaml -f ecs-anywhere.yaml -f backup.yaml
Summary
We now have a very easy way to deploy services using ECS Anywhere onto our on-premise hardware, and we did it with only a few changes compared to deploying the services to AWS Fargate.
ECS Compose-X automatically sets all of the necessary options and properties to make our service run in either environment, simply by changing configuration and labels.
And with Traefik’s many great features and ECS Anywhere service discovery, we can add services and applications, and Traefik will automatically do what is necessary to make it all work.
Note
The example files in this repository are nearly identical to the configuration I have for my own home lab, and I have since been able to add further services managed with Traefik simply by setting my labels correctly.
See also
This has been delayed for a long time waiting on PRs to be merged to fix the ECS Anywhere integration with Traefik. Thanks to the Traefik team and tuxpower for working on this feature.