AWS Fargate: CI/CD Architecture for Drupal and PHP Frameworks

Submitted by nigel on Saturday 4th January 2020
Introduction

AWS Fargate is a compute engine for Amazon ECS that runs Docker containers without the need to provision and manage the underlying EC2 servers. This makes standing up Docker bundles far easier, and thankfully that ease comes with very few limitations. It is often considered the Docker equivalent of AWS Lambda.

Fargate's ease of use and convenience make it a natural fit for CI/CD architecture. Apps can be defined as Fargate tasks, which are analogous to Docker bundles, and pipeline stages can be engineered to perform standard activities such as build, automated testing, static analysis, dev playground, and deploy.

This architecture statement explains how this can be applied to the world of PHP frameworks and Drupal CMS. With a little imagination, and a few tweaks to the ecosystem and Fargate tasks, the architecture would be valid for other high level languages and frameworks.

Pipeline Stages
CI/CD Stages

Before we get into the physical AWS ecosystem, it's worth discussing what we are trying to achieve. I have identified the four main stages in the pipeline in the diagram above. There are two entry points into the build stage: it is triggered automatically when a git pull request is created or approved, and it can also be invoked manually when a dev or QA wants to check progress on a feature after a commit but before a PR is created. In PHP frameworks and Drupal 8+ codebases, it is customary to use the dependency management tool composer, which obviates the need to check contributed plugins and modules into vcs. The build stage will also compile the front end themes and JavaScript, and minify / uglify them at the same time, so production ready CSS and JS should not be committed into vcs by developers. Once the codebase has been built, it will be copied into a PHP/Apache Docker image and pushed into the AWS ECR repository for later use.
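To make the build stage concrete, here is a minimal Python sketch of the steps in the order described. It is an illustration only: the theme directory, the npm `build` script and the image name are assumptions that do not come from this architecture, and it presumes Docker has already been authenticated against ECR.

```python
import subprocess
from typing import List


def build_commands(image: str, theme_dir: str) -> List[List[str]]:
    """Return the build steps in order, as argument lists for subprocess."""
    return [
        # Fetch contributed modules and libraries that are not held in vcs.
        ["composer", "install", "--no-interaction"],
        # Install the front end tooling, then compile and minify the theme.
        # Assumes the theme's package.json defines a "build" script.
        ["npm", "--prefix", theme_dir, "ci"],
        ["npm", "--prefix", theme_dir, "run", "build"],
        # Copy the built codebase into the PHP/Apache image and push it.
        ["docker", "build", "-t", image, "."],
        ["docker", "push", image],
    ]


def run_build(image: str, theme_dir: str) -> None:
    """Run each step in turn; check=True stops at the first failing step."""
    for command in build_commands(image, theme_dir):
        subprocess.run(command, check=True)
```

Keeping the steps as plain argument lists means the same list can be logged by Jenkins before anything is executed.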

The automated testing will cover both functional (BDD) and unit (TDD) testing against the codebase image that has just been built. Writing unit tests against the largely procedural Drupal 7 was onerous and challenging, and was therefore overlooked in the main. Thankfully Drupal 8 is OOP, which makes writing back-end tests with PHPUnit much more straightforward.

The static analysis will check for code quality, vulnerabilities, code smells, and copy and paste duplication, along with basic lint type checking.

The deploy step can mean different things to different people. Deployment can mean pushing your built codebase to the production server, or it can mean simply creating a semantic versioning tag, and pushing the codebase artefact to an upstream repository, ready for a manual deployment to prod by Ops staff. I have opted for the latter interpretation in this instance.

CI/CD Ecosystem
Ecosystem

The physical ecosystem is described below and shown in diagrammatic form above. Whilst at first glance this may appear quite complex, it is in fact straightforward with a little explanation.

Jenkins

Jenkins will be the automation tool since it is ubiquitous across the industry and there is a great deal of expertise in the marketplace. Jenkins has pipeline capability, and new projects should elect to use the Groovy-based declarative scripting DSL. This allows stages to run in parallel, and thus once the build stage is completed, automated testing and static analysis can run concurrently. In addition I propose the standing up of a 'persisting playground': the built feature branch of the web app, reachable at a branch specific url that resolves in DNS and can be shared between devs, QA, and stakeholders. This would be useful for troubleshooting, demonstrations of progress, show and tells etc., and would be torn down at close of business to reduce costs. My tutorial here shows how a friendly branch specific url can be created and propagated by AWS Route 53 within 60 seconds.
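As an illustration of the branch specific url, the Python sketch below derives a hostname from a git branch name. The function name and the example zone are hypothetical; the only rules it relies on are the general DNS ones, namely that a label may contain letters, digits and hyphens, must not start or end with a hyphen, and is limited to 63 characters.

```python
import re

MAX_LABEL_LENGTH = 63  # DNS limit for a single label


def playground_hostname(branch: str, zone: str) -> str:
    """Turn a git branch name into a hostname inside the given zone."""
    # Lower-case, then collapse every run of other characters into a hyphen.
    label = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    # Truncate, then strip again in case the cut left a trailing hyphen.
    label = label[:MAX_LABEL_LENGTH].strip("-")
    if not label:
        raise ValueError(f"branch {branch!r} has no usable characters")
    return f"{label}.{zone}"


print(playground_hostname("feature/ABC-123_new_menu", "dev.example.com"))
# prints feature-abc-123-new-menu.dev.example.com
```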

The Jenkins pipeline script is responsible for control flow only; all the actual 'doing' activities, such as building the codebase Docker image on the fly and pushing it to ECR, are undertaken by shell scripts spawned from the pipeline. This means the power of AWS CLI can be leveraged to undertake all AWS specific tasks.

The flow of the pipeline will vary depending upon the digital team's needs. I have elected for the following:

  • On PR Creation, run build, automated testing, static analysis, persisting playground
  • On PR Approval, run build, automated testing (and fail the pipeline if testing fails), deploy

Obviously you may have different requirements here such as building on each feature branch commit, but the re-configuration effort will be minimal regardless. Adding conditions on the Jenkins pipeline stages is trivial. 
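The two flows listed above can be modelled in a few lines of Python. This is only a sketch of the control flow, not a replacement for the Jenkins declarative pipeline: the trigger and stage names are my own labels, and stages that Jenkins would run in parallel are simply run one after another here.

```python
from typing import Callable, Dict, List

# Each inner list is a group of stages that Jenkins could run in parallel.
PLANS: Dict[str, List[List[str]]] = {
    "pr_created": [
        ["build"],
        ["automated_testing", "static_analysis", "persisting_playground"],
    ],
    "pr_approved": [
        ["build"],
        ["automated_testing"],
        ["deploy"],
    ],
}


def run_pipeline(trigger: str, run_stage: Callable[[str], bool]) -> List[str]:
    """Run the groups in order and return the stages that were executed.

    run_stage returns True on success. The pipeline stops after the first
    group in which any stage fails, so a failed test run prevents deploy.
    """
    executed: List[str] = []
    for group in PLANS[trigger]:
        results = []
        for stage in group:
            executed.append(stage)
            results.append(run_stage(stage))
        if not all(results):
            break
    return executed
```

Changing the flow for a different team then amounts to editing the `PLANS` table rather than the runner.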

AWS ECR

The PHP/Apache Docker image is a starting point for creating an image that will include the feature branch built by Jenkins. Detailed instructions on how to achieve this are covered in my earlier blog, and once the image is created it is pushed to ECR. When a Fargate task is run, it must first be provisioned by fetching all the Docker images it needs, and having the codebase + Apache/PHP local to AWS will improve start-up time.
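Since each feature branch gets its own image, the branch name has to be turned into a valid Docker tag. The Python sketch below shows one way to do that; the function names, account id and repository name are hypothetical. Docker tags may contain letters, digits, underscores, periods and hyphens, may not start with a period or hyphen, and are limited to 128 characters.

```python
import re

MAX_TAG_LENGTH = 128  # Docker limit for an image tag


def image_tag(branch: str) -> str:
    """Map a git branch name onto a valid Docker image tag."""
    tag = re.sub(r"[^A-Za-z0-9_.-]+", "-", branch).lstrip(".-")
    tag = tag[:MAX_TAG_LENGTH]
    if not tag:
        raise ValueError(f"branch {branch!r} has no usable characters")
    return tag


def ecr_image_uri(account_id: str, region: str, repository: str, branch: str) -> str:
    """Build the full ECR image URI for a feature branch."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{image_tag(branch)}"


print(ecr_image_uri("123456789012", "eu-west-1", "webapp", "feature/ABC-123"))
# prints 123456789012.dkr.ecr.eu-west-1.amazonaws.com/webapp:feature-ABC-123
```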

AWS Fargate

Jenkins will run up to three concurrent AWS Fargate tasks - a task which includes all the Docker containers to run the automated tests, a task that has the static analysis containers, and the basic web app containers for the persisting playground which will contain Apache+PHP along with probably Memcached or Redis, and any other products and services required by your web app. More on the AWS Fargate task definitions later. 

The automated tests and the static analysis run to conclusion and automatically die once their entry point scripts have run. The persisting playground will run Apache in the foreground to prevent termination.
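For the run-to-completion tasks, a shell script spawned by Jenkins would typically start the task and then wait for it to stop. The Python sketch below only assembles the AWS CLI invocations so the shape of the calls is visible; the cluster, subnet and security group values are placeholders, and your networking may well need different settings.

```python
from typing import List


def run_task_command(cluster: str, task_definition: str,
                     subnets: List[str], security_groups: List[str],
                     assign_public_ip: bool = True) -> List[str]:
    """Assemble an `aws ecs run-task` call for the Fargate launch type."""
    # Fargate tasks use awsvpc networking, so subnets must be supplied.
    network = (
        "awsvpcConfiguration={"
        f"subnets=[{','.join(subnets)}],"
        f"securityGroups=[{','.join(security_groups)}],"
        f"assignPublicIp={'ENABLED' if assign_public_ip else 'DISABLED'}"
        "}"
    )
    return [
        "aws", "ecs", "run-task",
        "--cluster", cluster,
        "--launch-type", "FARGATE",
        "--task-definition", task_definition,
        "--network-configuration", network,
    ]


def wait_for_stop_command(cluster: str, task_arn: str) -> List[str]:
    """Assemble the call that blocks until the given task has stopped."""
    return ["aws", "ecs", "wait", "tasks-stopped",
            "--cluster", cluster, "--tasks", task_arn]
```

The task ARN needed by the wait call is returned in the JSON output of `run-task`.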

AWS RDS

It is impractical to import db dumps into Fargate tasks, so all Fargate tasks will use one shared database resource. Obviously sharing a database can lead to schema inconsistencies when different schema changes are being developed on different feature branches. In Drupal these changes are applied by drush updatedb, and in PHP frameworks there is always specific command line tooling to perform database migrations. To guard against schema inconsistencies, there would need to be communication between developers, and the facility to reset the database to a known develop branch datum with a Jenkins job.

AWS S3

An export of the develop branch database should be held locally in AWS S3 to facilitate quick import into RDS as required, and updated nightly. 

AWS S3 & AWS Cloudfront

File assets such as jpg and png images are normally held in /sites/{default|domain name}/files in Drupal but are not committed to vcs. These need to be synced from the host's server nightly into S3, and for greater performance the use of Cloudfront acceleration is recommended. Configuration changes will be required in the Drupal or PHP framework codebase to point to the Cloudfront endpoints. Drupal will require the use of the Storage API module or the S3 File System module. 

AWS CloudWatch

Fargate integrates with CloudWatch easily: each Docker container in the Fargate task definition will send its stdout and stderr to a CloudWatch log group. This is imperative not just for troubleshooting, but also for determining the automated test results. By parsing the logs, a full test output can be created.
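To give a flavour of the log parsing, the Python sketch below pulls the test totals out of a list of log messages. It assumes the test container runs Maven and that the messages contain the usual Surefire summary line ("Tests run: N, Failures: N, Errors: N, Skipped: N"); the function names are mine, and fetching the messages from CloudWatch (for example with `aws logs get-log-events`) is left out.

```python
import re
from typing import Dict, Iterable, Optional

SUMMARY = re.compile(
    r"Tests run: (\d+), Failures: (\d+), Errors: (\d+), Skipped: (\d+)"
)


def parse_summary(messages: Iterable[str]) -> Optional[Dict[str, int]]:
    """Return the totals from the last overall summary line, if any."""
    summary = None
    for message in messages:
        # Per-class result lines also carry "Time elapsed"; skip those.
        if "Time elapsed" in message:
            continue
        match = SUMMARY.search(message)
        if match:
            run, failures, errors, skipped = (int(g) for g in match.groups())
            summary = {"run": run, "failures": failures,
                       "errors": errors, "skipped": skipped}
    return summary


def all_passed(summary: Optional[Dict[str, int]]) -> bool:
    """Treat a missing summary as a failure so a crashed run never passes."""
    return (summary is not None
            and summary["failures"] == 0
            and summary["errors"] == 0)
```

Treating a missing summary as a failure is a deliberate choice here: if the container dies before the tests report, the pipeline should not carry on to deploy.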

Slack Integration

It is desirable for the CI/CD pipeline to be hands-free, i.e. Jenkins login not required unless a manual pipeline invocation is necessary. To achieve this goal, Slack integration is required with job statuses being sent to dedicated Slack channels, and output from the test results also sent through Slack. 
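One simple way to send a job status is a Slack incoming webhook, which accepts a JSON body with a `text` field. The Python sketch below is an assumption about how the notification could be wired up rather than a prescribed integration; the message layout and function names are hypothetical, and the webhook url is whatever Slack issues for your channel.

```python
import json
import urllib.request


def slack_payload(job: str, branch: str, status: str, details: str = "") -> bytes:
    """Build the JSON body for a Slack incoming webhook message."""
    text = f"*{job}* on `{branch}`: {status}"
    if details:
        text += f"\n{details}"
    return json.dumps({"text": text}).encode("utf-8")


def post_to_slack(webhook_url: str, payload: bytes) -> int:
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=payload,  # supplying data makes this a POST request
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```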

Existing Host

Of course to seed RDS and the file asset S3, we need to get database exports and assets from our existing host. Exactly how to achieve this will vary depending upon the host. If possible, a slave Jenkins can execute remote jobs on the host to upload the db exports and file assets to RDS and S3 respectively, and that is how I have drawn my diagram. If this is not possible, a combination of scp and rsync will suffice.

Fargate Task Definitions
Fargate Docker Containers

Fargate tasks can be defined in the AWS Console, but I've already identified that AWS CLI is the tool of choice for programmatic creation. The CLI requires the container definitions in JSON format, and for the pipeline discussed above there would be three Fargate definitions per feature branch. The first is the standard web app which would typically have the PHP/Apache container with the codebase copied into it during a docker build command. The second is the automated testing stack which includes Maven (which has the Java runtime bundled) along with Selenium grid and two nodes: Firefox and Chrome. I spoke about this Fargate task at length in an earlier blog here. The third definition is for static analysis and I am suggesting the excellent jakzal/phpqa repository for this. 
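As an example of the JSON involved, the Python sketch below generates a single-container definition for the web app task. The family name, CPU and memory sizes, role ARN and log group are all placeholders, and a real web app task would probably add further containers such as Redis or Memcached.

```python
import json
from typing import Any, Dict


def web_app_task_definition(family: str, image: str, execution_role_arn: str,
                            log_group: str, region: str) -> Dict[str, Any]:
    """Return a minimal Fargate task definition for the PHP/Apache container."""
    return {
        "family": family,
        "networkMode": "awsvpc",  # the network mode Fargate requires
        "requiresCompatibilities": ["FARGATE"],
        "cpu": "512",      # task level sizes, given as strings
        "memory": "1024",
        "executionRoleArn": execution_role_arn,
        "containerDefinitions": [
            {
                "name": "web",
                "image": image,
                "essential": True,
                "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
                # Send the container's stdout and stderr to CloudWatch.
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": log_group,
                        "awslogs-region": region,
                        "awslogs-stream-prefix": family,
                    },
                },
            }
        ],
    }


def write_task_definition(definition: Dict[str, Any], path: str) -> None:
    """Write the definition to disk so the AWS CLI can read it."""
    with open(path, "w") as handle:
        json.dump(definition, handle, indent=2)
```

The resulting file can then be registered with `aws ecs register-task-definition --cli-input-json file://taskdef.json`.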

The three tasks would be defined and then run within the Jenkins pipeline. It must be noted that once the tasks have finished, the feature branch image they ran from should be deleted from AWS ECR straight away. Storage in ECR is charged for, and in the unlikely event of the image being required at a later date, it can always be recreated by invoking the pipeline again manually and selecting the feature branch from a drop down list.
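The clean-up can be scripted with the AWS CLI as well. The Python sketch below assembles the two calls I would expect to need: one to delete the tagged image from ECR and one to deregister the task definition. Deregistering is my own addition to the tidy-up, and the repository, tag and `family:revision` values are placeholders.

```python
from typing import List


def cleanup_commands(repository: str, tag: str, task_definition: str) -> List[List[str]]:
    """Assemble the AWS CLI calls that tidy up after a feature branch run.

    task_definition is expected in "family:revision" form.
    """
    return [
        # Remove the feature branch image so it no longer incurs storage cost.
        ["aws", "ecr", "batch-delete-image",
         "--repository-name", repository,
         "--image-ids", f"imageTag={tag}"],
        # Mark the task definition revision inactive.
        ["aws", "ecs", "deregister-task-definition",
         "--task-definition", task_definition],
    ]
```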

The use of Drupal Drush

Fargate is a black box, without access to the underlying EC2 infrastructure, and therefore it is not possible to ssh into EC2 and then perform a docker exec command. Furthermore drush aliases require an ssh daemon running on the target alias to be able to execute remote commands from say a developer working on premises. So how can we run the normal set of everyday drush commands such as to apply schema changes, apply features, split the configuration, flush the caches etc?

Ok so we have PHP installed on Jenkins since we have to run composer install in the build stage. We also have an RDS instance with the database. We therefore have what we need to bootstrap Drupal - which is a requirement of drush. 

Therefore all drush commands should be run on Jenkins during the build stage. These commands should be listed in a sibling directory to the codebase's docroot or web directory, in say devops_hooks or some such. As part of the Jenkins build stage, the existence of directory devops_hooks will be checked, and all shell scripts within it will be run.  
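The hook mechanism could be sketched in Python as follows. The directory name devops_hooks comes from the text above; running the scripts in alphabetical order, restricting them to `*.sh` files and invoking them with `sh` are my assumptions, as are the function names.

```python
import subprocess
from pathlib import Path
from typing import List


def find_hooks(project_root: str) -> List[Path]:
    """Return the shell scripts in devops_hooks, sorted by file name."""
    hooks_dir = Path(project_root) / "devops_hooks"
    if not hooks_dir.is_dir():
        return []  # the directory is optional
    return sorted(p for p in hooks_dir.glob("*.sh") if p.is_file())


def run_hooks(project_root: str) -> List[str]:
    """Run each hook from the project root, stopping at the first failure."""
    executed = []
    for script in find_hooks(project_root):
        # check=True raises CalledProcessError if a script exits non-zero.
        subprocess.run(["sh", str(script)], cwd=project_root, check=True)
        executed.append(script.name)
    return executed
```

With alphabetical ordering, numeric prefixes such as 10_updatedb.sh and 20_cache_rebuild.sh give developers control over the sequence in which drush commands are applied.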

 

 
