Ephemeral Environments for DevOps: Why, What and How?

In the previous post I discussed the issues with having static environments. In this post I cover one of the solutions for those issues: Ephemeral environments.

Ephemeral environments are also sometimes called “Dynamic environments”, “Temporary environments”, “on-demand environments” or “short-lived environments”.

The idea is that instead of environments “hanging around” waiting for someone to use them, the CI/CD pipeline stages are responsible for instantiating and destroying the environments they will run against.

For example, we might have a pipeline with the following stages:

  1. Build

  2. Test:Integration&Quality

  3. Test:Functional

  4. Test:Load&Security

  5. Approval

  6. Deploy:Prod
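A pipeline like this is often expressed declaratively in a CI/CD tool. As an illustrative sketch (the keys here are hypothetical and not tied to any particular CI system):

```yaml
# Hypothetical pipeline definition -- stage names mirror the list above.
stages:
  - name: Build
  - name: Test:Integration&Quality
  - name: Test:Functional
  - name: Test:Load&Security
  - name: Approval
    manual: true          # waits for a human gate
  - name: Deploy:Prod
```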

(Image courtesy of cloudbees)


In a traditional static environment configuration, each stage (except perhaps the build stage) would be configured to run against a static environment that is already built and waiting for it; some stages might even share the same environment, which causes all the issues I mentioned previously.

In an ephemeral environment configuration, each relevant stage would contain two extra actions: one at the beginning that spins up an environment for the stage to test against, and one at the end that spins it down.
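One way to picture those "two extra actions" is as a wrapper around each stage. Here is a minimal Python sketch; `provision_environment` and `teardown_environment` are hypothetical stand-ins for whatever your infrastructure tooling actually does:

```python
from contextlib import contextmanager

def provision_environment(name):
    # Stand-in for real provisioning (CloudFormation, Terraform, etc.)
    print(f"spinning up {name}")
    return {"name": name, "up": True}

def teardown_environment(env):
    # Stand-in for real teardown; must run even if the stage fails.
    print(f"spinning down {env['name']}")
    env["up"] = False

@contextmanager
def ephemeral_environment(name):
    env = provision_environment(name)
    try:
        yield env
    finally:
        teardown_environment(env)  # always destroyed, pass or fail

def run_stage(stage_name, tests):
    # Each pipeline stage wraps its work in its own ephemeral environment.
    with ephemeral_environment(f"staging-{stage_name}") as env:
        return tests(env)
```

The `finally` clause is the important part: the environment is destroyed whether the tests pass, fail, or raise, so nothing "hangs around" afterwards.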

The first step (1) is to compile and run fast unit tests, then publish the binaries to a dedicated binary repository such as Artifactory.

There is also a stage (2) that creates a pre-baked environment as a set of AMIs or VM images (or containers) to be instantiated later:

  1. Build & Unit Test

    1. Build Binaries, run unit tests

    2. Save binaries to artifact management

  2. Pre-Bake Staging Environment

    1. Instantiate Base AMIs

    2. Provision OS/Middleware components

    3. Provision/Install application

    4. Save AMIs for later instantiation as the STAGING environment (in places such as S3, Artifactory, etc.)

  3. Test:Integration&Quality

    1. Spin up staging environment

    2. Run tests

    3. Spin down staging environment

  4. Test:Functional

    1. Spin up staging environment

    2. Run tests

    3. Spin down staging environment

  5. Test:Load&Security

    1. Spin up staging environment

    2. Run tests

    3. Spin down staging environment

  6. Approval

    1. Spin up staging environment

    2. Run approval tests, or wait for approval and provide a link so humans can inspect the environment

    3. Spin down staging environment

  7. Deploy:Prod

    1. Spin up staging environment

    2. Data replication

    3. Switch DNS from old production to new environment

    4. Spin down old prod environment (this is a very simplistic solution)
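The "pre-bake once, instantiate many" flow above can be sketched in a few lines of Python; `bake_image` and `instantiate` are hypothetical stand-ins for tools like Packer and your cloud provider's API:

```python
import uuid

def bake_image(binaries, base_ami="base-linux"):
    """Pre-bake stage: base AMI + OS/middleware + application, saved once."""
    image_id = f"ami-{uuid.uuid4().hex[:8]}"
    return {"id": image_id, "base": base_ami, "app": binaries}

def instantiate(image, label):
    """Every later stage instantiates the SAME image -- no re-install."""
    return {"env": label, "image": image["id"]}

# Bake once...
image = bake_image(binaries="myapp-1.4.2.jar")

# ...then every stage (and eventually production) spins up from that image:
integration = instantiate(image, "staging-integration")
functional  = instantiate(image, "staging-functional")
prod        = instantiate(image, "prod-green")

# All stages test -- and production runs -- the exact same artifact.
assert integration["image"] == functional["image"] == prod["image"]
```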

A few notes:

Pre-Baked Environment

Notice that the environment we are spinning up and spinning down is always the same environment, and it is a staging environment with the application pre-loaded on top of it.

 

Staging environments are designed to look exactly like production (in fact, in this case, we are using staging as a production environment in the final stage).

The reason we always use the same environment template is that:

  • It provides environmental consistency across all tests and stages, which reduces false positives and negatives. If something works (or doesn't work), it is likely to behave the same way in production.

  • Environments are pre-installed with the application, which means we are always testing the exact same artifacts, so we get artifact consistency.

  • Because environments are pre-installed, we are also implicitly testing the application's installation/deployment scripts every time we pre-bake.

Only One Install

Also notice that there is no "installation" after the pre-baking stage, which means we also don't "deploy" into production. We simply instantiate a new production in parallel.

We "install once, promote many", which means we get installation consistency across the stages.

Blue-Green Deployment

Deploying to production just means we instantiate a new pre-baked environment in the production zone (for example, a special VPC if we are working in AWS), which runs in parallel with the "real" production. Then we slowly soak up production data, let the two systems run side by side, and eventually either switch DNS to the new environment or slowly drain the production load balancer into it (there are other approaches that are beyond the scope of this article).
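Gradually draining traffic from the old production ("blue") to the new one ("green") can be modeled as shifting a routing weight. A toy sketch of that progression (a real setup would adjust DNS weights or load-balancer target groups instead):

```python
def shift_traffic(step=10):
    """Yield (blue_weight, green_weight) pairs from 100/0 to 0/100."""
    for green in range(0, 101, step):
        yield (100 - green, green)

# Drain "blue" (old prod) into "green" (the new pre-baked environment).
# With step=25 this produces: (100, 0), (75, 25), (50, 50), (25, 75), (0, 100)
for blue, green in shift_traffic(step=25):
    print(f"blue={blue}% green={green}%")
```

Once green holds 100% of the traffic and has been stable, the blue environment is simply spun down, just like any other ephemeral environment.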

Speed

Another advantage of this setup is that, because each stage can have its own environment, we can run some stages in parallel. In this case we can run all the various tests in parallel, which saves valuable time:

  1. Build & Unit Test

    1. Build Binaries, run unit tests

    2. Save binaries to artifact management

  2. Pre-Bake Staging Environment

    1. Instantiate Base AMIs

    2. Provision OS/Middleware components

    3. Provision/Install application

    4. Save AMIs for later instantiation as the STAGING environment (in places such as S3, Artifactory, etc.)

  3. HAPPENING IN PARALLEL:

    1. Test:Integration&Quality

      1. Spin up staging environment

      2. Run tests

      3. Spin down staging environment

    2. Test:Functional

      1. Spin up staging environment

      2. Run tests

      3. Spin down staging environment

    3. Test:Load&Security

      1. Spin up staging environment

      2. Run tests

      3. Spin down staging environment

  4. Approval

    1. Spin up staging environment

    2. Run approval tests, or wait for approval and provide a link so humans can inspect the environment

    3. Spin down staging environment

  5. Deploy:Prod

    1. Spin up staging environment

    2. Data replication

    3. Switch DNS from old production to new environment

    4. Spin down old prod environment (this is a very simplistic solution)
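Because each test stage gets its own environment, the three of them really can run concurrently. A minimal sketch with Python's `concurrent.futures`; `run_stage` here is a hypothetical stand-in for the spin-up/test/spin-down wrapper described earlier:

```python
from concurrent.futures import ThreadPoolExecutor

def run_stage(stage):
    # Stand-in: each stage spins up its own staging environment,
    # runs its tests, and spins the environment down again.
    return f"{stage}: passed"

stages = ["Test:Integration&Quality", "Test:Functional", "Test:Load&Security"]

# Independent environments mean no shared state -- safe to parallelize.
with ThreadPoolExecutor(max_workers=len(stages)) as pool:
    results = list(pool.map(run_stage, stages))
```

The total wall-clock time becomes roughly the duration of the slowest stage, rather than the sum of all three.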

How?

One tool to look into for managing environments and easily destroying them later is Chef Provisioning, which can be invoked from the Jenkins command line and also saves the state needed to spin an environment down afterwards. It also follows the toolchain values we discussed before on this blog.

In the Docker world, given a pre-baked set of Docker images, we can use Kubernetes to create ephemeral environments very easily and destroy them at will.
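In Kubernetes, an ephemeral environment is often just a namespace created per pipeline run and deleted afterwards. An illustrative manifest (the namespace name, app name, and image are placeholders):

```yaml
# Hypothetical ephemeral environment: one namespace per pipeline run.
apiVersion: v1
kind: Namespace
metadata:
  name: staging-build-1234        # placeholder: unique per pipeline run
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp                     # placeholder application name
  namespace: staging-build-1234
spec:
  replicas: 1
  selector:
    matchLabels: { app: myapp }
  template:
    metadata:
      labels: { app: myapp }
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:1.4.2   # the pre-baked image
```

Deleting the namespace (`kubectl delete namespace staging-build-1234`) tears down everything inside it, which is what makes spin-down trivial.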

Jenkins X would be a good tool to look into specifically for these types of environments.