These days, everywhere I go, people talk about wanting to deploy to production more often. It felt like an endless battle: the code waiting to be deployed kept piling up in big batches while we threw around buzzwords such as “CI/CD” and “feature toggles” without actually knowing how to implement them.
I started researching how other companies manage to deploy to production several times a day, and that research yielded a solution that set my client up for implementing continuous delivery (CD). I call this solution the five levels of continuous delivery.
The five levels of continuous delivery
While researching and implementing continuous delivery I came up with what I call “The five levels of continuous delivery”, which cover the five important aspects of getting CD implemented:
- A development process that supports and encourages CD
- Automation of build, test, end-to-end testing and automatic deployment of feature environments
- Use release toggles for independent deployments
- Get notified about production problems before your users tell you
- Automatic and easy fallback by integrating feature toggle and monitoring
These levels are in increasing order, where 1 is the first to implement and 5 is the last. You don’t need to be at level 5 to be doing continuous delivery, but as you go up the levels the setup becomes more automated, safe and flexible, which makes the CD process easier to perform.
Level 1: Development process that supports and encourages CD
Previously, when I worked in teams wanting to implement CD, the focus was all on automation of build, test, logging, and monitoring. Though this is definitely an enabler for CD, it is not what makes teams deploy multiple times a day; it just makes it easier to deploy often and streamlines the deployment and operations process. So what makes teams deploy several times a day? Changing the way you work so that it encourages shipping the code you just wrote directly to production with confidence.
The teams I have worked with have almost always used Scrum and have had a Scrum board like this:
|To Do||In Progress||In Review||Test||Done|
Looking at a board like this, it was no wonder that they didn’t deploy every day, hell, not even every sprint. It simply doesn’t work, because this board encourages a development process where you stage a lot of changes and “done” means “has been tested but might break when we deploy in 2 weeks, so it will need retesting at that time”. This is bad, as the feature should have been shipped to production immediately once it met all the requirements: the code has been reviewed, the feature has been QA’ed, and the feature can be shipped independently (not waiting for some external dependency like another feature or a back-end service endpoint; in that case we need level 3, a feature toggle system, in place).
A Scrum board that encourages CD would look like this:
|To Do||In Progress||In Review||Deployed|
This board promotes CD because features are shipped as soon as they have been reviewed, instead of being left to rot in a staging environment.
This means that a developer should merge the change into the production branch as soon as it has been through QA.
Level 2: Automation of build, test, end-to-end testing and automatic deployment of feature environments
This is the CI (continuous integration) in CI/CD. It involves automating the build, the tests and the integration of code, as well as automatically deploying to a feature environment that serves the feature for QA in a production-like setting.
Branching and how it’s different from GitFlow
Previously I have worked with setups that follow the GitFlow way of working, with a dev environment, a staging environment, release environments, a pre-prod environment, and production. The more environments you have, the more changes queue up, and the less often you deploy.
In contrast to GitFlow, the CI process uses only feature branches and a master branch, where the master branch means production. This means that every time code is pushed to the master branch, the pipeline should deploy it to production. For this reason, it is a good idea to protect the master branch by only allowing updates through pull requests that have a successful CI run and approval from both a tech reviewer and QA.
When work on a new feature starts, a branch is created with the feature/ prefix, and after development a pull request to master is opened.
Feature environments instead of big staging environments
Traditionally, in “deploy once a month” teams, there has been a staging environment with a BIG delta from the last deployment, which makes the deployment riskier and the testing more work. There was so much to test that QA didn’t have time to regression test all the existing features on every deployment to the staging environment, creating the risk that a new feature broke other features without QA noticing. The root problem here is the big staging environment.
Instead, I like to use a small staging environment for every feature, called a feature environment. With a feature environment the only delta is the newly created feature, and if the feature is successfully QA’ed, it can be shipped to production.
The CI/CD pipeline should simply listen for branches prefixed with feature/, deploy these to the dev hosting server and notify QA that there is a new feature to test.
These feature environments also make it possible to run automatic end-to-end tests as a pull request prerequisite, ensuring that the feature actually runs well on the server even before QA starts verifying it. The pipeline can also be integrated with mail servers or process systems like JIRA to automatically provide QA with a link to where the feature environment is hosted, so QA can start testing.
The build server setup
The build server should have two configurations: one for the master branch and one for feature branches:
Master branch build server config: The build server should listen for changes to the master branch, then build, test and trigger the deployment server to deploy the new code to production.
Feature branches build server config: The build server should listen for feature branches, then build, test and trigger the deployment server to deploy the feature to the feature environment.
The deployment server setup
The deployment server should have two configurations: one for production and one for feature branches:
Prod: The production deployment config should set up the server and deploy the built code from the build server to the production server.
Feature: The feature deployment config should set up the feature environment and deploy the new code to it. A good URL for the feature environment could be https://devserver/featurebranchname.
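To make the two configurations concrete, here is a minimal sketch of how a pipeline could map a branch name to a deployment target. The names `DeploymentTarget` and `resolveDeployment` are made up for illustration, and turning the branch name into a URL-safe slug is my own assumption; your build or deployment server will have its own way of expressing this.

```typescript
// Hypothetical types and names, not taken from any specific CI/CD product.
type DeploymentTarget =
  | { kind: "production" }
  | { kind: "feature"; url: string }
  | { kind: "none" };

const DEV_SERVER = "https://devserver"; // host from the example URL in the text
const FEATURE_PREFIX = "feature/";

function resolveDeployment(branch: string): DeploymentTarget {
  // master means production
  if (branch === "master") {
    return { kind: "production" };
  }
  // feature/<name> gets its own feature environment
  if (branch.indexOf(FEATURE_PREFIX) === 0 && branch.length > FEATURE_PREFIX.length) {
    const name = branch.slice(FEATURE_PREFIX.length);
    // Assumption: lowercase the name and replace anything that is not a-z or 0-9 with "-"
    const slug = name
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "-")
      .replace(/^-+|-+$/g, "");
    return { kind: "feature", url: `${DEV_SERVER}/${slug}` };
  }
  // Any other branch is not deployed anywhere
  return { kind: "none" };
}

// resolveDeployment("feature/Login-Page")
// → { kind: "feature", url: "https://devserver/login-page" }
```

Returning "none" for everything else keeps the rule explicit: only master and feature/ branches ever trigger a deployment.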
QA happens in the pull request
The code is moved to production by creating a pull request to the master branch. In the pull request, the developer writes a description for the tech reviewer as well as one for QA, providing QA with a link to the feature environment and a test guide. If both the tech reviewer and QA approve, the pull request can be merged to the production branch and deployed automatically.
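The merge rule can be written down as a small check. This is only a sketch of the logic: in practice you would configure it as a branch protection rule in your Git hosting tool rather than code it yourself, and the `PullRequest` shape below is a hypothetical one I made up for the example.

```typescript
// Hypothetical model of a pull request; real Git hosts expose richer data.
interface PullRequest {
  targetBranch: string;
  ciPassed: boolean;
  techApproved: boolean;
  qaApproved: boolean;
}

// Lists what is still missing, so the developer knows why a merge is blocked.
function missingMergeRequirements(pr: PullRequest): string[] {
  const missing: string[] = [];
  if (!pr.ciPassed) missing.push("successful CI run");
  if (!pr.techApproved) missing.push("tech reviewer approval");
  if (!pr.qaApproved) missing.push("QA approval");
  return missing;
}

// A merge to master is a deployment to production, so all three gates must be green.
function canMergeToMaster(pr: PullRequest): boolean {
  return pr.targetBranch === "master" && missingMergeRequirements(pr).length === 0;
}
```

Reporting the missing requirements instead of a bare yes/no makes the pull request itself the place where tech review and QA status are visible.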
Level 3: Use release toggles for independent deployments
Levels 1 and 2 might be the 20 % that give you 80 % of the results, which is why I put a higher priority on implementing these two levels first. Once the team has gotten accustomed to the continuous delivery development process, supported by stable automated pipelines and feature environments, it should in most cases be able to deploy to production multiple times a day without much trouble. The problems start when features depend on external factors, like a back-end endpoint that needs to be shipped together with the feature. Normally we would simply have to wait for the external dependency to be ready, but feature toggles let us distinguish between deployment and release: the code is deployed with the feature turned off, and the feature is enabled at runtime when it is ready to be released.
Feature toggles also enable A/B testing and beta testing with real users, which can be very valuable.
Implementing feature toggles with an Angular app
Feature toggling can be implemented easily in Angular with a features.json file that is swapped by the deployment pipeline.
You might later want a more sophisticated GUI feature toggle dashboard that the project manager can use to control which features should be visible.
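As a rough sketch of the features.json idea, the code below parses the file content and answers “is this feature enabled?”. It is plain TypeScript without Angular decorators so it can run on its own. The names `FeatureToggleService` and `parseFeatureFlags` and the flag names are assumptions for the example, and so is the flat name-to-boolean file format.

```typescript
// Assumed file format: a flat JSON object of feature name -> true/false.
type FeatureFlags = Record<string, boolean>;

function parseFeatureFlags(json: string): FeatureFlags {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("features.json must contain a JSON object");
  }
  const source = raw as Record<string, unknown>;
  const flags: FeatureFlags = {};
  for (const featureName of Object.keys(source)) {
    const value = source[featureName];
    if (typeof value !== "boolean") {
      throw new Error(`Feature "${featureName}" must be true or false`);
    }
    flags[featureName] = value;
  }
  return flags;
}

class FeatureToggleService {
  constructor(private readonly flags: FeatureFlags) {}

  // Unknown features count as disabled, so a missing entry never exposes unfinished work.
  isEnabled(featureName: string): boolean {
    return this.flags[featureName] === true;
  }
}

// Example content of a production features.json (hypothetical flag names).
const productionFeatures = '{ "temp-new-checkout": false, "dark-mode": true }';
const toggles = new FeatureToggleService(parseFeatureFlags(productionFeatures));
// toggles.isEnabled("dark-mode")         → true
// toggles.isEnabled("temp-new-checkout") → false
```

In a real Angular app the file would typically be fetched at startup, for example with `HttpClient`, and the service would be made injectable; that wiring is left out here.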
Feature toggles work process
When starting work on a new feature, a feature toggle should be created. The implementation should support both the previous behavior and the new one, selected by a conditional statement that checks whether the feature is enabled. That way the code can be deployed with the previous behavior still active, and the new feature can be toggled on dynamically in production, which is what releases it.
There are two kinds of feature toggles: release toggles (temporary) and maintenance toggles (permanent). A good naming convention for release toggles is to prefix them with temp-, as in temp-*flagname*.
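Here is a small sketch of that work process: one function that keeps both behaviors behind a conditional, and one helper that uses the temp- prefix to list the release toggles still living in the code. The flag name `temp-new-checkout` and the button labels are invented for the example.

```typescript
type Flags = Record<string, boolean>;

const RELEASE_PREFIX = "temp-";

// Release toggles are temporary and recognizable by their prefix.
function isReleaseToggle(flagName: string): boolean {
  return flagName.indexOf(RELEASE_PREFIX) === 0;
}

// Lists the temporary toggles, i.e. the ones that need to be cleaned up eventually.
function releaseToggles(flags: Flags): string[] {
  return Object.keys(flags).filter(isReleaseToggle).sort();
}

// Both behaviors live side by side until the release toggle is removed.
function checkoutButtonLabel(flags: Flags): string {
  if (flags["temp-new-checkout"] === true) {
    return "Pay now"; // new behavior, released by flipping the toggle
  }
  return "Go to checkout"; // previous behavior, still the default after deployment
}

const flags: Flags = { "temp-new-checkout": false, "temp-beta-search": true, "maintenance-mode": false };
// releaseToggles(flags)      → ["temp-beta-search", "temp-new-checkout"]
// checkoutButtonLabel(flags) → "Go to checkout"
```

Because the prefix is machine-readable, a list like this can be generated at any time to see how many temporary toggles are waiting to be removed.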
Cleaning up feature toggles
Before submitting the pull request for the new feature, a branch named cleanup/*flagname* should be created that removes the feature toggle from the code, leaving just the code for the new feature. This gives the team an overview of the cleanup jobs to be done once the feature flag is no longer needed.
Use blue-green deployment pattern
Another way to split up deployment and release is to have two versions of the production environment, one passive and one active, controlled by a load balancer. This lets you deploy to the passive environment, test it, and then switch the load balancer to route traffic to the previously passive environment when everything is ready. This can even be combined with feature toggles, so you have toggling both at the production environment level and at the level of the specific feature.
The blue-green deployment pattern requires two copies of the production environment, which might each contain multiple replicas of servers. For ease of use I recommend sharing the database, if possible, among the passive and active environment and only do append-only updates to the database schema. When this becomes a problem with too many columns, you can clean up the database schema and update both the passive and active environment simultaneously.
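The switching logic itself is small. Below is a toy model of it, assuming a load balancer that can point at one of two environments; the class `BlueGreenRouter` and its methods are hypothetical and stand in for whatever your load balancer or cloud provider offers.

```typescript
type EnvColor = "blue" | "green";

class BlueGreenRouter {
  private active: EnvColor = "blue";
  private versions: Record<EnvColor, string>;

  constructor(initialVersion: string) {
    this.versions = { blue: initialVersion, green: initialVersion };
  }

  activeColor(): EnvColor {
    return this.active;
  }

  passiveColor(): EnvColor {
    return this.active === "blue" ? "green" : "blue";
  }

  // The version users currently see.
  liveVersion(): string {
    return this.versions[this.active];
  }

  // Deployment: only the passive environment changes, users are unaffected.
  deployToPassive(version: string): void {
    this.versions[this.passiveColor()] = version;
  }

  // Release: route traffic to the passive environment, but only if it passed its checks.
  switchTraffic(passiveIsHealthy: boolean): boolean {
    if (!passiveIsHealthy) {
      return false;
    }
    this.active = this.passiveColor();
    return true;
  }
}

const router = new BlueGreenRouter("1.0");
router.deployToPassive("1.1"); // users still see 1.0
router.switchTraffic(true);    // users now see 1.1
```

Note that the old version stays untouched on the now passive environment, so switching once more brings it back. That is what makes this pattern useful for rollbacks.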
The pros of this pattern over feature toggles are that it is easier to use once it is set up and that it doesn’t require any application code to work. It can also provide release toggling at a higher level than is possible with feature toggles.
The cons are that it doubles the cost of production servers and hence gives you a lot more servers to maintain, with the operational tasks and pipelines that come with them.
Level 4: Get notified about production problems before your users tell you: Implement monitoring, logging, and notifications
After having level 1 to 3 implemented you should already be deploying to production multiple times daily, as you have built the foundation for CD as well as a supporting development process.
Up to this point, you have probably been checking the prod environment manually. Doing this becomes tedious, so we want to automate it as well: we want a system that monitors production automatically and notifies us if there are problems.
A good prioritization would be to start with implementing logging and then monitoring.
A common logging setup is the ELK stack, consisting of Logstash for log processing, ElasticSearch for log storage and Kibana as a GUI for querying logs. In these GDPR times, I recommend being very aware of whether you store personal data in the log files. If you, for example, need a person id as an argument for an API endpoint, I would recommend having two ElasticSearch indices: one index for confidential data that only the DevOps workers have access to, and one anonymous index where Logstash has removed the sensitive data. This way you can get rid of sensitive data without losing all logs.
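In the setup described here Logstash does the filtering, so the following TypeScript is only an illustration of the rule, not a Logstash configuration. The `LogEntry` shape and the list of sensitive field names are assumptions; which fields count as personal data depends on your application.

```typescript
// Hypothetical log entry shape.
interface LogEntry {
  timestamp: string;
  level: "info" | "warn" | "error";
  message: string;
  fields: Record<string, string>;
}

// Assumption: these are the fields that carry personal data in this app.
const SENSITIVE_FIELDS = ["personId", "email"];

// Returns a copy of the entry with sensitive field values replaced.
function anonymize(entry: LogEntry): LogEntry {
  const fields: Record<string, string> = {};
  for (const key of Object.keys(entry.fields)) {
    fields[key] = SENSITIVE_FIELDS.indexOf(key) >= 0 ? "[REDACTED]" : entry.fields[key];
  }
  return { timestamp: entry.timestamp, level: entry.level, message: entry.message, fields };
}

// Every entry goes to both indices: the full one (restricted access) and the scrubbed one.
function routeToIndices(entry: LogEntry): { confidential: LogEntry; anonymous: LogEntry } {
  return { confidential: entry, anonymous: anonymize(entry) };
}
```

Keeping the field names but replacing the values means queries against the anonymous index still work the same way; only the personal data is gone.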
A good tool for monitoring .NET servers is AppDynamics, which automatically hooks into IIS and starts monitoring the applications hosted on it. It also supports JS monitoring. Another common option is to use Grafana with ElasticSearch as a data source.
When using these tools you want to specify a threshold for when shit has hit the fan, and then set up notifications through a Slack/Hipchat channel for prod incidents, email, or a phone call. The point is, you should get notified about production problems before your customers are calling in and telling you! Or even worse: before they leave you forever without a word.
When stuff has gone wrong you want a clear and precise notification that tells you asap what the problem is, so you can fix the production problem immediately without guessing what is wrong. My recommendation is to first set up automatic notifications for logs and then for monitoring.
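To show what such a threshold and a precise notification could look like, here is a minimal sketch. It counts errors inside a time window and, when the limit is exceeded, builds a message that names the most frequent error. All names (`LoggedError`, `AlertRule`, `evaluateAlert`) are hypothetical; real tools let you configure this rather than code it.

```typescript
interface LoggedError {
  message: string;
  timestampMs: number;
}

interface AlertRule {
  windowMs: number;  // how far back to look
  maxErrors: number; // allowed number of errors inside the window
}

// Returns the alert text when the threshold is exceeded, otherwise null.
function evaluateAlert(errors: LoggedError[], rule: AlertRule, nowMs: number): string | null {
  const recent = errors.filter(
    (e) => e.timestampMs > nowMs - rule.windowMs && e.timestampMs <= nowMs
  );
  if (recent.length <= rule.maxErrors) {
    return null;
  }
  // Name the most frequent message so nobody has to guess what is wrong.
  const counts: Record<string, number> = {};
  let mostFrequent = recent[0].message;
  for (const e of recent) {
    counts[e.message] = (counts[e.message] || 0) + 1;
    if (counts[e.message] > counts[mostFrequent]) {
      mostFrequent = e.message;
    }
  }
  const minutes = rule.windowMs / 60000;
  return `${recent.length} errors in the last ${minutes} min (limit ${rule.maxErrors}). ` +
    `Most frequent: "${mostFrequent}" (${counts[mostFrequent]}x)`;
}
```

The returned text is what you would post to the incident channel; sending it to Slack, email or a phone is left to the notification tool.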
For more information on how to set up logging in an Angular app with ELK, read my post here.
Level 5: Automatic and easy fallback by integrating release toggles and monitoring
Having come all this way, you can deploy many times daily with ease, and you get notified when something is wrong in production, even before your customers notice. When something has gone wrong, you have until now rolled back manually or coded a fall-forward release to fix the production problem. This too can be automated, by integrating the automatic error detection in production with the deployment pipeline.
After a feature has been rolled out, the logging and monitoring system can trigger an error notification within x minutes of a deployment, telling the deployment pipeline to roll back to the previous release using feature toggles and/or the blue-green deployment pattern. With tools like ELK logging and AppDynamics, you can set up thresholds for the number of errors allowed in the newly deployed environment within a given timeframe before switching back to the previous environment. This helps mitigate the risk of breaking the production environment on deployments.
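The decision rule can be sketched in a few lines. This assumes the monitoring system can hand the pipeline a list of error timestamps; the names `Deployment`, `RollbackPolicy` and `decideRollback` are made up for the example, and how the rollback is actually executed (toggle flip or blue-green switch) is left out.

```typescript
type RollbackDecision = "keep" | "rollback";

interface Deployment {
  version: string;
  previousVersion: string;
  deployedAtMs: number;
}

interface RollbackPolicy {
  observationWindowMs: number; // the "x minutes" after a deployment
  maxErrors: number;           // allowed number of errors inside that window
}

function decideRollback(
  deployment: Deployment,
  errorTimestampsMs: number[],
  policy: RollbackPolicy,
  nowMs: number
): RollbackDecision {
  // Only errors between the deployment and the end of the observation window count.
  const windowEnd = Math.min(nowMs, deployment.deployedAtMs + policy.observationWindowMs);
  const errorsInWindow = errorTimestampsMs.filter(
    (t) => t >= deployment.deployedAtMs && t <= windowEnd
  ).length;
  return errorsInWindow > policy.maxErrors ? "rollback" : "keep";
}

// The version the pipeline should make live after evaluating the policy.
function versionToServe(
  deployment: Deployment,
  errorTimestampsMs: number[],
  policy: RollbackPolicy,
  nowMs: number
): string {
  return decideRollback(deployment, errorTimestampsMs, policy, nowMs) === "rollback"
    ? deployment.previousVersion
    : deployment.version;
}
```

Errors from before the deployment or from after the observation window are ignored on purpose, so an old, unrelated problem does not roll back a healthy release.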
This post went through how my client went from bulky monthly deployments to deploying continuously every day by applying the five levels of continuous delivery:
- Level 1: A development process with pull requests to the production branch, approved by a tech reviewer and a QA reviewer.
- Level 2: Automation and streamlining of build and test with build and deployment servers, and automatic deployment of each pull request’s feature to a feature environment for QA and for the automatic end-to-end tests.
- Level 3: Feature toggles that separate deployment from release, so you can deploy regardless of external dependencies and toggle the feature on when it is ready for release.
- Level 4: Automatic notifications in case of production incidents.
- Level 5: Automatic rollback of new deployments that have caused errors.
For some teams reaching level 2 will be sufficient, but as the number of application users increases, it might be a good idea to level up.