by Derek Ralston
October 3, 2018
Continuous Integration as a service (Travis CI, CircleCI, and plenty of others) has been commonplace for a while. These services are widely used to validate proposed changes. However, there are far fewer examples of using cloud-based continuous integration tools to also do continuous deployment.
This is exactly the question I was asked recently. Instead of having separate integration and deployment tools, some of which we purchase as a service and others my team operates internally, would it be possible to use a single tool for both purposes? Can PagerDuty have full workflow automation, a unified user interface, and the simplicity of having a single partner run the entire stack?
Pondering this question requires agreeing on the basics of what continuous integration and deployment actually mean, looking at the similarities and differences between them, and considering the issues we can run into on the boundary between the two.
All of this discussion is presented below, starting from the questions of “What exactly do continuous integration and deployment mean to me?” and “Where does one stop and the other begin?”
The process of continuous integration (CI) has two goals. The first goal is to validate every proposed code change by building the product and running a series of tests to verify its functionality. When every change is validated this way, changes can be confidently merged, or integrated, into a shared codebase without breaking the product. This is the traditional goal of CI as described by books, Wikipedia, and so on.
The second goal optimizes for future deployment. Since the first goal has to build the product in order to test it, the process can use the build results to generate everything needed to deploy the product in the future. Let’s call the result an “artifact”. Preparing an artifact in advance saves time during deployment, and artifacts can remain available even if the source code that generated them no longer exists (for example, if the branch has been force-pushed).
Both validation and packaging should be done on the shared codebase (the mainline) and on the code in development (feature branches). There are three reasons: developers may want to roll out the code from feature branches to test environments to quickly check things out; code may break following a merge because of changes that happened in between, even if feature branch tests were successful; and production deploys usually must come from the mainline.
The goal of continuous deployment (CD) is to deploy a newly available version of the product without requiring human toil or causing an outage. In its simplest implementation, this can be done by SSHing into a host and installing a newer version of a package. In a distributed, highly available system, a deployment workflow could involve gradually rolling out the new version across a fleet of machines while watching the error rates and, if a problem develops, returning the entire system to its previous good state.
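To make the gradual rollout concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `gradual_rollout` name, the stage fractions, the 5% error threshold, and the `error_rate` callback that stands in for real monitoring. It is not how any particular deployment tool works.

```python
from typing import Callable, Dict, List, Sequence


def gradual_rollout(
    fleet: Dict[str, str],
    new_version: str,
    error_rate: Callable[[Dict[str, str]], float],
    stages: Sequence[float] = (0.1, 0.5, 1.0),  # hypothetical rollout fractions
    max_error_rate: float = 0.05,               # hypothetical threshold
) -> bool:
    """Roll new_version out in stages; restore the previous state on failure.

    fleet maps host name -> installed version and is updated in place.
    Returns True if the rollout completed, False if it was rolled back.
    """
    previous = dict(fleet)            # snapshot of the last known good state
    hosts: List[str] = sorted(fleet)
    done = 0
    for fraction in stages:
        target = max(1, int(len(hosts) * fraction))
        for host in hosts[done:target]:
            fleet[host] = new_version  # stand-in for the real install step
        done = max(done, target)
        if error_rate(fleet) > max_error_rate:
            fleet.clear()
            fleet.update(previous)     # return the whole fleet to its prior state
            return False
    return True
```

A real workflow would also wait between stages to collect data before widening the rollout; the sketch leaves the delays out to stay short.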
While continuous deployment can be defined as deploying only into production and only from the mainline, there are benefits to configuring CD to operate on non-production environments and feature branches as well. Nobody likes doing extra work, and when a change pushed to source control is automatically deployed shortly afterward to an environment of the developer’s choice, the developer can focus on the work.
Non-production use requires more flexibility. A developer may want to deploy from any point of the source tree, make frequent manual changes, and “freeze” a particular environment so that it does not pick up any new changes until her investigation is complete. In a development workflow, flexible delivery is the ideal, not continuous deployment.
When useful artifacts are produced and stored during CI, the delivery process speeds up. In fact, it may be possible for a CD process to never look at the source code repository at all, if the deployment configuration is found elsewhere or is packaged as part of the artifact.
On the surface, CI and CD look quite similar: either process can be represented as a workflow (“pipeline”) of arbitrary commands executed in a shared, state-preserving environment. Specific steps should be done in sequence; some steps can branch out to be run in parallel, while others can only proceed when some or all of the previous steps have completed. It is tempting to try to use the same tool to represent both processes as a single seamless workflow. There are, however, several important differences to keep in mind.
The most important difference is isolation. A CI process should not have any side effects on any production or development environment. CI tests and packaging should run in a self-contained manner. Isolation is the reason why multiple CI jobs for the same product, on different branches and code states, can run concurrently without interference. Isolation is a crucial requirement for projects where heavy development happens on a shared codebase.
A CD process, on the contrary, is pointless if it does not have the “side effect” of deploying a new product version in a given environment. Therefore, unlike CI processes, multiple CD runs against the same environment cannot run concurrently.
There is another difference in usage that arises from the isolation/deployment distinction. It generally does not make sense to trigger a CI process for a given version of the source again, since the result is likely to be identical. However, deploy processes for a given source version are often triggered multiple times. The common scenario is different deploy targets, such as a developer’s testing environment and production. But even if the target is the same, a deploy can be repeated if it is a rollback from a bad deploy of a later version, or when a developer steps out to lunch and does not want to leave a buggy version in a testing environment used by multiple people.
The next difference I want to talk about is time. CI should run as fast as possible, with unit test times on the order of seconds being the best practice. There are no intentional delays in a CI process since the goal is to validate the source as quickly as possible. But a well-built gradual CD process often includes intentional delays, for collecting data on the performance and error rate of the newly deployed version before deciding to either roll back or increase the deployment from a small percentage (“canary”) of the fleet to the entirety of it. Queuing or waiting for a lock also can take considerable time, much longer than would be reasonable for a CI process.
Finally, CI and CD have different ideal responses to resource constraints. Both CI and CD systems have limited concurrency, either by operational constraints of owned systems (for example, the number of available build agents) or by contractual limitations of third-party services (for example, up to 100 concurrent containers in use). What should be the ideal response of these systems when new runs are queued but there are no resources available to launch them?
On the mainline, I would like to run CI on all commits, preferably most recent ones first, with backfilling for older commits when spare resources are available, to find out if any breakage (even a transient one) has happened and to bisect any breakage to the individual code change. Since CD typically takes quite a bit longer than CI, deploy processes on busy codebases tend to batch all changes available since the last deploy and process them together. If the deploy is successful, it does not make sense to backfill by deploying older versions.
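The two mainline policies can be contrasted in a few lines of Python. This is only a sketch under my own assumptions: the function names are hypothetical, and `pending` is assumed to be a list of commit identifiers ordered oldest to newest.

```python
from typing import List, Optional


def next_ci_commit(pending: List[str]) -> Optional[str]:
    """CI policy: take the newest pending commit first.

    Older commits stay in the list and are backfilled by later calls,
    whenever a build agent is free.
    """
    return pending.pop() if pending else None


def next_deploy_batch(pending: List[str]) -> List[str]:
    """CD policy: take every change since the last deploy as one batch.

    The caller deploys the newest commit in the batch; nothing is left
    behind to backfill.
    """
    batch = list(pending)
    pending.clear()
    return batch
```

With three pending commits, the CI scheduler would hand out the newest one and keep the other two for later, while the deploy scheduler would return all three at once and leave its queue empty.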
On a development branch used by a single developer or a small team to iterate, my preferred behavior for CI would be to favor the most recent commit, to the extent of aborting a build already in progress for an older change, in order to get rapid feedback on the newest change (for example, if a bug is noticed and fixed by the developer before a previous CI process completes). For a CD process, aborting in the middle could leave an environment in a broken state, so once started, a deploy should follow through.
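Here is a small Python sketch of that cancellation rule. The `Run` record, the `Stage` enum, and the `supersede` function are hypothetical names introduced for illustration; a real system would also have to stop the underlying job rather than just flip a flag.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Stage(Enum):
    CI = "ci"
    CD = "cd"


@dataclass
class Run:
    branch: str
    commit: str
    stage: Stage
    cancelled: bool = False


def supersede(running: List[Run], branch: str, new_commit: str) -> List[Run]:
    """Cancel in-progress CI runs on `branch` made obsolete by new_commit.

    Runs that have already reached the CD stage are never cancelled,
    because aborting a deploy midway could leave an environment broken.
    """
    cancelled = []
    for run in running:
        obsolete = run.branch == branch and run.commit != new_commit
        if obsolete and run.stage is Stage.CI and not run.cancelled:
            run.cancelled = True
            cancelled.append(run)
    return cancelled
```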
The migration pattern I have seen is for a CI system to be adopted to perform some CD tasks and to add CD-related features, so this section assumes that a system is feature complete for CI and wants to add robust support for delivery and continuous deployment by addressing all of the points raised above.
Serialization of continuous deployment requires the system to have a queuing capability once the workflow transitions from CI to CD. This can be implemented via an explicit queue or via locking and an implicit queue for the lock. There should be a separate queue per deployment target, not just per service/repository.
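The locking variant could look roughly like the Python sketch below, with one lock per (service, target) pair. The `DeployLocks` class is a hypothetical name, and the sketch only works within a single process: a hosted CI/CD system would need a distributed lock or an explicit queue, and `threading.Lock` makes no promise about the order in which waiters are served.

```python
import threading
from typing import Dict, Tuple


class DeployLocks:
    """Hands out one lock per (service, deployment target) pair."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the dictionary itself
        self._locks: Dict[Tuple[str, str], threading.Lock] = {}

    def for_target(self, service: str, target: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault((service, target), threading.Lock())


locks = DeployLocks()

# Deploys to the same target wait for each other; other targets are unaffected.
with locks.for_target("web", "staging"):
    pass  # stand-in for running the deploy steps for web -> staging
```

Keying the lock on the target rather than on the repository is what lets a staging deploy and a production deploy of the same service proceed at the same time.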
The use case of re-triggering a previously performed delivery process requires both a user interface and an API for launching the deploy part of the workflow with the necessary arguments, the most important of which is the deployment target. In practice, other configurable inputs to the deploy process may need to be provided, so flexible options for extra arguments would be ideal. This use case does not require running the continuous integration parts again, and can go straight to delivery.
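One possible shape for such an API request is sketched below in Python. All of the names (`DeployRequest`, `plan_deploy`, the `artifact_id` field) are my own assumptions, not the interface of any existing service; the point is only that the request names a stored artifact and a target, and that no CI step appears in the plan.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class DeployRequest:
    artifact_id: str   # which stored CI artifact to deploy
    target: str        # the deployment target, e.g. "staging"
    extra_args: Dict[str, str] = field(default_factory=dict)


def plan_deploy(request: DeployRequest, stored_artifacts: Set[str]) -> List[str]:
    """Plan a deploy from an existing artifact, skipping the CI stages."""
    if request.artifact_id not in stored_artifacts:
        raise LookupError(f"no stored artifact {request.artifact_id!r}; run CI first")
    steps = [f"fetch {request.artifact_id}", f"deploy to {request.target}"]
    steps += [f"with {key}={value}" for key, value in sorted(request.extra_args.items())]
    return steps
```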
The time consumed by continuous delivery processes requires sensible time limits (it is reasonable for a CI process to expect to finish within an hour, but it might not be reasonable to expect a deploy process to always terminate within an hour from workflow start, especially if queuing is involved). Having to queue without doing useful work could also impact resource management: for CI systems that allocate resources to a particular job when the job begins, does it still make sense to hold on to these resources while waiting in a queue, or is it possible to release those resources and reacquire them once the job resumes?
The difference in behavior under frequent changes (batching of CD, possible cancelling of jobs at the CI stage but not the CD stage) requires the system to recognize when a boundary between the CI and CD parts of a workflow is crossed, and to have a good user experience around the configured behavior (for example, it should be easy to see that a given CI run did not have a CD run because it was batched with other changes, and it should be easy to jump from the CI run to the corresponding CD run).
For infrequently touched source code with simple deployment processes and a single deployment target, the issues listed above are unlikely to occur in practice. Many currently available CI services may already be used to perform both integration and deployment tasks for such repositories, streamlining the workflow from source to running code.
Currently, a single-step deployment process within a CI pipeline is a recurring pattern. From a CI perspective, a single invocation of a deployment tool or a single API call is made. While in the latter case the integration between CI and CD is nominal at best, the former case does provide benefits (there’s no need to run a separate CD system watching commits or waiting for a trigger to start the deploy, whose only purpose is to run the same deployment tool). I found multiple examples of this workflow, yet I doubt that it scales up and handles all of the potential issues identified above, even if the deployment tool already provides locking functionality.
There are several workarounds and tools that help fit a more complex workflow into a typical CI pipeline, but none of the CI systems we currently use have a complete answer to all of the open questions. For complex deployment requirements and multiple environments, separate CI and CD systems could be the optimal choice for now. Both CI and CD vendors could provide easy ways to integrate their solution with the most popular complementary systems; this would stave off potential competitors that will offer an all-in-one CI and CD solution.
For vendors who want to be an all-in-one solution: it is not easy. There are shared building blocks, but integration and deployment requirements and environments have a lot of differences as well. Forcing a tool designed for one use case to fit the other will cause conceptual strain, misfit, and poor user experience.
The ideal solution would appear to have an architecture not unlike that of Microsoft Office: two tools serving quite different purposes, like Word and Excel, that nevertheless are designed to integrate together seamlessly and leverage shared components in their implementation for efficiency. I can’t wait to give a tool like this a go, once it shows up!