
Test-driven Infrastructure with Puppet, Docker, Test Kitchen and Serverspec – Yury Tsarev, GoodData

Notes from a talk at LinuxCon/ContainerCon Europe 2016.
Yury Tsarev is a QA Architect at GoodData. The company offers a business intelligence platform running in several datacenters, on thousands of VMs and hundreds of physical servers. They use Puppet and OpenStack at the lowest level.
This talk is about testing configuration management code, not application deployment. The Puppet code is stored in git and is the common ground for the whole DevOps organization, but it has quality issues. They (unfortunately) do not employ the common roles and profiles pattern, but rather have a main entry point designated as a “type”. It is a huge codebase: 150 Puppet modules and 100 types. A type is applied to a particular node based on an environment variable. The modules are very tightly coupled and have multiple interdependencies: there are 2650 resources and 7619 interdependencies. This complexity creates reliability problems.

First Attempts

Their first approach in the old days was to perform manual smoke tests. Obviously, this did not scale.
Then they introduced a Puppet self-check step: lint the code, compile the Puppet catalog, apply it with --noop in a fakeroot environment, integrate everything in Jenkins and provide detailed feedback on every pull request before a merge. A minimal deployment pipeline was set up. After the self-check has passed and the code is merged, it is pushed to staging clusters and then to production clusters. But this workflow misses errors in configuration files, and there is no smoke testing of proper service startup. Ideally, issues are caught before the merge to save costs.
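The self-check step described above can be sketched as a small runner that executes each check in order and reports which ones failed. This is only an illustration: the paths are hypothetical, the fakeroot wrapper is left out, and the talk did not show the actual commands GoodData runs. The catalog is compiled implicitly as part of the noop apply.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# Hypothetical paths; the talk did not name the real manifest or module layout.
MANIFEST = "manifests/site.pp"
MODULE_PATH = "modules"

# (label, argv) pairs. The commands are standard Puppet tooling, but the exact
# set used in the talk's pipeline is an assumption.
STEPS: List[Tuple[str, List[str]]] = [
    ("lint", ["puppet-lint", MODULE_PATH]),
    ("syntax", ["puppet", "parser", "validate", MANIFEST]),
    ("noop apply", ["puppet", "apply", "--noop",
                    "--modulepath", MODULE_PATH, MANIFEST]),
]


def run_command(argv: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def self_check(
    steps: Sequence[Tuple[str, List[str]]] = STEPS,
    runner: Callable[[Sequence[str]], int] = run_command,
) -> List[str]:
    """Run every step in order and return the labels of the steps that failed."""
    failed = []
    for label, argv in steps:
        if runner(argv) != 0:
            failed.append(label)
    return failed
```

Running all steps instead of stopping at the first failure matches the goal of giving detailed feedback on a pull request in one pass.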

New Approach

Their new approach is based on the Ruby tool Test Kitchen. It is composed of a driver, a provisioner and a verifier:
  • The driver creates a testing instance (on EC2, Digital Ocean, Vagrant, Docker, LXC, etc.)
  • The provisioner applies some configuration management
  • The verifier executes tests written in serverspec, bats, shunit2, rspec etc.
The whole process is highly sequential. It is always the same: drive -> provision -> verify
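The fixed sequence can be illustrated with a small conceptual model. This is not Test Kitchen's code (Test Kitchen is written in Ruby); the `Instance` class and `run_suite` function are hypothetical names. The phase names loosely follow the Test Kitchen commands create, converge, verify and destroy.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Instance:
    """Hypothetical handle for one throwaway test instance."""
    name: str


def run_suite(
    name: str,
    create: Callable[[str], Instance],
    converge: Callable[[Instance], None],
    verify: Callable[[Instance], bool],
    destroy: Callable[[Instance], None],
) -> bool:
    """Run the three phases strictly in order and always clean up afterwards."""
    instance = create(name)        # driver: bring up the instance
    try:
        converge(instance)         # provisioner: apply configuration management
        return verify(instance)    # verifier: run the tests
    finally:
        destroy(instance)          # driver again: tear the instance down
```

Because each phase is passed in as a callable, the driver or provisioner can be swapped without touching the sequence itself.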


As a driver they tried VMs first, but running 100 VMs, one for every Puppet type, is simply too costly. So they moved to Docker using kitchen-docker, which brought much better resource utilization. Everything runs on a small three-node Jenkins cluster. A further benefit is that the testing environment is identical on developer laptops and Jenkins.
A word of caution: they use Docker to run system containers, not containerized applications. This brings additional challenges, as they have to deal with limits and constraints not normally encountered by Docker users. It is also not a real VM scenario, so it can be difficult to debug.
They still do not run privileged containers, but use custom Dockerfiles, mounted volumes and fine-grained custom capabilities (SYS_PTRACE, SYS_RESOURCE) to achieve their goals.
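To make the "capabilities instead of privileged" idea concrete, here is a minimal sketch that builds a `docker run` command line with individual capabilities and volumes. The image name, the cgroup volume and the init command are assumptions for illustration; the talk did not show the actual container settings.

```python
from typing import Dict, List, Sequence


def docker_run_argv(
    image: str,
    capabilities: Sequence[str],
    volumes: Dict[str, str],
    command: Sequence[str] = ("/sbin/init",),
) -> List[str]:
    """Build a `docker run` argv that grants single capabilities, not --privileged."""
    argv = ["docker", "run", "--detach"]
    for capability in capabilities:
        argv += ["--cap-add", capability]
    for host_path, container_path in volumes.items():
        argv += ["--volume", f"{host_path}:{container_path}"]
    argv.append(image)
    argv += list(command)
    return argv


if __name__ == "__main__":
    # Hypothetical image and volume; only the two capabilities come from the talk.
    print(" ".join(docker_run_argv(
        "example/system-container",
        ["SYS_PTRACE", "SYS_RESOURCE"],
        {"/sys/fs/cgroup": "/sys/fs/cgroup:ro"},
    )))
```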


They use kitchen-puppet to apply the code. It copies the Puppet code into the instance under test and provides all facts and Hiera data.


The core problem is Puppet code that relies on external services. Applying the catalog will fail when such a service is not reachable from the test instance, so external dependencies should be avoided to get deterministic results. They could use the production services, but they do not want to pollute them with test data or generate additional load. Read-only services like npm or rpm repositories are fine, however. They came up with shellmock, a simple tool that replaces a binary with a simple echo command and returns exit code 0.
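The mocking idea can be sketched in a few lines. This is not the shellmock tool itself, whose implementation was not shown; it only illustrates the principle of putting a stub that echoes its invocation and exits with 0 ahead of the real binary on PATH. The function names and the binary name `register-node` are hypothetical.

```python
import os
import stat
import subprocess
import tempfile
from typing import Sequence


def install_mock(directory: str, name: str) -> str:
    """Write a stub executable that echoes its invocation and exits with 0."""
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write('#!/bin/sh\necho "mocked: ${0##*/} $*"\nexit 0\n')
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


def run_with_mocks(argv: Sequence[str], mock_dir: str) -> subprocess.CompletedProcess:
    """Run argv with mock_dir first on PATH so stubs shadow the real binaries."""
    env = dict(os.environ)
    env["PATH"] = mock_dir + os.pathsep + env.get("PATH", "")
    return subprocess.run(list(argv), env=env, capture_output=True, text=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as mock_dir:
        # "register-node" stands in for a binary that talks to an external service.
        install_mock(mock_dir, "register-node")
        result = run_with_mocks(["register-node", "--env", "test"], mock_dir)
        print(result.returncode, result.stdout.strip())
```

Because the stub always succeeds, a Puppet exec resource that calls the binary would pass without the external service ever being contacted.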
Then they run the actual tests using serverspec. It is quite extensible and supports a multitude of different Linux distributions. Serverspec itself is pretty bare-bones. They have some repositories that can help in understanding the basic setup.
On top of serverspec, they have some YAML-based configuration for different environments, geographical datacenters, translation of host to type etc.
It is not straightforward to come up with good test cases. As a rule of thumb, they ask: what would I check manually anyway after deploying a server?
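As an illustration of the rule of thumb, the checks one would perform by hand after a deployment can be written down as small functions. Serverspec tests are written in Ruby; the Python below is only an analogue of that kind of check, and both function names are hypothetical.

```python
import socket


def file_contains(path: str, needle: str) -> bool:
    """Return True if any line of the file contains the given string."""
    with open(path, encoding="utf-8") as handle:
        return any(needle in line for line in handle)


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A test for a web server type might then assert that its configuration file contains the expected listen directive and that the service port accepts connections.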

Finalized Deployment Platform

Test Kitchen has the concept of platforms (distro/image) and suites (semantics of a test run). At first, they ran all the tests for every change, but this quickly became too time-consuming. So, using puppet-catalog-diff, they now only test what actually changed.
The jobs are configured using Jenkins Job Builder (a tool from the OpenStack infrastructure community that translates simple descriptions into Jenkins jobs). They are also looking into using Zuul.
They use a multitude of different tools in scenarios not originally envisioned by their authors, and they highly recommend this pattern. Also, Test Kitchen is quite technology-agnostic: the driver can be exchanged, and the provisioning technology is not fixed to Puppet.
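The selection step can be sketched as follows. The sketch assumes one suite per Puppet type and assumes that an earlier step has already reduced the puppet-catalog-diff output to a count of changed resources per type; neither the real output format nor GoodData's selection logic was shown in the talk.

```python
from typing import Dict, Iterable, List


def suites_to_run(
    changed_resources: Dict[str, int],
    all_suites: Iterable[str],
) -> List[str]:
    """Return only the suites whose compiled catalog differs from the baseline."""
    return sorted(
        suite for suite in all_suites
        if changed_resources.get(suite, 0) > 0
    )


if __name__ == "__main__":
    # Hypothetical type names and change counts.
    diff_summary = {"webserver": 3, "database": 0}
    print(suites_to_run(diff_summary, ["webserver", "database", "loadbalancer"]))
```

Types that do not appear in the summary are treated as unchanged here, which is a design choice of this sketch; a more cautious pipeline could run them anyway.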
A memorable quote was about managing different upstream projects that are maintained by companies, but sometimes by individuals: “practically it’s a bit chassle”
If they could start over with their Puppet codebase, they would try to have:
  • more modular Puppet code with fewer interdependencies
  • more unit tests first (rspec-puppet)
  • a smaller number of types
A question from the audience mentioned Beaker from PuppetLabs as an alternative.