I've spent the last couple of weeks working on tidying up Bodhi's Continuous Integration story. This essentially comes down to two pieces: writing a new test running script for the CI system to use (and humans too, in the development environment!), and switching from Jenkins Job Builder to Jenkins Pipeline.
Like so many scripts, Bodhi's test running script started out in bash before I realized that it was growing too many tentacles and was becoming difficult to extend. I have plans to add an integration test suite to Bodhi that tests it against other dependent network services (such as Koji), and the prospect of getting my bash script to handle that as well with sane input/output options was daunting. Thus, I created bodhi-ci. Using click made it easy to give the script a nice set of subcommands and CLI flags, and should make it much simpler to extend.
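To give a feel for why click helps here, below is a minimal sketch of a group with two subcommands that share one option. It is not bodhi-ci's actual code: the `--release` flag, the `unit` and `flake8` subcommand names, and the echoed messages are all made up for illustration.

```python
import click


@click.group()
@click.option("--release", "-r", multiple=True, default=("f28", "pip"),
              help="Release to test; may be given more than once (hypothetical flag).")
@click.pass_context
def cli(ctx, release):
    """Run test jobs for the selected releases (illustrative sketch only)."""
    # Stash the shared option so every subcommand can read it.
    ctx.obj = {"releases": list(release)}


@cli.command()
@click.pass_context
def unit(ctx):
    """Pretend to run the unit tests."""
    for release in ctx.obj["releases"]:
        click.echo(f"would run unit tests on {release}")


@cli.command()
@click.pass_context
def flake8(ctx):
    """Pretend to run the style checks."""
    for release in ctx.obj["releases"]:
        click.echo(f"would run flake8 on {release}")
```

Calling `cli()` from a script entry point gives each subcommand its own `--help` text for free, and adding another kind of job is just one more decorated function.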
The loss of GNU parallel was a little sad to me, but the features from it that I was using are mostly implemented in Python now. The main thing I'm still missing that I had with run_tests.sh is a fully working -x flag, which causes all test jobs to exit immediately if any one of them fails. I plan to fix this by using Python's async/await API in the future so that I can react to failures in a similar manner, but I'm quite satisfied with the script otherwise. The old run_tests.sh script will remain in the repository until I refactor the new script to fully support the failfast flag.
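For the curious, here is roughly what a failfast mode could look like with async/await. This is only a sketch under my own assumptions, not the planned implementation: `run_job` and `run_failfast` are hypothetical names, and each job is assumed to be a plain argv list.

```python
import asyncio


async def run_job(name, command):
    """Run one job as a subprocess and raise if it exits non-zero."""
    proc = await asyncio.create_subprocess_exec(*command)
    try:
        returncode = await proc.wait()
    except asyncio.CancelledError:
        # Another job failed first: stop this one before propagating.
        proc.kill()
        await proc.wait()
        raise
    if returncode != 0:
        raise RuntimeError(f"{name} exited with code {returncode}")
    return name


async def run_failfast(jobs):
    """Run all jobs concurrently and cancel the rest once any job fails.

    `jobs` maps a job name to its argv list. Returns (ok, failures).
    """
    tasks = [asyncio.ensure_future(run_job(n, c)) for n, c in jobs.items()]
    # Wake up as soon as one job raises, or when every job has finished.
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
    for task in pending:
        task.cancel()
    # Let the cancelled jobs finish killing their subprocesses.
    await asyncio.gather(*pending, return_exceptions=True)
    failures = [t.exception() for t in done if t.exception() is not None]
    return (not failures), failures
```

Something like `asyncio.run(run_failfast({"unit": [...], "flake8": [...]}))` would then return shortly after the first failure instead of waiting for the slowest job.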
Besides being tremendously easier to extend and modify, the new script has a number of features that would have been difficult to add to the old script:
- It prints out a summary of all the test jobs at the end.
- It buffers failed jobs' console output and prints those after the output from the successful jobs, which means the useful error messages will be near the end of the job output instead of buried in the middle.
- It runs the py.test-2 and py.test-3 jobs in parallel instead of in series. Users can also select just one of those if desired.
- It allows the caller to select between podman and docker (the default) as the container runtime. podman doesn't quite work fully yet, but this should help test it going forward.
- SIGINT is handled and all jobs are stopped before exiting.
- Full --help text is provided on all commands and flags.
- Users can opt to skip the build portion of the job.
- All jobs can be run in parallel. The old script could run release tests in parallel, but individual tests on a given release were run serially. Now all tests are run in parallel.
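A couple of the items above (failed output printed last, plus a summary at the end) can be sketched in a few lines. This is a simplified illustration rather than the real script: `run_job` and `run_all` are hypothetical names, and unlike the behaviour described above it buffers the output of every job, not just the failed ones.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_job(name, command):
    """Run one job, capturing its combined stdout/stderr instead of printing it."""
    result = subprocess.run(command, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    return name, result.returncode, result.stdout


def run_all(jobs):
    """Run every job in parallel and return the full report as a string."""
    with ThreadPoolExecutor(max_workers=len(jobs) or 1) as pool:
        results = list(pool.map(lambda item: run_job(*item), jobs.items()))

    # sorted() is stable, so passing jobs (key False) come first, failures last.
    ordered = sorted(results, key=lambda r: r[1] != 0)
    lines = []
    for name, _, output in ordered:
        lines.append(f"===== {name} =====")
        lines.append(output.rstrip())
    lines.append("===== summary =====")
    for name, returncode, _ in results:
        lines.append(f"{name}: {'PASSED' if returncode == 0 else 'FAILED'}")
    return "\n".join(lines)
```

Printing the result of `run_all({...})` puts the interesting error messages right above the summary, which is where most people look first.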
The next step for bodhi-ci is to add the beginnings of an integration test suite. I will likely start by testing that the bodhi CLI works with the server, and then will move on to the much more difficult items (like Koji).
Thanks to Jeremy Cline, Patrick Uiterwijk, and Sean Myers for consulting with me on the high-level design and requirements gathering portions of this project. Beer may have been involved in some of the consulting.
Bodhi recently started using Mergify, which is a fantastic GitHub bot that watches pull requests. When it sees a pull request get approved for merge, it will rebase it, wait for CI to be successful, and merge. This stops a class of problems where pull requests can pass individually but fail when merged together.
Bodhi's CI system tests against all Fedora releases, including Rawhide. It is not uncommon for the CI jobs to fail on Rawhide due to issues in Rawhide itself. When this happens, I would like to know about it in case it is a problem in the pull request itself, but I also want Mergify to be able to merge pull requests when I know the Rawhide build failed for reasons outside of the pull request.
To accomplish this, I refactored Bodhi's JJB job to create a job for each type of test (doc tests, unit tests, style checks, etc.) for each release. This allowed me to configure Mergify to block merging on most tests but allow failures on Rawhide. This also had a nice effect of making it much easier for contributors to see nice green checks or red x's on the individual tests, so they could see at a glance which type of test passed or failed on which releases.
It wasn't all roses though, as it meant that we were using far more resources from CentOS CI. Prior to these changes, we would use one Duffy node (a physical server that your CI job "checks out" and returns when it is done) per pull request. With the new split jobs, we were using about twenty nodes per test. The CI system has to wipe and reprovision these nodes between jobs, so we were suddenly putting a much bigger load on the system. Additionally, we only had capacity to run four jobs concurrently, so the tests were taking much longer to complete. Lastly, the jobs were highly inefficient as jobs for the same Fedora release were all building the same container to run their particular test (for example, the unit tests and flake8 test both built an identical container, instead of having a build job run and then use the result from the job to fan out to the tests that need it).
Enter the Jenkins Pipeline. With some help, I was able to accomplish something much more ideal with my new pipeline job. It solves the resource contention problems described above as Bodhi is now back to using a single node per pull request, and it is able to run the build job once and then fan out to run the individual tests concurrently. In fact, I was able to run the builds in parallel, and have each of those jobs kick off the individual release tests in parallel inside those jobs for double-parallel action. This is very nice since the pip container typically takes about 80% longer to build than the rpm-based containers, but we don't have to wait for it to finish to start testing the rpm containers. This means that pull requests start getting results for Fedora 28 tests before the pip container is even finished building. The pipeline can now test a pull request in about 20-30 minutes instead of several hours due to the efficient sharing between tests and the use of a single node.
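The real thing is a Jenkins Pipeline definition, not Python, but the scheduling idea (build each release's container once, then fan out to that release's tests, with all releases going at the same time) is easy to model. The sketch below is only that model: `build`, `run_test`, `release_pipeline`, and `pipeline` are hypothetical names, and the bodies are placeholders that just record what would have run.

```python
import asyncio


async def build(release, log):
    """Stand-in for building the container image for one release."""
    log.append(f"build {release}")
    await asyncio.sleep(0)  # placeholder for the real build work


async def run_test(release, test, log):
    """Stand-in for running one kind of test inside the built container."""
    log.append(f"{test} {release}")
    await asyncio.sleep(0)  # placeholder for the real test run


async def release_pipeline(release, tests, log):
    """Build once for a release, then fan out to its tests concurrently."""
    await build(release, log)
    await asyncio.gather(*(run_test(release, t, log) for t in tests))


async def pipeline(releases, tests):
    """Run every release's build-then-test chain in parallel."""
    log = []
    await asyncio.gather(*(release_pipeline(r, tests, log) for r in releases))
    return log
```

Because each release only waits on its own build, `asyncio.run(pipeline(["f28", "pip"], ["unit", "flake8"]))` lets a fast release start its tests while a slower build is still going, which is the same effect described above.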
Thanks to Brian Stinson, Patrick Uiterwijk, and Sean Myers for consulting with me on writing Bodhi's pipeline.
I'm quite happy with the current state of Bodhi's CI jobs, and I look forward to further improvements that are coming soon!