Building a Remote Caching System: The Sequel
Last fall, Docker made some big changes that required us to overhaul how our Codeship Pro image caching system worked. Our director of engineering, Laura Frank, published a blog post explaining everything back when we launched this new system.
The gist was that Docker no longer allowed images pulled from a remote source to be used as a cache source, a security measure to prevent cache poisoning. Our workaround was to rely on the docker save and docker load commands to package images up as tarballs and store them on S3. This took a lot more time per image, for a variety of reasons; Laura’s blog post explains them in more detail.
Now, though, we’re undoing all of that…because Docker restored the original functionality and made it possible for us to use remote images as a local cache source again.
For you, this means much faster image caching on your builds. For us, it means an Amazon-backed registry that needs far less disk space and carries far less overhead, which adds up to a better experience and faster build times for Codeship Pro builds! If you can’t tell, we’re very excited about improving the performance of our caching system.
Registry-based System Security and Benchmarks
Let’s take a minute to talk about the how and why of these changes a bit more.
What is the new system in more detail?
Essentially, after your builds complete, we push any images you have enabled caching for (using the simple cached: true directive attached to your services, as defined in your codeship-services.yml file) to an Amazon-backed registry we maintain.
These images are set up on registry accounts with credentials unique to your project. No other project has backend access to your cached images, and there is no centrally accessible pool of cached images under a single account.
We’re big fans of deferring security to larger, more sophisticated providers like AWS where we can. Rather than host our own registry infrastructure and add a new security apparatus to our infrastructure support operations, we decided that using AWS for this purpose would provide more reliable security than we could deliver ourselves at our size.
Let’s look at performance
Speed was the main reason we made this change, and our benchmarks indicate that the gains for most builds will be substantial. Additionally, we no longer need to save parent image data, and registries store only the differences between images rather than the full tarballs we previously relied on, so the new system also makes a huge dent in the disk space required. Build-time results from our benchmarks:
- React + Postgres: 40% Faster (6 minutes saved)
- Rails + Postgres: 22% Faster (4 minutes saved)
- Node + Postgres: 56% Faster (12 minutes saved)
- Node + Selenium: 30% Faster (5 minutes saved)
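To make the disk-space point above concrete, here is a minimal sketch comparing the two storage approaches. It is a toy model, not our actual storage code: the layer names, sizes, and image tags are all hypothetical, and it simply assumes that a saved tarball carries every layer of its image while a registry keeps each distinct layer once.

```python
# Hypothetical layer names and sizes in MB; none of these figures are real measurements.
LAYER_SIZES_MB = {
    "base-os": 120,
    "runtime": 200,
    "dependencies": 150,
    "app-code-v1": 10,
    "app-code-v2": 10,
}

# Two hypothetical builds of the same service that differ only in the final code layer.
IMAGES = {
    "app:build-1": ["base-os", "runtime", "dependencies", "app-code-v1"],
    "app:build-2": ["base-os", "runtime", "dependencies", "app-code-v2"],
}


def tarball_storage_mb(images, sizes):
    """Each saved tarball carries every layer of its image, parent layers included."""
    return sum(sizes[layer] for layers in images.values() for layer in layers)


def registry_storage_mb(images, sizes):
    """A registry keeps each distinct layer once, however many images share it."""
    unique_layers = {layer for layers in images.values() for layer in layers}
    return sum(sizes[layer] for layer in unique_layers)


if __name__ == "__main__":
    print("tarballs:", tarball_storage_mb(IMAGES, LAYER_SIZES_MB), "MB")
    print("registry:", registry_storage_mb(IMAGES, LAYER_SIZES_MB), "MB")
```

With these made-up numbers, the tarball approach stores the shared parent layers twice, while the registry approach stores them once and adds only the small layer that changed.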
Optimizing for Caching
Before wrapping up, let’s also discuss the internals of how image-based caching works so that you can design your Docker projects to get the most out of a caching system.
Caching works layer by layer, just like your Docker images. Each instruction in your Dockerfile is a separate build step that can be cached, and instructions such as RUN, COPY, and ADD each produce a new layer. Combining two commands into a single instruction therefore reduces the layer count by one. Conversely, breaking a single instruction out into multiple instructions adds layers.
This is important to keep in mind, because when your cached services are built during your Codeship Pro build run, cached layers can be reused only up to the point of a breaking change. Once we hit a layer that’s different from your last build — let’s say your code has changed — Docker will rebuild the rest of the image from that point on. This kind of architecture has some key design considerations for you as a result:
- Move breaking changes farther down in your Dockerfile
- Combine statements to reduce image complexity, as long as the combined layer isn’t brittle (likely to change often and invalidate the cache)
- Adding your code should be one of the last things you do in your images
- Be mindful of dependencies and how dependency changes may invalidate the rest of the cached image
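The considerations above can be illustrated with a small simulation. This is a simplified model and not Docker's actual cache implementation: it assumes each step's cache key depends on the instruction, a fingerprint of any files it copies, and the key of the step before it. The function names, the Dockerfile instructions, and the version strings are hypothetical and exist only for illustration.

```python
import hashlib


def layer_keys(steps):
    """Chain one key per step: each key depends on the step and on every step before it."""
    keys = []
    parent = ""
    for instruction, content in steps:
        digest = hashlib.sha256(f"{parent}|{instruction}|{content}".encode()).hexdigest()
        keys.append(digest)
        parent = digest
    return keys


def count_reused(previous_steps, current_steps):
    """Count the leading steps whose keys match the previous build."""
    reused = 0
    for old, new in zip(layer_keys(previous_steps), layer_keys(current_steps)):
        if old != new:
            break  # first breaking change: everything after it is rebuilt
        reused += 1
    return reused


def code_first(code, deps):
    """Hypothetical Dockerfile that copies all source code before installing dependencies."""
    return [
        ("FROM node", ""),
        ("COPY . /app", f"{code}+{deps}"),
        ("RUN npm install", ""),
    ]


def code_last(code, deps):
    """Hypothetical Dockerfile that installs dependencies first and adds code last."""
    return [
        ("FROM node", ""),
        ("COPY package.json /app/", deps),
        ("RUN npm install", ""),
        ("COPY . /app", code),
    ]


if __name__ == "__main__":
    # Only the application code changes between builds.
    print(count_reused(code_first("v1", "d1"), code_first("v2", "d1")))  # prints 1
    print(count_reused(code_last("v1", "d1"), code_last("v2", "d1")))    # prints 3
    # A dependency change invalidates everything after the manifest copy.
    print(count_reused(code_last("v1", "d1"), code_last("v1", "d2")))    # prints 1
```

In this model, adding code last lets a code-only change reuse the dependency install step, while a dependency change still forces a rebuild of every step after it. That is why it pays to keep frequently changing content as far down the Dockerfile as possible.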
If you’re looking for more information on optimizing your builds to make the best use of caching, you can read our blog post on the topic for more suggestions and examples.
Reference: Building a Remote Caching System: The Sequel from our WCG partner Ethan Jones at the Codeship Blog.