A Dark and Tortuous Journey in DevOps
[Edit: This was posted back in October, but it became way more relevant on March 22, 2016, when a key dependency was unpublished and the breakages discussed here actually happened to lots of people! I’m not at all saying everyone should use instaclone, but external instabilities such as package registries are something every devops person should consider.]
Node development is quick and easy with npm. But managing the workflow around npm install can be a pain point in terms of speed, reliability, and reproducibility as you scale out builds and move to production.
I’ve recently had some luck with a tiny tool that’s significantly improved build speed and reliability, so I thought I’d share some thoughts on it. But first, let’s talk about some of the issues in managing npm packages reliably for developers, CI systems, and deployment.
As we all know, the state of the node_modules is not inherently reproducible from the package.json file, so for production and sanity in testing, you should use npm shrinkwrap to lock down exact versions of all the packages you use. (If you’re not using shrinkwrap already, you should be.) This is a good thing, and encourages you to check in and manage these changes explicitly.
However, npm shrinkwrap doesn’t guarantee byte-for-byte repeatable installations.
Think about it: Reliability requires controlling change. If you change a single piece of code somewhere and your build system reruns npm install, what happens if one of your hundreds of packages was unpublished, or you have trouble connecting to npmjs.org? In addition, shrinkwrap doesn’t pin down peer dependencies, and npm install still compiles native modules using your local environment. Basically, it’s impossible to ensure exact repeatability unless you just make an exact copy.
Of course, you could go with the old idea of checking node_modules into Git, but it turns out that’s a troublesome approach too, in particular since many packages are platform-dependent.
Operationally, you also want a more scalable solution to distributing packages than hitting npmjs.org all the time. You don’t want lots of servers or build machines doing this continuously.
You can set up a local npm repository or a local npm cache server to help, but this is more infrastructure for devops to maintain and scale. And incidentally, it is also likely to be a single point of failure: Not being able to push new builds reliably is a Bad Thing, and it tends to happen precisely when you can least afford it. Plus, if you use private modules and pay npm, Inc. to host your private code, you may not need another local repository.
Finally, downloading from the global server, and even installing from the local cache, takes a lot of time, which hurts if, e.g., you want to do rapid CI builds from a clean install. With all these solutions, npm install still takes minutes for large projects even when you haven’t changed anything. (It’s possible some details here will improve with npm 3, such as peer dependencies and performance, but the overall point remains.)
Archive and Clone
A simpler and more scalable solution to all this is to archive the entire node_modules directory, and put it somewhere reliable and scalable. There is a place like this. It’s called S3! But this would be large and slow to manage if it were always published and then fetched every time you need it. It’s also a headache to script, especially in a continuous integration environment, where you want fresh builds on all branches every few minutes, but want to reinstall only when the checked-in npm-shrinkwrap.json file changes. Plus, as we just said, the builds are platform-dependent, so you need to publish separately on macOS and Linux.
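To make the "archive the whole directory" idea concrete, here is a minimal sketch using only the Python standard library. The function names are made up for illustration, and the upload to S3 is left out on purpose; this only shows packing and unpacking a directory tree, not how any particular tool does it.

```python
import tarfile
from pathlib import Path


def archive_directory(source_dir: str, archive_path: str) -> str:
    """Pack source_dir into a gzipped tarball and return the archive path."""
    source = Path(source_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"not a directory: {source}")
    # arcname keeps only the top-level directory name (e.g. "node_modules")
    # inside the archive, not the full path on the build machine.
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(str(source), arcname=source.name)
    return archive_path


def restore_directory(archive_path: str, target_parent: str) -> None:
    """Unpack an archive made by archive_directory under target_parent."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(target_parent)
```

Calling archive_directory("node_modules", "node_modules.tar.gz") would give you one file to push to whatever storage you trust. The scripting headache is everything around this: deciding when to publish, when to fetch, and which archive belongs to which build.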
Instaclone is a tool to do all this for you. If you already have an npm shrinkwrap workflow, it’s pretty easy. It lets you specify where to store your node_modules in S3, and version that entire tree by the SHA1 hash of the npm-shrinkwrap.json file, together with the architecture. You can then work on multiple branches and swap them in and out — a bit like how nvm caches Node installations.
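The versioning idea is simple enough to sketch in a few lines of Python. This is only an illustration of hashing the shrinkwrap file and combining it with the platform; the function name and the exact key format are my assumptions here, not necessarily what Instaclone produces.

```python
import hashlib
import platform


def cache_key(shrinkwrap_path: str) -> str:
    """Build a version key from the shrinkwrap file's SHA1 and the platform."""
    with open(shrinkwrap_path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
    # Hypothetical platform tag: OS name plus machine architecture, so that
    # trees built on different platforms never share a key.
    arch = f"{platform.system()}-{platform.machine()}".lower()
    return f"{digest}-{arch}"
```

Because the key depends only on the file contents and the platform, two branches with the same npm-shrinkwrap.json map to the same archive, and any change to the file yields a new one.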
If you use instaclone publish after committing your npm-shrinkwrap.json file, you can switch back and forth between Git branches and run instaclone install instantly instead of npm install and waiting minutes. Your colleagues can do this too — after you publish, they can run instaclone install and get a byte-for-byte exact copy of your node_modules cached on their machines. Finally, your CI builds will speed up most of the time — possibly by a lot!
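The reason switching branches can be fast is that an install becomes a local cache lookup most of the time. Here is a rough sketch of that logic, assuming a cache directory keyed by version and a hypothetical fetch callback that downloads and unpacks a published archive; it copies the tree for simplicity and is not Instaclone's actual code.

```python
import shutil
from pathlib import Path
from typing import Callable


def install_cached(key: str, cache_root: str, project_dir: str,
                   fetch: Callable[[str, Path], None]) -> bool:
    """Populate project_dir/node_modules from a local cache keyed by `key`.

    Returns True on a cache hit, False if `fetch` had to be called.
    """
    cached = Path(cache_root) / key / "node_modules"
    hit = cached.is_dir()
    if not hit:
        # Cache miss: the (hypothetical) fetch callback must download the
        # published archive for this key and unpack it at `cached`.
        cached.parent.mkdir(parents=True, exist_ok=True)
        fetch(key, cached)
    target = Path(project_dir) / "node_modules"
    # Remove whatever is currently installed before swapping in the copy.
    if target.is_symlink() or target.is_file():
        target.unlink()
    elif target.is_dir():
        shutil.rmtree(target)
    shutil.copytree(cached, target, symlinks=True)
    return hit
```

Only the first install of a given version pays for a download. After that, going back to a branch you have already installed touches nothing but the local disk.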
Install via pip install instaclone. Copy and edit the example config file to try it.
In fact, Instaclone is a generic tool, so you could publish somewhere besides S3, and use it for storing and caching any files or directories, not just node_modules.
Do let me know if it works for you. If it doesn’t, file issues or PRs!