Moving to a Monorepo

In this post I will discuss a recent structural change that I made on a project - switching to hosting all of our front-end code in a single repository - a monorepo.

📖 Click here for a fully-worked example repository

I've been working on a suite of web applications, each with a consistent visual language, shared components and utility functions. This was originally developed as a single application with a run-time check determining which content the application would present. Out of necessity we quickly split the application into three standalone artifacts, which we did by cloning the repository and deleting the unnecessary code. Over time we found ourselves struggling to share code between the repositories without creating yet more repositories and increasing the amount of administration and co-ordination required to publish a single change. It led to many more copy-and-paste changes, with the long-term risk of our "shared" libraries diverging over time and becoming tough to re-integrate. Something had to give, so I researched ways to improve our process.

Monorepos as a Productivity Tool

The NPM ecosystem favours small dependencies as a means of promoting code sharing and reusability. Investigate the node_modules folder of any node project and you'll see the abundance of small libraries, composed together to make larger pieces of work. But when it comes to writing applications at the user-level, we sometimes forget this compositional nature, instead hosting all of that code in one large package.

There's a strong benefit to breaking applications up into libraries - even if they end up being bundled together and deployed as one minified JavaScript file. Libraries introduce bounded contexts - logical abstractions that assist us from an organizational and maintenance perspective. In my particular case, the library structure also affords the ability to re-use code between applications.

When your libraries live in separate repositories, there's a further administrative overhead to managing those dependencies. There are multiple pull requests necessary and it becomes more involved for developers to test the integration works correctly. Especially where libraries are shared between multiple teams, implementing changes costs a notable amount of time. In larger corporate environments I've seen cases where code that really should live in a library ends up living in a team's application repository instead, as it's just quicker that way.

The reason this situation comes about is the one-to-one relationship between a repository and the publishable NPM artifact it produces. This doesn't actually have to be the case; it's just convention. Babel, Jest, Angular and React (amongst others) have all taken the approach of storing all their code in a single repository, breaking the assumption that each library is housed in its own repository. In this situation a lot of the organizational headaches discussed above simply disappear, improving developer productivity significantly.

Using Lerna

Lerna is a tool used to create monorepos for npm packages. In addition to describing how multiple packages should be stored, it provides a mechanism to publish artifacts to NPM and a means of 'linking' packages together where local dependencies exist. Furthermore, it provides an efficient means of sharing third-party dependencies.

Setting up a new Lerna repository is as simple as installing lerna globally and then running the initialisation script, lerna init. This will create a lerna.json file at the root of your repository.

What Does It Look Like?

In addition to a root-level package.json, each package will also have its own package.json descriptor. Packages are therefore free to define dependencies and scripts just like they would in a standalone repository.

The root-level lerna.json file describes where packages live within the repository. By default they're expected to live in a directory called packages.

For web applications, consider the difference between applications and libraries. At a high level, applications are the entities that are built using a tool such as Webpack, Browserify or Parcel and deployed to a production environment. Libraries are smaller utilities that are composed into applications. As such, a bundler that runs from an application-level package will include the code in a library-level package, so libraries don't need to be built with a bundler themselves. I typically export my library-level packages as ES6 modules and let the application's bundler take care of the rest.

If you don't wish to have your packages published to npm as standalone artifacts, set private: true in that package's package.json descriptor.

Running Scripts

One of the features that I really enjoy about npm is the ability to define scripts in the package.json file and run them using npm run <script-name>. This manages the complexity of common scripts that are regularly run and simplifies CI on build servers. Lerna understands this common use case and supports it. By running lerna run <script-name>, Lerna will run the script <script-name> in each package that contains a script of that name.

At any point you can always cd into a package directory and run the npm scripts defined there. But sometimes it's more convenient to have these scripts aliased in the root directory of your monorepo. That way there's no need to jump in and out of packages. When running a script via the npm run command, it's possible to supply a --prefix argument, which instructs npm to run the command from a given directory.

In my situation, I have a monorepo which hosts a series of applications. Sometimes I need to run all three in parallel. Each of them is bundled with Webpack, so I could define the following scripts in my root-level package.json file:

"scripts": {
  "dev:package-a": "npm run dev --prefix packages/package-a",
  "dev:package-b": "npm run dev --prefix packages/package-b",
  "dev:package-c": "npm run dev --prefix packages/package-c",
  "prod:package-a": "npm run prod --prefix packages/package-a",
  "prod:package-b": "npm run prod --prefix packages/package-b",
  "prod:package-c": "npm run prod --prefix packages/package-c",
  ...
}

This small change has a significant impact when it comes to developer familiarisation with your monorepo structure!
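As the number of packages grows, the alias list above becomes repetitive. A minimal sketch of generating it follows; buildScriptAliases is a hypothetical helper (not part of npm or Lerna), and the package and script names are simply those used in the example above.

```javascript
// Hypothetical helper: build root-level npm script aliases for each package.
// Assumes every package lives directly under the given packages directory.
function buildScriptAliases(packageNames, scriptNames, packagesDir = 'packages') {
  const aliases = {};
  for (const script of scriptNames) {
    for (const name of packageNames) {
      // e.g. "dev:package-a": "npm run dev --prefix packages/package-a"
      aliases[`${script}:${name}`] = `npm run ${script} --prefix ${packagesDir}/${name}`;
    }
  }
  return aliases;
}

const aliases = buildScriptAliases(
  ['package-a', 'package-b', 'package-c'],
  ['dev', 'prod']
);

// Print a "scripts" section that could be pasted into the root package.json.
console.log(JSON.stringify({ scripts: aliases }, null, 2));
```

Whether generating the aliases is worth it depends on how often packages are added; for three packages, writing them by hand is arguably simpler.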

Importing From Existing Git Repositories

Lerna provides a script to import existing git repositories into its repository structure while preserving git history. This is vital for developer productivity, otherwise useful contextual information disappears into a black hole.

Unfortunately, I struggled to import my existing git repositories using lerna import. I suspect that this was down to merge conflicts that required manual resolution. There is a documented method for fixing this, but this involves re-writing history in a way I was not entirely comfortable with. Instead, I utilised the magic that is the git subtree command to manually import into the new destination:

Add the old repository as a remote: git remote add <name> <path>.
Create a subtree to the remote. Use the prefix option to specify where you would like the subtree to be written to. By default that will be a directory in packages: git subtree add -P packages/<package-name> <remote> <branch>.

If you ever have to merge from the old repository, you can handle that via a subtree merge: git subtree merge -P packages/<package-name> <remote> <branch>. Note that whilst this technique preserves git history, I lost tag information. I didn't mind so much because I can always grep for exact commits where a tag was created.

Managing Dependencies

This is where Lerna really comes into its own. Each package is free to declare its own dependencies, but for cases where several packages wish to depend on the same dependency, Lerna's bootstrap process will "hoist" the common dependencies so that they're stored in one location. For painless hoisting, add lerna bootstrap --hoist as a postinstall command in your top-level package.json. This means every time that npm install is run, the hoisting stage will happen immediately after, ensuring that all packages that depend on one another are correctly referenced.

Example 1

There are two packages that both depend on the same dependency, e.g. React v16.3. When you bootstrap, React will be installed at the root-level node_modules folder only. Node's module resolution will find it as expected.

node_modules
 \ react
packages
 \ package-a
 \ package-b
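The reason a hoisted dependency is still found is that Node looks for node_modules in the importing file's directory and then in each parent directory in turn. The sketch below is a simplified model of that lookup order, not Node's actual implementation; the /repo path is a hypothetical location for the monorepo.

```javascript
const path = require('path');

// Simplified model: list the node_modules directories that would be checked
// for a bare import such as "react", nearest directory first.
function candidateNodeModulesDirs(fromDir) {
  const dirs = [];
  let current = path.posix.resolve(fromDir);
  while (true) {
    dirs.push(path.posix.join(current, 'node_modules'));
    const parent = path.posix.dirname(current);
    if (parent === current) break; // reached the filesystem root
    current = parent;
  }
  return dirs;
}

// package-a has no react of its own, so the lookup falls through
// to /repo/node_modules, where the hoisted copy lives.
console.log(candidateNodeModulesDirs('/repo/packages/package-a'));
```

For the real list on a given machine, Node exposes require.resolve.paths('react'), which also includes global folders that this sketch leaves out.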

Example 2

Similar to above, two packages depend on the same dependency, but require it to be available from the command-line. An example is Webpack, as you may wish to run Webpack from the package's folder. When you bootstrap, Webpack will be installed at the root-level node_modules, although a "symlink" exists in the .bin folder of each package's node_modules.

node_modules
 \ webpack
packages
 \ package-a
  \ node_modules
   \ .bin
    \ webpack.cmd
 \ package-b
  \ node_modules
   \ .bin
    \ webpack.cmd

Example 3

The Lerna repository contains two packages, where one depends on the other (package-b depends on package-a). When you bootstrap, a "symlink" to package-a exists in the node_modules folder of package-b.

packages
 \ package-a
 \ package-b
  \ node_modules
   \ package-a

Note that in this case Lerna handles the versioning of the dependency to a local package for you, so as you publish new versions, it will update package.json files of your packages automatically.

The modern web development toolchain is quite advanced - including transpilation, bundling, linting and minification. Very often your packages will wish to perform these steps in a consistent manner. To achieve that, place your configuration files, such as your .babelrc and .eslintrc files, at the root directory of the repository. This way they will be found as each tool searches up the directory tree from the package it is running in, and applied consistently.

Testing Packages

I use Jest as a test runner and assertions library so I'll be referring exclusively to its operation here. In the interests of balance, other runners and frameworks are available!

Jest supports monorepos out of the box which is super-handy, but opinions differ regarding whether you should run your tests as one large run, or parallelize running Jest tests from each package. I've used both.

As One Runner

For the first option, as with other shared config files, declare your Jest config file in the root directory of your repository.

Here's an example of a bare-bones top-level Jest config file:

module.exports = {
  collectCoverage: true,
  collectCoverageFrom: [
    'packages/*/src/**/*.js',
    '!**/node_modules/**'
  ],
  moduleNameMapper: {
    ".+\\.(css)$": "identity-obj-proxy",
    "common": "<rootDir>/packages/common/src"
  },
  roots: [
    'packages/'
  ]
};

This file is providing a mock (specifically identity-obj-proxy) so that CSS files aren't included. It's instructing Jest to look for tests and source files in the packages directory. Code coverage is configured to include this directory too, and exclude node_modules for obvious reasons. The only interesting part is the declaration of common in moduleNameMapper - in my sample repository this is a library package that lives within the monorepo and is used by other packages. In order for Jest to correctly find this, I found that I had to specifically add this module name transform.
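To make the moduleNameMapper behaviour more concrete, here is a simplified sketch of how such a map can be applied: each key is treated as a regular expression and tested against the imported module name. This is an illustration only, not Jest's actual implementation; mapModuleName, the /repo root and the uncommon-lib module name are hypothetical.

```javascript
// Simplified model of moduleNameMapper: the first pattern that matches the
// imported module name supplies the replacement path.
function mapModuleName(moduleName, moduleNameMapper, rootDir) {
  for (const [pattern, target] of Object.entries(moduleNameMapper)) {
    if (new RegExp(pattern).test(moduleName)) {
      return target.replace('<rootDir>', rootDir);
    }
  }
  return null; // no mapping applies, so normal resolution would be used
}

const mapper = {
  '.+\\.(css)$': 'identity-obj-proxy',
  'common': '<rootDir>/packages/common/src'
};

const rootDir = '/repo'; // hypothetical repository location

console.log(mapModuleName('./styles.css', mapper, rootDir));
console.log(mapModuleName('common', mapper, rootDir));
console.log(mapModuleName('react', mapper, rootDir));
// The pattern "common" is unanchored, so it also matches longer names.
console.log(mapModuleName('uncommon-lib', mapper, rootDir));
```

Because the patterns are unanchored, a key such as "common" matches any module name containing that text. If that ever causes surprises, anchoring the key (for example "^common$") is one way to narrow it.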

From there, it's just a case of adding "test": "jest" in the root-level package.json scripts section.

As Multiple Runners

Should you wish to have Jest run standalone in each package, that's perfectly fine too! As you would if the repository were standalone for your package, define a jest.config.js in that package's directory. Similar to above, that package should have "test": "jest" in its package.json scripts section.
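A package-level config could look something like the sketch below. It assumes a hypothetical package at packages/package-a with its source under src, and it reuses the same CSS mock as the root-level example; adjust the options to whatever your package actually needs.

```javascript
// packages/package-a/jest.config.js (hypothetical package name and layout)
const config = {
  collectCoverage: true,
  // Paths are relative to this package's directory rather than the repo root.
  collectCoverageFrom: [
    'src/**/*.js',
    '!**/node_modules/**'
  ],
  moduleNameMapper: {
    // Same CSS mock as the root-level example.
    '.+\\.(css)$': 'identity-obj-proxy'
  }
};

module.exports = config;
```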

On your CI server you'll most likely wish to run lerna run test, which will execute the test script in each package that defines one. You can do that while developing locally too if you wish, but you may also find yourself just wanting to run the tests from a single package. It may be beneficial to therefore define a top-level package.json script to only run the tests in a given package, using the --prefix trick mentioned above. For a package called package-a, that could look like this:

"test:package-a": "npm test --prefix packages/package-a"

Note however that if you wish to pass arguments, such as to instruct Jest to update snapshot files, you have to use two instances of -- in the arguments:

npm run test:package-a -- -- -u

Publishing Packages

Lerna has two versioning modes: fixed and independent. These versioning modes determine what occurs when you wish to publish npm packages from the Lerna repository.

Fixed mode is the default and it's my preferred option - in this mode the lerna.json file at the root of the repository tracks a single version number that is shared by every package in the repository. When running a publish build, every package that has been marked as updated (as reported by the lerna updated command) is published under the same new version number.

Given a Lerna repository with three packages - if the version is currently 0.1.0 and two of the three packages have been updated, Lerna will publish the two updated packages using the version number that you specify, let's say 0.2.0.
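The scenario above can be modelled in a few lines. This is an illustrative model of the outcome only, not how Lerna works internally; planFixedPublish is a hypothetical function, and real Lerna also considers packages that depend on a changed package.

```javascript
// Illustrative model of fixed mode: every updated package is published
// under the one shared version number; unchanged packages are skipped.
function planFixedPublish(packageNames, updatedNames, nextVersion) {
  const updated = new Set(updatedNames);
  return packageNames
    .filter((name) => updated.has(name))
    .map((name) => ({ name, version: nextVersion }));
}

const plan = planFixedPublish(
  ['package-a', 'package-b', 'package-c'],
  ['package-a', 'package-c'], // the two packages that changed since 0.1.0
  '0.2.0'
);

console.log(plan);
```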

Independent versioning is where the exact version number of each package can differ. Whilst this may not be immediately obvious, consider the case where a change to a library package constitutes an incremental change to one package that depends on it, but a breaking change in another. If you're an adopter of semantic versioning then you would wish for your new version numbers to reflect this change. Managing independent versioning requires more co-ordination, so I would only recommend it if absolutely necessary. By the way, I've written more on Lerna Independent Versioning.

Lerna's publish command will create a new release, utilising several lerna sub-commands. It will identify any packages that have changed, bump their version and then push those packages to NPM. Finally a commit is applied to the repository and a new tag created. The version number can be supplied to the command - either exactly or using a semver keyword like patch, minor etc. Alternatively, do not supply a version and you will get an interactive prompt. Remember to add --yes in CI environments to skip interactive prompts! A typical CI setup would look like:

npm install (with the above postinstall step to run lerna bootstrap --hoist)
lerna run build
npm test (or lerna run test)
lerna publish patch --yes

For further configurability you may wish to extend step four into a custom script that reads some configuration from a file, to determine the versioning scheme (cd-version or repo-version) to use, and the subsequent values of those options. This could be useful if you wish to publish a prerelease version or an exact version number from a branch, but your day-to-day incremental builds run on a patch basis. Furthermore this can be extended to a CI/CD pipeline scenario, where following your publish step you auto-deploy to environments.
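A minimal sketch of such a custom script follows. It assumes a hypothetical config shape, { "bump": "patch" } or { "bump": "1.2.3" }, and the positional form of the publish command used in the CI steps above; the function name and file layout are assumptions of mine, so check the flags against the Lerna version you have installed.

```javascript
// Keywords assumed to be accepted as a version bump.
const SEMVER_KEYWORDS = [
  'major', 'minor', 'patch',
  'premajor', 'preminor', 'prepatch', 'prerelease'
];

// Rough check for an exact version such as 1.2.3 or 1.2.3-beta.1.
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;

// Hypothetical helper: turn parsed config into arguments for the lerna CLI.
function buildPublishArgs(config) {
  const bump = config.bump || 'patch'; // day-to-day builds default to patch
  if (!SEMVER_KEYWORDS.includes(bump) && !EXACT_VERSION.test(bump)) {
    throw new Error(`Unsupported bump value: ${bump}`);
  }
  return ['publish', bump, '--yes'];
}

console.log(buildPublishArgs({}).join(' '));
console.log(buildPublishArgs({ bump: '1.0.0-beta.1' }).join(' '));
```

In a real pipeline the config object would come from a file, for example via JSON.parse(fs.readFileSync(...)), and the resulting array would be handed to child_process.execFileSync('lerna', args). Validating the value before invoking the CLI means a typo in the config fails fast rather than part-way through a release.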

Conclusion

I've enjoyed the flexibility of storing multiple packages in one repository. Especially in corporate environments where there's a level of administration required to set up repositories, Lerna's structure affords me the ability to create packages arbitrarily. This is great for code re-use and modularisation, and it provides an efficient way of building code via dependency re-use and parallel execution of npm scripts.

Taking this one step further, Yarn Workspaces takes the concept of Lerna but tightly integrates it into the package manager CLI. This makes installing dependencies even more efficient. Lerna is fully compatible with Yarn Workspaces, so it's not necessarily a choice of one or the other.