
Content-addressable store

How PCPM saves 70-90% of disk on multi-project solutions.

The content-addressable store is the part of PCPM that does the heavy lifting. It’s the reason a 30-project monorepo fits in a few hundred megabytes instead of tens of gigabytes.

What “content-addressable” means

A content-addressable store is a directory layout where the path of a file is derived from its contents. The usual recipe is:

  1. Compute a hash of the file’s bytes — sha256, in PCPM’s case.
  2. Store the file at <root>/<first-2-chars>/<remaining-chars>/<filename>.
  3. To look up a file, start from its hash (for a package, PCPM already has it in the lockfile), derive the path, and read it.
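The recipe above can be sketched in a few lines of Python. This illustrates the general technique using the path scheme from step 2. It is not PCPM’s code, and the function names `path_for_digest` and `store_path` are hypothetical.

```python
import hashlib
from pathlib import Path


def path_for_digest(root: Path, digest: str, filename: str) -> Path:
    """Step 2: <root>/<first-2-chars>/<remaining-chars>/<filename>."""
    return root / digest[:2] / digest[2:] / filename


def store_path(root: Path, data: bytes, filename: str) -> Path:
    """Steps 1 and 2: hash the bytes with sha256, then derive the path."""
    digest = hashlib.sha256(data).hexdigest()  # 64 lowercase hex characters
    return path_for_digest(root, digest, filename)


if __name__ == "__main__":
    print(store_path(Path("store"), b"example bytes", "pkg.nupkg"))
```

Because the path is a pure function of the bytes, the same input always lands in the same place. A lookup (step 3) can therefore call `path_for_digest` directly with a hash it already knows.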

The property you get for free is deduplication: if two files have the same bytes, they have the same hash, and they have the same path. The second one doesn’t need a separate copy on disk.
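A hypothetical `put` helper shows where deduplication comes from: the write is skipped when the derived path already exists. The sketch assumes that every entry uses the fixed archive name `pkg.nupkg` from the on-disk layout. It is not PCPM’s storage code.

```python
import hashlib
import tempfile
from pathlib import Path


def put(root: Path, data: bytes, filename: str) -> tuple[Path, bool]:
    """Store `data` under its sha256-derived path; return (path, newly_written)."""
    digest = hashlib.sha256(data).hexdigest()
    target = root / digest[:2] / digest[2:] / filename
    if target.exists():
        # Same bytes -> same hash -> same path: nothing new to write.
        return target, False
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target, True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        first, wrote_first = put(root, b"same bytes", "pkg.nupkg")
        second, wrote_second = put(root, b"same bytes", "pkg.nupkg")
        print(first == second, wrote_first, wrote_second)  # True True False
```

A real store would probably write to a temporary file and rename it into place, so that an interrupted write can’t leave a truncated file at a valid-looking path. This sketch skips that step.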

PCPM uses this for .nupkg files (immutable archives) and the extracted package contents (lib/, runtimes/, etc.). The store path on Windows is %LOCALAPPDATA%\pcpm\store; on Linux and macOS it’s ~/.local/share/pcpm/store.
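The documented default locations can be expressed as a small function. `default_store_root` is a hypothetical name. It takes the platform, environment and home directory as parameters so it can run anywhere. This page doesn’t say whether PCPM also honors overrides such as `XDG_DATA_HOME`, so the sketch ignores them.

```python
import os
import sys
from pathlib import Path, PurePath, PurePosixPath, PureWindowsPath


def default_store_root(platform: str, env: dict, home: str) -> PurePath:
    """Default store root as documented (hypothetical helper)."""
    if platform == "win32":
        # %LOCALAPPDATA%\pcpm\store
        return PureWindowsPath(env["LOCALAPPDATA"], "pcpm", "store")
    # Linux and macOS: ~/.local/share/pcpm/store
    return PurePosixPath(home, ".local", "share", "pcpm", "store")


if __name__ == "__main__":
    print(default_store_root(sys.platform, dict(os.environ), str(Path.home())))
```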

The on-disk layout

%LOCALAPPDATA%\pcpm\store\
  v1\
    8f\                            # first 2 chars of the sha256
      3a4b5c6d7e…\                 # remaining chars of the sha256
        pkg.nupkg                  # the immutable archive
        extracted\
          newtonsoft.json\
            13.0.3\
              lib\
                net6.0\…           # the package's lib/net6.0 contents
                netstandard2.0\…
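This sketch assumes the entry directory is named with the remaining 62 hex characters, as the comment in the tree and step 2 of the recipe describe. Under that assumption, the paths for one package could be computed as below. `package_paths` is hypothetical. Lowercasing the package id is inferred from the `newtonsoft.json` example, not documented.

```python
from pathlib import PurePath, PurePosixPath


def package_paths(store_root: PurePath, sha256_hex: str,
                  package_id: str, version: str) -> tuple[PurePath, PurePath]:
    """Return (archive, extracted_dir) for one package in the v1 layout."""
    entry = store_root / "v1" / sha256_hex[:2] / sha256_hex[2:]
    archive = entry / "pkg.nupkg"  # the immutable archive
    # Assumption: ids are lowercased, as in the newtonsoft.json example.
    extracted = entry / "extracted" / package_id.lower() / version
    return archive, extracted


if __name__ == "__main__":
    digest = "8f" + "0" * 62  # placeholder digest, not a real package hash
    for p in package_paths(PurePosixPath("/store"), digest, "Newtonsoft.Json", "13.0.3"):
        print(p)
```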

Every distinct .nupkg in your dependency graph is stored exactly once, regardless of how many projects depend on it.

Once a package is in the store, pcpm install makes it visible to dotnet restore by hardlinking its files into ~/.nuget/packages. A hardlink is an additional directory entry that points to the same on-disk file (the same inode, on Unix filesystems) as an existing entry. Two paths, one file, zero extra disk.
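This Python check tests the “two paths, one file” claim, assuming a filesystem with hardlink support. After `os.link`, both names report the same inode and a link count of 2. The directory names are made up for the demo.

```python
import os
import tempfile
from pathlib import Path


def demo_hardlink(tmp: Path) -> tuple[os.stat_result, os.stat_result]:
    """Create a file, hardlink it elsewhere, and return both stat results."""
    store_file = tmp / "store" / "pkg.nupkg"
    store_file.parent.mkdir(parents=True)
    store_file.write_bytes(b"archive bytes")
    visible = tmp / "nuget" / "pkg.nupkg"
    visible.parent.mkdir(parents=True)
    os.link(store_file, visible)  # a second directory entry for the same file
    return store_file.stat(), visible.stat()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        a, b = demo_hardlink(Path(tmp))
        print(a.st_ino == b.st_ino, a.st_nlink)  # True 2
```

Both names refer to the same bytes. Modifying the file through one name changes it under the other, which is one reason store entries need to be treated as immutable.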

NTFS (Windows), APFS (macOS), and ext4 (Linux) all support hardlinks, and PCPM uses them on all three. When the source and target are on different volumes, or the filesystem doesn’t support hardlinks, PCPM falls back to a normal file copy.
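The link-then-copy fallback might look like the following. `link_or_copy` is a hypothetical helper. It treats any `OSError` from `os.link`, other than an already existing target, as “can’t link here” and copies instead. One such error is `EXDEV`, raised when source and target are on different volumes. PCPM’s exact error handling isn’t documented here.

```python
import os
import shutil
import tempfile
from pathlib import Path


def link_or_copy(src: Path, dst: Path) -> str:
    """Hardlink src to dst; fall back to a plain copy if linking fails."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    try:
        os.link(src, dst)
        return "link"
    except FileExistsError:
        raise  # never silently replace an existing entry
    except OSError:
        # e.g. EXDEV (different volumes) or a filesystem without hardlinks
        shutil.copy2(src, dst)
        return "copy"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "store" / "pkg.nupkg"
        src.parent.mkdir()
        src.write_bytes(b"archive bytes")
        print(link_or_copy(src, Path(tmp) / "nuget" / "pkg.nupkg"))
```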

The hardlink is invisible to the application: dotnet restore sees a normal ~/.nuget/packages/<id>/<version>/ tree. There’s nothing custom about the consumer side.

How much disk do you save?

Rough numbers from a 30-project monorepo with 200 unique packages:

Layout                                                       Disk used
Default ~/.nuget/packages (no PCPM)                          8.2 GB
PCPM, first run                                              4.1 GB
PCPM, after a clean restore                                  4.1 GB
PCPM, after adding a project that uses the same packages     4.1 GB

The PCPM number doesn’t grow with the number of projects. Adding a 31st project that uses the same 200 packages adds zero bytes to the store.
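The store doesn’t grow because its size is a sum over distinct hashes, not over projects. The toy model below uses made-up sizes. The helper `store_bytes` and the hashes are hypothetical.

```python
def store_bytes(projects: dict[str, list[tuple[str, int]]]) -> int:
    """Bytes in a content-addressable store: each distinct hash is counted once."""
    unique: dict[str, int] = {}
    for packages in projects.values():
        for sha256_hex, size in packages:
            unique[sha256_hex] = size  # a repeated hash overwrites, never adds
    return sum(unique.values())


if __name__ == "__main__":
    shared = [("aa" * 32, 1_000_000), ("bb" * 32, 2_500_000)]  # made-up sizes
    projects = {f"proj{i}": shared for i in range(30)}
    before = store_bytes(projects)
    projects["proj30"] = shared  # a 31st project using the same packages
    print(store_bytes(projects) - before)  # 0
```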

Why this matters

Disk is the obvious win, but the deeper benefit is I/O speed. Restoring a project against a warm store is limited by metadata operations: dotnet restore reading the lockfile, MSBuild traversing the project graph, and the runtime opening the assembly metadata. The actual copy-from-network step is gone. On a clean CI runner with a warm cache, the wall-clock time of pcpm install plus dotnet restore is dominated by dotnet restore, not by pcpm install.

What’s in the lockfile?

pcpm.lock records the resolved version and the content hash of each package. This is what makes the store a “store” and not just a cache: given a lockfile, PCPM can verify that every entry’s bytes are present in the store, or download exactly the missing ones.
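Given a parsed lockfile, finding what to download comes down to a presence check for each entry. This sketch assumes the `v1/<2 chars>/<remaining chars>/pkg.nupkg` layout shown earlier and the `sha256:<hex>` hash format from the lockfile example. `missing_packages` is a hypothetical name. The function only checks that each file exists. Confirming that the bytes still match the hash is a separate step.

```python
from pathlib import Path


def missing_packages(lock: dict, store_root: Path) -> list[tuple[str, str]]:
    """Return (id, version) for every lockfile entry whose archive isn't in the store."""
    missing = []
    for pkg in lock["packages"]:
        algo, _, digest = pkg["hash"].partition(":")
        if algo != "sha256":
            raise ValueError(f"unsupported hash algorithm: {algo!r}")
        archive = store_root / "v1" / digest[:2] / digest[2:] / "pkg.nupkg"
        if not archive.is_file():
            missing.append((pkg["id"], pkg["version"]))
    return missing
```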

The format is a stable JSON document:

{
  "version": 1,
  "packages": [
    {
      "id": "newtonsoft.json",
      "version": "13.0.3",
      "hash": "sha256:8f3a4b5c6d7e…",
      "dependencies": {  }
    }
  ]
}
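Reading this format needs nothing beyond a JSON parser. The minimal sketch below checks the `version` field and splits the `sha256:` prefix off each hash. `LockEntry` and `parse_lockfile` are hypothetical names. Rejecting any version other than 1 is an assumption.

```python
import json
from typing import NamedTuple


class LockEntry(NamedTuple):
    id: str
    version: str
    sha256: str          # hex digest, without the "sha256:" prefix
    dependencies: dict


def parse_lockfile(text: str) -> list[LockEntry]:
    """Parse a pcpm.lock-shaped JSON document into entries."""
    doc = json.loads(text)
    if doc.get("version") != 1:
        raise ValueError(f"unsupported lockfile version: {doc.get('version')!r}")
    entries = []
    for pkg in doc["packages"]:
        algo, sep, digest = pkg["hash"].partition(":")
        if not sep or algo != "sha256":
            raise ValueError(f"unexpected hash format: {pkg['hash']!r}")
        entries.append(LockEntry(pkg["id"], pkg["version"], digest,
                                 pkg.get("dependencies", {})))
    return entries
```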

pcpm ci validates the hashes against the store and fails if anything is missing. pcpm install re-resolves the graph and updates the lockfile; the hashes get rewritten only if a transitive bump changed a package’s content.
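A `pcpm ci`-style validation could re-hash each stored archive, compare it with the lockfile, and return a nonzero status if anything is missing or altered. The sketch below assumes the `v1/<2 chars>/<remaining chars>/pkg.nupkg` layout and uses hypothetical names (`verify_entry`, `ci_check`). It is not PCPM’s implementation.

```python
import hashlib
from pathlib import Path
from typing import List, Optional


def verify_entry(store_root: Path, sha256_hex: str) -> Optional[str]:
    """Return None if the stored archive matches its hash, else a problem description."""
    archive = store_root / "v1" / sha256_hex[:2] / sha256_hex[2:] / "pkg.nupkg"
    if not archive.is_file():
        return f"missing: {archive}"
    h = hashlib.sha256()
    with archive.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            h.update(chunk)
    if h.hexdigest() != sha256_hex:
        return f"hash mismatch: {archive}"
    return None


def ci_check(store_root: Path, hashes: List[str]) -> int:
    """Print every problem; return 1 (fail) if there were any, else 0."""
    problems = [p for p in (verify_entry(store_root, x) for x in hashes) if p]
    for p in problems:
        print(p)
    return 1 if problems else 0
```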

See also