You're looking for the reflink feature, which was introduced in 2009. It only works with certain filesystems – currently Btrfs, XFS, and the upcoming Bcachefs. (ZFS is still working on it.)
Use --reflink=auto to create a CoW copy when possible (this is already the default as of coreutils 9.0), or --reflink=always if you want to make sure it'll never fall back to doing a full copy. Note that a bare --reflink with no argument means "always", not "auto":
cp --reflink=always OLDFILE NEWFILE
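If you need the same behaviour from a script, here is a minimal Python sketch of a "reflink if possible, otherwise copy" helper. On Linux the clone request goes through the FICLONE ioctl. The constant below is the value from <linux/fs.h> on x86 and ARM, and is an assumption for other architectures. The function name reflink_or_copy is made up for this example.

```python
import fcntl
import shutil

# FICLONE = _IOW(0x94, 9, int) from <linux/fs.h>.
# Assumption: this numeric value is the x86/ARM one; some architectures differ.
FICLONE = 0x40049409


def reflink_or_copy(src, dst):
    """Clone src to dst if the filesystem allows it, else do a full copy.

    Returns True if the data was cloned (shared extents), False if it
    had to be copied byte by byte.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the kernel to make dst share all of src's extents.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return True
        except OSError:
            # Unsupported filesystem, cross-device copy, etc.:
            # fall back to an ordinary copy, like --reflink=auto does.
            fsrc.seek(0)
            shutil.copyfileobj(fsrc, fdst)
            return False
```

Calling reflink_or_copy("OLDFILE", "NEWFILE") should return True on Btrfs or XFS (with reflink enabled) and False on something like ext4. It only copies file contents, not permissions or timestamps.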
The new file will have a different inode, but will initially share all data extents with the original (you can compare the two files' extent maps using filefrag -v FILE or xfs_io -rc "fiemap -v" FILE).
An alternative is filesystem deduplication, which is supported by Btrfs and ZFS among others, and allows merging identical blocks underneath existing files. In ZFS this happens synchronously ("online", i.e. as soon as the file is written), while in Btrfs it's done as a batch job ("offline", using tools such as Bees or duperemove). Unfortunately, online deduplication in ZFS has a significant impact on resource usage. If you use Btrfs, however, you can just run duperemove -rd against the folders once in a while.
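Before running a dedupe pass, it can be useful to get a rough idea of how much there is to gain. The sketch below only reports groups of byte-identical files under a folder. It is not how duperemove works internally (that tool compares at the block/extent level and asks the kernel to merge the data), and the function names are made up for this example.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, bufsize=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB pieces."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()


def duplicate_groups(root):
    """Return lists of paths under root whose contents are identical."""
    # Cheap first pass: only files of equal size can be identical.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Second pass: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue
        for path in paths:
            by_hash[file_digest(path)].append(path)

    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Each returned group is a set of whole-file duplicates. Partial overlaps between files, which block-level dedupe can also merge, are not detected here.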
Finally, whether you use reflinks or dedupe, you'll also want to use backup tools that themselves perform deduplication (it is not enough to use a hardlink-aware backup tool, as reflinks don't look like hardlinks). For example, the archive formats used by Restic and Borg are content-addressed (much like Git), so identical blocks will automatically be stored only once per repository, even if they occur in separate files.
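To illustrate why a content-addressed format deduplicates "for free", here is a toy block store in Python. It is only a sketch of the idea: the BlockStore class and the fixed 4096-byte block size are assumptions for the example, whereas Restic and Borg use content-defined chunking, encryption, and their own on-disk formats.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: fixed-size blocks, chosen for simplicity


class BlockStore:
    """Toy content-addressed store: blocks are keyed by their SHA-256."""

    def __init__(self):
        self.blocks = {}  # hex digest -> block bytes

    def put_file(self, data):
        """Store data and return its 'recipe': the list of block digests."""
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            key = hashlib.sha256(block).hexdigest()
            # An identical block from any earlier file is not stored again.
            self.blocks.setdefault(key, block)
            recipe.append(key)
        return recipe

    def get_file(self, recipe):
        """Rebuild the original bytes from a recipe."""
        return b"".join(self.blocks[key] for key in recipe)
```

If two files share a block, both recipes simply point at the same digest, so it doesn't matter whether the source files were reflinked, deduplicated, or plain full copies.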
The OCFS2 cluster filesystem on Linux also has "reflinks" at least in name, but doesn't support the standard reflink creation API, so they have to be created using an OCFS2-specific tool.
On Windows, ReFS supports reflinks under the name "block cloning" (though it doesn't seem to come with a built-in CLI tool); NTFS does not. Finally, on macOS, cp -c will create reflinks (CoW copies) as long as you're using APFS.