Introduction
This paper explains how my desire for a Windows-friendly tool to do something I thought was so easy that there would be plenty of existing solutions led me to write my own.
I consider two common functional requirements:
- You have several machines and want to synchronize key data between them.
- You want an efficient way of recovering your data if something goes wrong.
And I’m also interested in achieving the following:
- Preserving hard links
- Acceptable performance when one of the systems is remote
And when used for backups:
- Works for backups of the OS, not just user-level data files
- Efficient use of space on the destination when backing up multiple systems which share lots of files
- Ability to preserve old backups when creating new ones
- Ability to browse a backup using familiar tools to find individual files etc.
I was inspired by, and liberally borrowed ideas from, tools such as rsync and robocopy, after making extensive attempts to use them rather than writing my own. I also looked at quite a few other tools out there, but most of them are either designed for user-file-level backups/synchronisation, or are unix tools ported to Windows that don’t fully understand Windows subtleties.
The problems with existing tools
- Believe it or not, there is no way using the built-in tools in Windows 7 or 8 to copy a file hierarchy from one disk to another and guarantee that the result will be identical.
- Standard Windows tools like robocopy with /b /copyall come close, but can’t handle junction points properly. By default they copy the contents of the junction’s target, potentially causing an infinite loop. Newer versions of robocopy handle symlinks with the /sl option, but still get junction points wrong. The only safe solution is to use /xj, at which point junction points are ignored completely. (This matters because Windows 7 makes significant use of junction points, as distinct from symbolic links.)
- Alternative ‘unix-like’ environments like Cygwin have problems with symlinks/junction points and also don’t handle Windows permissions with complete fidelity. This is one of the reasons why rsync isn’t quite good enough. Various attempts have been made to port rsync to Windows natively, but none has yet succeeded.
- If your destination is not another Windows drive, but a unix NAS drive for example, existing tools will lose data, making it impossible to perform a truly faithful restore.
- Because Windows accounts are not the same across domains, or different machines in a workgroup, permissions based on specific users don’t work as expected when files are copied between machines.
- Alternate file streams are used internally by some Windows programs, but many programs don’t recognise them and may not copy them when a file is copied.
- Being an administrator is not enough to ensure you can read and write other people’s files. You also need special account privileges. Robocopy can give you these with its /B switch, but many other programs don’t.
- On the positive side, the ability to create volume shadow copies and to access them directly means that you can often perform backups without needing to turn off services.
- Backing up hierarchies containing hard links is badly handled by almost everything.
- Prior to Windows 7, NtBackup was capable of producing valid backups which coped with almost everything, but although it sort of works under Windows 7, it gets some things wrong and can no longer be trusted.
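The junction-point problem above boils down to inspecting a reparse point’s tag before deciding how to copy it. A minimal sketch of that decision logic (the tag constants are the real values from winnt.h; the function and action names are illustrative, not greenclone’s actual code):

```python
# Windows reparse-tag constants (from winnt.h)
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003  # junction points and mounted volumes
IO_REPARSE_TAG_SYMLINK = 0xA000000C      # symbolic links (Vista and later)

def classify_reparse_point(tag):
    """Decide how a faithful copier should treat a reparse point.

    Recreating the reparse point itself -- rather than following it --
    is what avoids the infinite loops that robocopy can fall into.
    """
    if tag == IO_REPARSE_TAG_MOUNT_POINT:
        return "recreate-junction"   # copy the reparse data, not the target's contents
    if tag == IO_REPARSE_TAG_SYMLINK:
        return "recreate-symlink"
    return "skip-unknown"            # other reparse tags (dedup, OneDrive, etc.)

print(classify_reparse_point(0xA0000003))  # recreate-junction
```

A copier that follows this rule never descends into a junction’s target, so a junction pointing back up the tree cannot cause a loop.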
The solution – greenclone
Faced with these problems, and finding myself with some free cycles, I finally decided I needed to solve the problem once and for all, and the open-source greenclone project was born.
The first version of greenclone already exists, and meets enough of the requirements to be genuinely useful – enough so that I now use it myself daily in preference to other tools. This version can:
- copy Windows hierarchies faithfully, preserving permissions, internal hard links, file streams, attributes, etc.
- create and work with VSS shadow copies
- understand and support the \\?\ syntax for NTFS very long filenames and for low-level disk and VSS access
- remove extra files from the destination
- store permissions, alternate file streams, etc. in a separate associated file. If it exists, this file will be used transparently during a subsequent copy to restore the original in full detail. This allows us to make faithful OS backups onto things like Samba-mounted unix drives.
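To illustrate the idea behind that last point: when the destination filesystem can’t hold NTFS metadata, the copier can serialize it into a sidecar file next to the data. A sketch in Python, assuming a simple JSON sidecar format (greenclone’s actual on-disk format may differ; the owner and SDDL strings are just examples):

```python
import json
import tempfile
from pathlib import Path

def save_sidecar(data_path, owner, acl_sddl, streams):
    """Write NTFS metadata the destination can't hold (owner, ACL,
    alternate file streams) into a sidecar file next to the data."""
    meta = {"owner": owner, "acl_sddl": acl_sddl, "streams": streams}
    Path(str(data_path) + ".meta").write_text(json.dumps(meta))

def load_sidecar(data_path):
    """Read the sidecar back so a later copy can restore the original fully."""
    return json.loads(Path(str(data_path) + ".meta").read_text())

tmp = tempfile.mkdtemp()
target = Path(tmp) / "report.docx"
save_sidecar(target, "BUILTIN\\Administrators", "O:BAG:SYD:(A;;FA;;;BA)",
             {"Zone.Identifier": "ZoneId=3"})
assert load_sidecar(target)["owner"] == "BUILTIN\\Administrators"
```

Because the sidecar travels with the file, a Samba share that knows nothing about NTFS ACLs can still act as a faithful backup target: the metadata is simply reapplied on restore.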
It has a relatively modest set of options in its command-line version, while the underlying code, written in C# and freely available, provides somewhat finer-grained control.
greenclone is already, I believe, a valuable and useful tool in a wide variety of situations, and I encourage the reader to visit the project at http://github.com/bilkusg/greenclone
That said, it does not yet meet all my original objectives:
- If both source and destination are local, greenclone is very fast. But if one of the systems is remote, it relies on the usual Windows networking layer, and that REALLY slows things down, especially for things like backing up over wifi. (See below for a temporary workaround.)
- If the destination system is a central server being used to store backups of lots of machines, it will often turn out that files are duplicated across backups. For example, system drive backups of several windows machines will have a lot in common, and this can use up a surprising amount of space. I am currently working on a solution which will allow multiple backups of different systems to share a repository, using hardlinks and some ideas from git. This will also handle the situation in which a large file is moved unchanged from one place to another.
- If an existing file has changed, we currently copy the entire source over, even if the change is one byte out of a million. We want to incorporate ideas from tools such as rsync to improve efficiency of such transfers.
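The delta-transfer idea in the last point rests on a rolling checksum: the receiver hashes fixed-size blocks of its old copy, and the sender slides a window over the new file, updating the checksum in O(1) per byte so that matching blocks can be reused instead of retransmitted. A sketch of rsync’s weak checksum (adapted from the published rsync algorithm; this is not greenclone code):

```python
def weak_checksum(block):
    """rsync-style weak checksum: a pair of running sums mod 2^16."""
    a = sum(block) % 65536
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % 65536
    return a, b

def roll(a, b, size, out_byte, in_byte):
    """Slide the window one byte: drop out_byte, add in_byte, in O(1)."""
    a = (a - out_byte + in_byte) % 65536
    b = (b - size * out_byte + a) % 65536
    return a, b

data = b"the quick brown fox jumps over the lazy dog"
size = 16
a, b = weak_checksum(data[0:size])
a, b = roll(a, b, size, data[0], data[size])   # window now covers data[1:17]
assert (a, b) == weak_checksum(data[1:size + 1])
```

The rolled checksum matches a fresh computation over the shifted window, which is what lets the sender scan every offset of a large file cheaply and send only the blocks that have no match on the other side.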
So there’s plenty of work left to do…
Workarounds
My biggest difficulty with greenclone currently is that it takes forever to back up my laptop over wifi, even if very little has changed. Fortunately, my laptop has a large local hard drive, most of which I don’t need. So I created a spare partition, make the backup locally to that partition (using the /K option to preserve metadata in a separate file), and then use rsync to transfer the result efficiently over to my server.
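The workaround amounts to a two-stage pipeline: a fast local clone, then an efficient network transfer. Sketched as commands (the greenclone invocation syntax, drive letters, and paths here are illustrative; only the /K option is taken from the real tool, and the rsync flags are standard):

```shell
# Stage 1 (on the laptop): clone C: to the spare partition E:,
# keeping permissions/streams in sidecar files via /K
greenclone C:\ E:\backup /K

# Stage 2 (e.g. from Cygwin): let rsync move only changed data
# over wifi; -a preserves timestamps so unchanged files are
# skipped, and --delete mirrors removals to the server
rsync -a --delete /cygdrive/e/backup/ server:/backups/laptop/
```

Because stage 1 preserved the NTFS metadata in sidecar files, rsync only has to move ordinary file data, which it does well even to a unix server.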