The Art of “rsync”

As a migratory systems engineer, I have lived, or stayed extensively, in cities all over my country, the United States of America. Because of this, I belong to many mailing lists and technical groups across CONUS (the CONtinental United States). One of the groups I belong to is the DCLUG, or, more formally, the Washington, DC Linux Users Group. A recent exchange on the list covered a very mundane topic: “rsync” and its behavior when making incremental copies. A member of the group, Mr. Michael Henry, replied with a very in-depth answer, and I felt it should be recorded for posterity’s sake; even I, a Unix/Linux user for over 20 years, learned some rsync nuance from his walk-through. You will find the contents of his reply copied here.

Alan (Original Post) wrote:

Assembled Wisdom!

I need rsync for something very primitive: to copy incremental additions and
subtractions from directories on my home hard drive to a thumb drive.
I have just installed rsync and have a couple of tutorials installed. But
they seem to be much more complex than I need. Of course I could
experiment, but I’m lazy and hope that someone can tell me how he/she
does it. For I’m sure that y’all use rsync, right?

TIA for anticipated help/guidance!

Alan

Peter wrote:

I’m not sure if you posed the question well enough, or my
interpretation sucks. Is the goal that you want to mirror
things on your thumb drive? In that case,

rsync -av --delete SRC DEST

is what I use. The man page will explain what the options do.

Alan wrote:

Yes. This seems short and sweet. I want to be sure
that nothing in the ‘DEST’ (the thumb drive) is ever
deleted. Except, of course: suppose I
fix/alter/shorten/improve some script that I have
written, do I want my back-up to contain only the new
version, or should I keep the old version as a
‘historical record’? Of course, this is a personal
decision that I must work out for myself.

Michael Henry wrote:

Peter’s invocation includes the “--delete” flag, which will
delete any files in the destination that aren’t present in the
source. Since you’d like never to delete anything, you wouldn’t
want to use “--delete”.

You might like to use the “--dry-run” flag in addition to the
verbose (“-v”) flag to see what “rsync” intends to do. I
find that seeing a dry run can clarify things that aren’t always
crystal clear in the manual.

Also, “rsync” places heavy significance on directories with
trailing slashes in “SRC”. With a trailing slash, “rsync”
copies only the contents of the directory to the destination;
without the slash, the directory name itself is copied as well,
adding a possibly unwanted extra directory layer in the
destination. Consider some test cases, which can be pasted
directly from this email into a Bash prompt:

Create a playground for testing rsync:

mkdir -p ~/tmp/rsync
cd ~/tmp/rsync

Create some sample source and destination directories:

mkdir -p src/{common,src-only}
mkdir -p dest/{common,dest-only}

Create some sample files:

touch dest/common/{file1.txt,dest-only1.txt}
touch dest/dest-only/unique-to-dest.txt
touch src/common/{file1.txt,src-only1.txt}
touch src/src-only/unique-to-src.txt
touch src/top-level-file.txt

Make sure everything has the same timestamp:

find . -type f -exec touch -r src/top-level-file.txt {} +

Examine a sorted list of the sample files:

find . -type f | sort

The following files are found:

./dest/common/dest-only1.txt
./dest/common/file1.txt
./dest/dest-only/unique-to-dest.txt
./src/common/file1.txt
./src/common/src-only1.txt
./src/src-only/unique-to-src.txt
./src/top-level-file.txt

Now consider copying a source without a trailing slash:

rsync -av src dest/

Notice that “rsync” copies the entire source tree into the
destination, as shown by the verbose output:

sending incremental file list
src/
src/top-level-file.txt
src/common/
src/common/file1.txt
src/common/src-only1.txt
src/src-only/
src/src-only/unique-to-src.txt

sent 417 bytes received 112 bytes 1,058.00 bytes/sec
total size is 0 speedup is 0.00

Now examine the files in the tree:

find . -type f | sort

In the output below, there is a new “src” directory inside
“dest” due to the lack of a trailing slash on “src”. Note
also that the file “./src/top-level-file.txt” was not copied
to the corresponding location “./dest/top-level-file.txt”:

./dest/common/dest-only1.txt
./dest/common/file1.txt
./dest/dest-only/unique-to-dest.txt
./dest/src/common/file1.txt
./dest/src/common/src-only1.txt
./dest/src/src-only/unique-to-src.txt
./dest/src/top-level-file.txt
./src/common/file1.txt
./src/common/src-only1.txt
./src/src-only/unique-to-src.txt
./src/top-level-file.txt

Repeat this, but with a trailing slash on “src/”:

rsync -av src/ dest/

As this output shows, the contents of “src/” that differ
from the files in “dest/” are copied:

sending incremental file list
./
top-level-file.txt
common/src-only1.txt
src-only/
src-only/unique-to-src.txt

sent 364 bytes received 89 bytes 906.00 bytes/sec
total size is 0 speedup is 0.00

“New” files like “top-level-file.txt” are copied, but not
files with the same size and timestamp (such as “file1.txt”).
The invocation with the trailing slash on “src/” means to copy
everything below “src” into “dest”, which is most often what
I’m trying to do.

With no changes to the file trees, a second invocation of
“rsync -a” does nothing:

rsync -av src/ dest/

The output lists no files, so nothing was transferred (the
“total size is 0” line refers to the size of the source files,
which are all still empty):

sending incremental file list

sent 229 bytes received 14 bytes 486.00 bytes/sec
total size is 0 speedup is 0.00

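If we now change the source by writing to a single file and
try again, just that file is transferred. (Because
“src/file1.txt” did not exist yet, the “>>” redirection
creates it as a new top-level file.)

echo 'changes' >> src/file1.txt
rsync -av src/ dest/

Note that “file1.txt” was transferred along with “./”
(because creating the file changed the modification time of
the “src” directory):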

sending incremental file list
./
file1.txt

sent 319 bytes received 40 bytes 718.00 bytes/sec
total size is 8 speedup is 0.02

Finally, add in the “--delete” flag along with “--dry-run”
to see the proposed effects without actually changing anything:

rsync -av --delete --dry-run src/ dest/

Note that “rsync” would delete the extra “src” tree that was
created due to the lack of a trailing slash in the earlier
steps, as well as the “dest-only/” directory and the file
“common/dest-only1.txt”, which don’t exist in “src”:

sending incremental file list
deleting src/src-only/unique-to-src.txt
deleting src/src-only/
deleting src/common/src-only1.txt
deleting src/common/file1.txt
deleting src/common/
deleting src/top-level-file.txt
deleting src/
deleting dest-only/unique-to-dest.txt
deleting dest-only/
deleting common/dest-only1.txt

sent 261 bytes received 237 bytes 996.00 bytes/sec
total size is 8 speedup is 0.02 (DRY RUN)

I find that “rsync” behaves most intuitively for me when I use
the trailing slash on source directories. To simplify things, I
generally use a slash on both sources and destinations, which
makes the rule easier to remember and apply in practice. This
works because the presence or absence of a trailing slash on a
destination directory doesn’t matter to “rsync”.

Also, if you decide you do want to keep historical records of
old versions of your files, I highly recommend “rsnapshot”:
http://rsnapshot.org/

The “rsnapshot” script provides a way of taking
space-efficient “snapshots” of your file tree. Files that
haven’t changed since the previous snapshot do not take up much
extra space because of the way “rsnapshot” uses hard links.
The original article by Mike Rubel linked from the above website
was the inspiration for “rsnapshot”, and as I recall, it was an
enlightening look at the underlying mechanism. The site appears
to be down now, but the Wayback Machine has it:
https://web-beta.archive.org/web/20170104080412/http://www.mikerubel.org/computers/rsync_snapshots/

If you want to hand-roll something custom, you may find Mike’s
article a useful starting point.

Another approach is to use a version control system like Git,
Mercurial, Subversion, etc., to track your changes. This works
well, especially for text files. You can track historical
changes to files in a way that is easy to examine later. Using
Git as an example, you could create a Git repository on your
main machine, then “rsync” the entire directory to the thumb
drive as a backup. Things that are deleted will still be in the
Git history, so it’s safe to use “rsync --delete” to back up
the tree. Depending on your needs, this might be another
approach to pursue, though there is a larger learning curve than
using “rsnapshot” if you haven’t used source control in the
past.

Michael Henry
