http://www.ict.griffith.edu.au/anthony/info/apps/rsync.hints
--------------------------------------------------------------------------------
rsync server usage...

Get a listing from a rsync server
    rsync -avz rsync://samba.anu.edu.au/rsyncftp

Rsync only updates from a source directory to a destination, either of
which can be local or remote.  That is, one direction only, though
timestamps can prevent newer files from being overwritten.

A newer application, "unison", is a bi-directional synchronization tool,
but it relies on a cache of the directory information.  The cache can take
a bit of time to set up, but once done unison works very well.  Because of
the cache it understands file deletions better than rsync, and knows when
the same file was updated in both areas, creating a conflict.

--------------------------------------------------------------------------------
Remote to Local transfers

You can specify multiple remote to local file/directory transfers more
easily with...

    rsync [options] remote:'file1 file2 dir1 dir2' .

--------------------------------------------------------------------------------
Remote Commands in Rsync ;-)

Some people may not be aware that you can do some tricky things with
remote shell expansion in rsync.  The bit after the colon is passed to the
remote shell for expansion, which allows you to not only use wildcards,
but also to use remote programs to specify which files to transfer (this
doesn't work with a rsync daemon as there is no remote shell).

For example:

    rsync -avze ssh rana:'`find transfer/ -name \*.gz`' .

would transfer all *.gz files from the transfer/ directory on the remote
system to the current directory.  (The backslash stops the remote shell
from expanding *.gz before find sees it.)

That's a fairly pointless example, but remember that you can use _any_
options that find accepts, or even pass the result through a pipe like
this:

    rsync -avze ssh \
        rana:'`find transfer/ -user tridge -name \*.gz | grep smb`' \
        /tmp/

Many of the fancier things people have been asking me to put in rsync
lately can be done like this.
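The same command-expansion idea can be tried locally, letting the local
shell hand rsync a file list produced by find.  A minimal sketch with
throw-away temporary directories (all paths are made up for the
demonstration, and rsync is assumed to be installed):

```shell
# Build a scratch source tree with a mix of files.
src=$(mktemp -d); dst=$(mktemp -d)
touch "$src/a.gz" "$src/b.gz" "$src/notes.txt"

# Let find produce the file list; quote the pattern so find, not the
# shell, expands '*.gz'.  The $(...) is deliberately unquoted so the
# result word-splits into separate arguments (paths must not contain
# whitespace for this to be safe).
rsync -av $(find "$src" -name '*.gz') "$dst"/
```

Afterwards "$dst" contains only the two .gz files; notes.txt was never
in the file list rsync was given.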
For more complex cases you can write a shell script at the other end that
outputs a file list, then just call that shell script as part of the
rsync process.  For example:

    rsync -avze ssh rana:'`bin/mylist`' /dest/

would run the command "bin/mylist" on the other end and use the resulting
file list as the list of files to transfer.

--------------------------------------------------------------------------------
Specifying the location of the remote "rsync" command.

Depending on the remote account's shell, and the method used to contact
the remote account, the path to the rsync command (or to other things
like xterm) may not be defined.  The solutions are to...

Define the location of the remote rsync in the PATH environment variable
(preferred)...

  Login Shell...
    * Set the default path in ".cshrc" or other shell `dot' file that
      will be automatically read before executing a remote command.
      WARNING: not all shells read a `dot' file for remote commands.

  Ssh Communications...
    * Set it in the ".ssh/environment" file on the remote machine
      (read by sshd, if the server configuration allows it).
    * Add shell commands to ".ssh/rc" on the remote machine.

  System Wide...
    * Set it in the remote machine's global /etc files
      EG: "/etc/environment" or "/etc/default/login" or equivalent.
    * Set it in the ssh server files "/etc/sshrc" on the remote machine
      ("/etc/ssh_config" is client-side configuration on the local
      machine and cannot set the remote PATH).

As part of the rsync command...
    * Use the --rsync-path option, giving the location rsync may be
      found in on the remote machine.
    * Compile a default --rsync-path into the local rsync.

The best idea is to use the defaults on the remote system, and update
those defaults if possible.  When that fails, fall back to using a
--rsync-path command line option.

--------------------------------------------------------------------------------
Making incremental backups with rsync...
Moved to "rsync_backup.hints"

--------------------------------------------------------------------------------
Includes and Excludes

A good example.  The new rsync command (now uses includes and excludes,
and "/" as the source):

    $RSYNC \
        -va --delete --delete-excluded \
        --exclude-from="$EXCLUDES" \
        --include-from="$INCLUDES" \
        / $SNAPSHOT_RW/home/daily.0 ;

Exclude file (the stuff we don't want):

    # rsync script exclude file
    **/.pan/messages/cache/
    **/.phoenix/default/*/Cache/
    **/.thumbnails/
    **/Desktop/Trash/

Include file (what dirs are to be included):

    # rsync script include file
    /home/
    /home/**
    /var/
    /var/www/
    /var/www/**
    /etc/
    /etc/**
    - *

Note the "- *" for excluding everything except the dirs mentioned in the
include file.  Also note the "/var/" entry.  To backup /var/www/*, you
need to include the parent directory /var/.  The last line, "- *", takes
care of excluding the other sub-directories of /var/.

----
If you're going to have a lot of them, you'll probably be better off
using an --include-from file.  If you end with an --exclude "*", be sure
to include every parent directory of files that you want to include, or
the files below those directories will be ignored.

It's also safest to start all the include patterns with "/" to make sure
they match the beginning of paths; otherwise the patterns may match the
end of some other pathnames.  You also need to include "./" because
apparently that is the default top-level directory name.

For example:

    rsync -r --exclude-from exclude_file remote_host:sub1 local_dir

with an exclude_file that contains

    + ./
    + /sub1/
    + /sub1/sub2/
    + /sub1/sub2/file1
    + /sub1/sub2/file2
    - *

will retrieve only the files sub1/sub2/file1 and sub1/sub2/file2.

I didn't precede the "./" with a "/" because in the above example it is
actually "/sub1/./" that would be needed if a complete path were given.
If you drop the "sub1" from the command line above, this exclude_file
still works.  The patterns are processed in order, and as soon as a path
hits the "- *" exclude, rsync stops looking.
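That behaviour is easy to verify locally.  A sketch with throw-away
temporary directories standing in for the remote host (rsync is assumed
to be installed; the "+ ./" line is omitted here as current rsync
versions do not need it):

```shell
# Recreate the sub1/sub2 example with local scratch directories.
top=$(mktemp -d); dst=$(mktemp -d); ex=$(mktemp)
mkdir -p "$top/sub1/sub2"
touch "$top/sub1/extra" "$top/sub1/sub2/file1" \
      "$top/sub1/sub2/file2" "$top/sub1/sub2/file3"

# The exclude_file from the text above.
cat > "$ex" <<'EOF'
+ /sub1/
+ /sub1/sub2/
+ /sub1/sub2/file1
+ /sub1/sub2/file2
- *
EOF

# Source has no trailing slash, so the transfer is rooted at the
# parent of sub1 and the anchored "/sub1/..." patterns line up.
rsync -r --exclude-from="$ex" "$top/sub1" "$dst"
```

Only sub1/sub2/file1 and sub1/sub2/file2 arrive in "$dst"; file3 and
sub1/extra fall through to the final "- *" rule.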
Also, you do have to include all the parent directories in the include
file too.

Alternative...  do a --include $FILES_TO_TRANSFER then add a
--exclude '*'.

It turns out that if you have no wildcards in your includes and an
exclude '*' at the end, you will trigger an optimization in which the
files are directly opened and most of the include/exclude processing is
skipped.  A side effect of this is that you don't actually need to
include the parent directories, although it might not be a good idea to
depend on that feature.

---
This fails to distribute the userf/fsubdir directory

    + /usera/
    + /userf/fsubdir/
    - /userf/*
    - /*

It seems that rsync does not compare every file in the source to its
inc/exc list.  That is, once it finds a directory excluded, it doesn't
then test the sub-directories.  So in this case, it checked
/users/userf, which didn't match "+ /userf/fsubdir/" but did match
"- /*".  Thus it never tried /users/userf/fsubdir.

To fix this, I did:

    + /usera/
    + /userf/
    + /userf/fsubdir/
    - /userf/*
    - /*

rsync tests /users/userf and finds the "+ /userf/", so it creates that
directory at the receiving end.  It then tests /users/userf/fsubdir,
finds "+ /userf/fsubdir/", and creates that and copies the included
files.  Anything else in /users/userf is excluded by the "- /userf/*".

Essentially, you need to include the sub-directories and files, then
exclude the parts you don't want.

---
You need to know that rsync applies each pattern both to the individual
name components as it visits them, and to the entire path at that point.
So an exclude of ".*" will exclude all dot files in each directory, but
it will not exclude a subdirectory under them which doesn't begin with a
dot.
So if you say

    + .netscape/
    - .*

you will exclude things like

    .file
    sub/.file2
    sub1/sub2/.file3
    .netscape/.file4

but you will not exclude

    .netscape/file5
    .netscape/dir1

because of your "+ .netscape/".

So for a top level file always start the pattern with "/", but for a
file of that name in any sub-directory, don't use the leading "/".

--------------------------------------------------------------------------------
Delete before or after?

The --delete option recovers disk space before starting the transfers,
which can be important when space is tight.  However rsync then has to
scan the file system once before it even starts the transfers; that is,
two file system scans are needed.

The --delete-after option does the deletes after the transfers have
finished, by which time rsync has already completed the main file system
scan.  As such it has better performance.

--------------------------------------------------------------------------------
Can you delete source files after they are transferred?

Not automatically.  But I've done things like:

    h=mailserver
    d=Maildir
    ( rsync -vaze ssh $h:$d/. $d/ 2>&1 | \
        perl -lne 'print "'$d'/$_" if /^\S+[^\s\/]$/' ) | \
      ssh $h perl -lne unlink

I.e. do an rsync for the pull, tear the names out of the "rsync -v"
output, and pipe 'em back over another ssh to delete the files.
                                        Bennett Todd

WARNING: "rsync -v" lists files that ARE being updated, not files which
HAVE been updated!  It also lists directories before files (for directory
creation).  As such, if rsync aborts half way through a file, you could
delete a file which was NOT FULLY TRANSFERRED.  Also, the current file
being transferred may not be the last one listed, as directories and
file permissions may be updated in parallel.

NOTE: the --delete-after option just delays the deletion of files on the
*destination* side (of files not found on the source side) until after
all the files have been transferred.  It does NOT delete the files from
the *source* side.  You could patch the source with a "--move-files"
option to actually move files between machines.
                                        Wayne Davison

--------------------------------------------------------------------------------
Setting up a secure ssh-rsync daemon on a remote machine.

From rsync@samba.anu.edu.au Fri Dec 19 05:11:04 1997
From: reynhout@quesera.com
Subject: Re: Using rsync with sshd "command="?
Date: Fri, 19 Dec 1997 06:10:41 +1100

> I'd like to use rsync with the sshd facility of restricting a given
> public key to a specific command with the "command=" facility of
> authorized_keys.

This is what I do to make this work:

thalia is my desktop, with a big disk array.  talulah is a remote box
that I need to keep synchronized in case of disaster.

From thalia, I run rsync with the shell set to ssh_wrapper, which
decides how to reach talulah, and runs ssh with the appropriate args
(some of the remote boxes I use this for are behind an SSL proxy).

On talulah, sshd is hardwired to run rsync_wrapper, which examines the
SSH_ORIGINAL_COMMAND environment variable, and verifies that it begins
with the full path of the rsync executable and contains only normal,
non-meta characters.  In quick-hack perl:

    $line = $ENV{SSH_ORIGINAL_COMMAND};
    if ( $line =~ /^\/opt\/bin\/rsync --server --sender / ) {
        # this regexp will need tweaking to handle unusual
        # (but legal) characters in paths.  eg: [_\.]
        ( $safeline = $line ) =~ s|[^\w\s\d\-\/]||g;
        if ( $line ne $safeline ) { exit 1; }
        system("$line");
    }
    else { exit 1; }

If all these tests are passed, I just run SSH_ORIGINAL_COMMAND.  If any
are not passed, I exit.  I used to print a diagnostic, but rsync on
thalia just realized that it wasn't getting what it wanted and said
"Incorrect version information.  Is your shell clean?" or something
like that.

It's important that these wrapper scripts DO NOT OUTPUT ANYTHING, even
debugging info, status info, etc.  Otherwise rsync will be interfered
with.

** It would be great if rsync would instead say:
**
**     Bad response from remote rsync instance (is your shell clean?):
**
** or similar.  But for now I just trap the error code.
In this manner, I can change the arguments passed to rsync, the
directories, etc., without having to worry about how it appears when it
hits talulah.

Fwiw, the command-line I end up running (right now) is:

    /opt/bin/rsync --server --sender -vnlHogDtprI --delete /

From a security perspective, I'm relying on my wrapper script (on
talulah) catching any clever attempts.  I'm also allowing incoming ssh
ONLY from thalia (for this user) and placing limitations on who has
login (and physical) access to thalia.

Subversion would require knocking thalia off the net, and getting
thalia's private key.  Even that should only allow you to get a copy of
arbitrary files on talulah.  It shouldn't allow any sort of shell
access or execution of arbitrary commands.

Anyone have any comments?

    D Andrew Reynhout                        reynhout@quesera.com
                                             reynhout@milkcrate.com
    "You've got your whole life to do something,
     and that's not very long..."            -ani difranco

--------------------------------------------------------------------------------
Network Connection Command

You can pass arguments to the network connection command.  For
example...

    rsync {options} --rsh 'ssh -x' {files}

will run ssh WITHOUT it setting up the unneeded X windows
communications.

However rsync's parsing of the --rsh option is a highly simplistic
"space" parsing.  As such...

    rsync {options} --rsh 'ssh -x -o "BatchMode yes"' {files}

will NOT work as expected.  Ssh will receive the options '"BatchMode'
and 'yes"', neither of which are valid options.

As a work-around in this case, most ssh programs will accept a "="
instead of a space between an option and its argument.  (Only
openssh-2.1 does not seem to allow this.)  As such, for ssh you can
avoid the problem with...

    rsync {options} --rsh 'ssh -x -o BatchMode=yes' {files}

--------------------------------------------------------------------------------
RSync Limitations...
Since rsync builds in memory and transfers the whole file list before it
starts moving any files, a big tree causes (a) a large memory
requirement and (b) a long delay before any files start moving.

The amount of data is basically irrelevant for memory usage.  The thing
that matters is the total number of files.  rsync will use about 80
bytes per file at each end (this is very rough, take it as a rule of
thumb only).  This is one of the bad consequences of developing rsync on
a machine with 256MB of ram :)
                                        -- Tridge

I've thought about this as well, and even toyed with writing a simple
script that would call rsync for each top level directory in a similarly
large tree -- but if a tree is relatively small this might actually make
things slower.
                                        -- rsync user

Alternative...

"Unison" caches the directory information beforehand, so it knows about
file deletions and even directory moves.  It also does not have to scan
directories so heavily, so it tends to be a lot faster than rsync on
large directory synchronizations.

However, caching is only useful for regular file copying, and not so
useful for one-off copying, without some final cleanup of the cached
data.  Its purpose, remember, is bi-directional synchronization, NOT
uni-directional copying, though it does have flags to make it work in
one direction only.

--------------------------------------------------------------------------------