Friday, March 13, 2009

Cygwin Rsync over SSH Hang - Solution

It is well known that running rsync over SSH to pull data from a Windows Cygwin host hangs, and that there is no good workaround other than not using SSH pipes. The "not USE_PIPES" workaround is not great either: it sometimes fails, and its major drawback is transfer speed, which is about 30-40% lower than with a normal SSH pipe.

My solution addresses both security and transfer speed and there is no need to compile or modify any software whatsoever.

The idea is to use an SSH tunnel with port forwarding.
Let's say we have the following hosts:

Host_A: Windows Cygwin host, running an SSH server and holding the data to be backed up
Host_B: Linux host, the backup server where the backup scripts run

We want to initiate a command on Host_B that will pull data from Host_A. Normally, you would do this by running:

Host_B# rsync --verbose --stats --progress --rsh=ssh Host_A: /backups/

That command will hang after a while. Here is how to do it with my workaround:

1) Install Rsync Daemon (as a Windows service) on Host_A:
Host_A# cygrunsrv.exe -I "Rsync" -p /cygdrive/c/cygwin/bin/rsync.exe -a "--config=/cygdrive/c/cygwin/etc/rsyncd.conf --daemon --no-detach" -f "Rsync daemon service"

2) Create /etc/rsyncd.conf on Host_A
----------
use chroot = false
strict modes = false
address = 127.0.0.1

[data]
path = /cygdrive/c/
comment = data
-----------
Note: I bind the daemon to 127.0.0.1 to make it more secure. No host other than Host_A itself can connect to the rsync server.

3) Verify your rsync server
Host_A# rsync rsync://localhost/data/

4) Let's create the SSH tunnel
Host_B# ssh -L 1234:127.0.0.1:873 root@Host_A
It prompts for the Host_A password.

Check that port 1234 is bound on Host_B. If it is not, something went wrong with the previous command.

Host_B# netstat -lnp | grep 1234

5) Check if you can see the remote rsync server on local Host_B port 1234
Host_B# rsync rsync://localhost:1234/data/

6) Proceed with the backup command
Host_B# rsync --verbose --stats --progress --recursive rsync://localhost:1234/data/ /backups/

It will not hang and you will get all the data.
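
For unattended runs, steps 4 to 6 can be wrapped in a small script. The following is only a sketch under a few assumptions that are not part of the steps above: key-based login for root@Host_A is already set up, the control-socket path /tmp/rsync-tunnel.sock is free to use, and the DRY_RUN switch and function names are made up for illustration. It uses the OpenSSH options -f and -N to put the tunnel in the background, and a control socket (-M, -S) so the tunnel can be closed again with -O exit.

```shell
#!/bin/sh
# Sketch: open the tunnel, pull the data, close the tunnel.
# Assumed, not from the post: key-based login, the socket path, DRY_RUN.

REMOTE="${REMOTE:-root@Host_A}"
LOCAL_PORT="${LOCAL_PORT:-1234}"
SOCKET="${SOCKET:-/tmp/rsync-tunnel.sock}"

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

open_tunnel() {
    # -f: go to the background once authenticated, -N: no remote command,
    # -M/-S: master mode with a control socket so we can close it later,
    # ExitOnForwardFailure: fail if the port forward cannot be set up.
    run ssh -f -N -M -S "$SOCKET" \
        -o ExitOnForwardFailure=yes \
        -L "$LOCAL_PORT:127.0.0.1:873" "$REMOTE"
}

pull_data() {
    run rsync --verbose --stats --progress --recursive \
        "rsync://localhost:$LOCAL_PORT/data/" /backups/
}

close_tunnel() {
    run ssh -S "$SOCKET" -O exit "$REMOTE"
}

# Open the tunnel, run rsync, always close the tunnel, return rsync's status.
backup() {
    open_tunnel || return 1
    pull_data
    status=$?
    close_tunnel
    return "$status"
}
```

Calling backup runs the three steps in order; DRY_RUN=1 backup only prints the commands, which is a cheap way to check them before the first real run.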

More than that, if you run rsync a second time for the same data, it will be much faster than doing it the usual way with the --rsh=ssh option.

Still have problems? No, I don't think you will... but if so, please let me know.

4 comments:

  1. Nice work. An elegant solution to a painful problem. I wonder why this hasn't been discovered before? I wonder why this makes a difference?

  2. This has been discovered before, but nobody wrote a step-by-step guide for this problem. I'm glad I could help.

    Dinu

  3. Dinu,

    You do not need to run rsync as a daemon.
    Download the rsync source from cygwin.
    (using setup.exe)

    ...
    In short:
    Compile rsync from source and use pipes instead of socketpair.

    # To do this:
    cd /usr/src/rsync-3.0.6/
    ./prepare-source
    ./configure

    # Now undefine HAVE_SOCKETPAIR in config.h:
    # search for HAVE_SOCKETPAIR and comment out the line
    #     #define HAVE_SOCKETPAIR 1
    # so that it reads
    #     // #define HAVE_SOCKETPAIR 1
    vi config.h

    make
    make install

    # Instead of overwriting the original executable
    # at /bin/rsync use the compiled executable
    # (/usr/local/bin/rsync.exe) in your backup scripts.
    # This way, even if /bin/rsync gets overwritten
    # by cygwin setup.exe, we are still using the good
    # rsync version compiled from source.
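
The config.h edit can also be done without opening an editor. This is only a sketch: the helper name disable_socketpair is made up, and it assumes the generated config.h contains the line exactly as quoted above. It wraps the define in a C block comment instead of using //.

```shell
# disable_socketpair FILE
# Comment out the HAVE_SOCKETPAIR define in a generated config.h.
# Hypothetical helper; run it after ./configure and before make.
disable_socketpair() {
    cfg="$1"
    tmp="$cfg.tmp"
    # Wrap the matching line in /* ... */; all other lines pass through as-is.
    sed 's|^#define HAVE_SOCKETPAIR .*|/* & */|' "$cfg" > "$tmp" &&
        mv "$tmp" "$cfg"
}

# Usage, from the rsync source directory:
# disable_socketpair config.h
```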

    See
    http://www.mail-archive.com/cygwin@cygwin.com/msg82514.html

    The hang occurs when rsync is attempting to exchange protocol version numbers: it writes its version and then hangs, waiting endlessly for a reply. The ssh process is detached and apparently not directly connected to rsync, as it is an orphan owned by process 1. The ssh process never even gets as far as attempting network access, and rsync never does anything of value.

    Not defining HAVE_SOCKETPAIR in the make configuration is enough to use pipe() vs. socketpair(), which is enough for rsync to run using ssh, just as it does on UNIX. While I've not done extensive testing, I've used it enough to believe it is working as designed/intended.

    The source file that is affected by the above change (primarily) is
    util.c in function: fd_pair():

    #ifdef HAVE_SOCKETPAIR
    ret = socketpair(AF_UNIX, SOCK_STREAM, 0, fd);
    #else
    ret = pipe(fd);
    #endif

    While use of socketpair() may be a better method, use of pipe() does work consistently without hanging.
    ...

  4. For those of us better acquainted with Linux than Cygwin: cygrunsrv.exe doesn't actually start the service. To start it, use "net start rsync".

    Creating the tunnel directly with ssh requires a password, and leaves you logged in to the distant host - okay if you're at the keyboard, but not so good for a script. "screen" can be used to start ssh in a separate process. I also wanted the backed up files to be owned by a normal user. Based on pointers here and at
    http://sourceforge.net/apps/mediawiki/backuppc/index.php?title=Workaround_BackupPC_Windows_2003_Hang,
    I suggest the following:

    First, set up ssh keys so this executes without asking for a password:

    joe@Host_B$ ssh Host_A date

    Then, create the tunnel like this:

    /usr/bin/screen -d -m -S mytunnel$$ /usr/bin/ssh -i ~joe/.ssh/id_rsa -x -L 1234:127.0.0.1:873 joe@Host_A
    /bin/sleep 5

    (the -d -m combination starts ssh in a separate process, the -S option gives it a relatively unique name we can use below, and the -i option supplies ID information that eliminates typing in a password.)
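
A fixed sleep works, but it either waits longer than needed or not long enough on a slow link. One alternative is to poll until the tunnelled rsync daemon answers. The retry helper below is a hypothetical sketch, not something from the setup described here; it only assumes a POSIX shell.

```shell
# retry TRIES DELAY COMMAND...
# Run COMMAND until it succeeds, at most TRIES times, sleeping DELAY
# seconds between attempts. Returns 0 on success, 1 if every try failed.
retry() {
    tries="$1"
    delay="$2"
    shift 2
    n=1
    while [ "$n" -le "$tries" ]; do
        if "$@"; then
            return 0
        fi
        # No need to sleep after the final failed attempt.
        if [ "$n" -lt "$tries" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}

# Usage: wait up to about 10 seconds for the tunnel, instead of "sleep 5".
# retry 10 1 rsync rsync://localhost:1234/data/ >/dev/null
```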

    Then the rsync command looks something like this:

    (su joe -c "nice rsync --verbose --archive --delete --delete-excluded --sparse --link-dest=../snapshot.1 \"rsync://localhost:1234/data/Documents and Settings/joe\" snapshot.0" ) || logger -s -t Host_B "rsync returns $?"

    (rsync runs as joe, so joe must have write permission for backup-dir, and joe will own all created files. This creates linked backups: snapshot.1 is the previous backup. For each unchanged file, snapshot.0 will just get a hard link to that file under snapshot.1. The source parameter for rsync has spaces, so it's in quotes. Those quotes are escaped because the whole rsync command is quoted.)
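
For --link-dest=../snapshot.1 to point at the previous backup, the old snapshot.0 has to be moved out of the way before each run. The rotation is not shown above, so the following is a guess at one way to do it: the function name, the KEEP argument, and the snapshot.N naming beyond .0 and .1 are assumptions.

```shell
# rotate_snapshots DIR KEEP
# Shift DIR/snapshot.N to DIR/snapshot.N+1 and drop the oldest one, so at
# most KEEP snapshots exist once the next rsync has written snapshot.0.
rotate_snapshots() {
    dir="$1"
    keep="$2"
    # Remove the oldest snapshot to make room.
    rm -rf "$dir/snapshot.$((keep - 1))"
    # Walk from the oldest remaining snapshot down to snapshot.0.
    i=$((keep - 2))
    while [ "$i" -ge 0 ]; do
        if [ -d "$dir/snapshot.$i" ]; then
            mv "$dir/snapshot.$i" "$dir/snapshot.$((i + 1))"
        fi
        i=$((i - 1))
    done
}

# Usage, run before the rsync command (directory name is an example):
# rotate_snapshots /backups/joe 7
```

Because unchanged files are hard links, deleting the oldest snapshot only frees the space of files that no newer snapshot still references.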

    Tear down the tunnel like this:

    /usr/bin/pkill -f mytunnel$$
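
If the script aborts between creating the tunnel and this line, the tunnel stays up. A shell EXIT trap can make the teardown unconditional. This is a sketch with a made-up helper name, run_then_cleanup; the pkill command in the usage line is the one from above.

```shell
# run_then_cleanup CLEANUP COMMAND...
# Run COMMAND in a subshell whose EXIT trap executes the CLEANUP string,
# so cleanup happens whether COMMAND succeeds, fails, or is interrupted.
# The return status is that of COMMAND.
run_then_cleanup() {
    cleanup="$1"
    shift
    (
        trap "$cleanup" EXIT
        # Turn INT/TERM into a normal exit so the EXIT trap still runs.
        trap 'exit 1' INT TERM
        "$@"
        exit $?
    )
}

# Usage: tear down the screen session even if rsync fails.
# run_then_cleanup "/usr/bin/pkill -f mytunnel$$" \
#     rsync --archive rsync://localhost:1234/data/ snapshot.0
```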
