rsync, samba, UTF8, international characters, oh my!


balthisar
03-01-2009, 03:48 PM
I've wasted most of my day trying to get this to friggin' work. Turns out I lost half of my wife's iTunes songs because of this, and now I want to get it right. Sorry for the following wall of text.

Quick synopsis: I rsync a directory containing subdirectories whose names include accented characters from my Mac to a Debian Linux box. The complete directory structure arrives there okay, verified by ssh'ing into the box and looking. I can also successfully do a restore via rsync.

The Problem: If I stop there, then I don't really have any problems. However, when I mount the share from the Linux box, Finder shows all of the backed-up directories that have accented characters as being empty, and doing a Finder copy back to the Mac only copies the empty folders. The files really do exist; Finder just doesn't show them or even know about them. I successfully destroyed half of my wife's iTunes collection when I Finder-dragged my backup from the Linux box onto a new machine, and then rsync'd the now-empty directories on my Mac back on top of my backup and onto her iMac!

Research: I'm certain I have a character encoding problem somewhere, but I can't identify where. If I ssh into my Linux box and rename a directory to the same name (accents and all, typed fresh), its contents become available in Finder. Because the mv operation succeeds, I take that as proof that the accented character in the old name really is encoded differently from the one I typed.

Take a single accented letter as an example. I'm fairly certain rsync isn't storing it as a single-byte extended-ASCII character, because with the locale on the Linux box set to C it shows up in multibyte form, as ??. If I rename the directory using the same letter typed with the normal Mac dead keys, the correct character shows up on the command line, but the directory listing still shows ??, so I assume the result isn't single-byte either, just a different multibyte sequence. If I change the locale on the Linux box to en_US.utf8 (the default), the accent appears correctly in the directory listing. Again, though, the mv works, so it's changing the name from one multibyte form to another.
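One possible explanation, offered as an assumption rather than something this thread confirms, is Unicode normalization: the Mac filesystem stores accented letters in decomposed form (base letter plus a combining accent), while a name typed on Linux is usually precomposed. Both are valid UTF-8 and look identical on screen. A minimal Python sketch, using é purely as an illustrative letter:

```python
import unicodedata

# "é" is an illustrative letter only; the thread does not say which one was involved.
precomposed = unicodedata.normalize("NFC", "\u00e9")  # single code point U+00E9
decomposed = unicodedata.normalize("NFD", "\u00e9")   # "e" followed by combining U+0301

nfc_bytes = precomposed.encode("utf-8")
nfd_bytes = decomposed.encode("utf-8")

print(nfc_bytes.hex())             # c3a9
print(nfd_bytes.hex())             # 65cc81
print(precomposed == decomposed)   # False: different code points, same glyph
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Because the two byte sequences differ, renaming from one to the other is a genuine rename on a byte-oriented Linux filesystem, which would be consistent with the mv succeeding.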

Current state:
My Linux locale is set to en_US.utf8.
My Mac locale is set to en_US.UTF-8.
In my Linux smb.conf, I've ensured that:
[global]
unix charset = UTF8
display charset = UTF8
I've upgraded rsync on both the Linux box and the Mac to 3.0.5, which supports the --iconv option, so my rsync command looks something like:
rsync -vazp --chmod=o-x --iconv=. --delete --rsh="ssh -l mythtv" /Users/Shared/Music/Test/ traseron:/mnt/sda1/Music/
I've also tried --iconv=UTF8-MAC,UTF8 and --iconv=- and --iconv=UTF8,UTF8.
Now I'm fairly certain that everything that can possibly be UTF8 is UTF8, but I still have the Finder problem. I'm rsyncing from a UTF8 system to a UTF8 system and sharing the directory as UTF8, yet somewhere the accented characters are being encoded differently, which causes Finder to choke when the share is mounted. I'm really at my wit's end!
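"Everything is UTF8" can be true while names still differ in normalization form, so it may help to check which names on the Linux box are not precomposed. The following is a rough sketch under that assumption; the function names are made up for illustration, and the path in the usage comment is simply the destination from the rsync command above.

```python
import os
import unicodedata


def is_nfc(name):
    """Return True if name is already in precomposed (NFC) form."""
    return unicodedata.normalize("NFC", name) == name


def find_non_nfc(root):
    """Return sorted paths under root whose last component is not NFC."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_nfc(name):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


# Hypothetical usage on the Linux box:
#     for path in find_non_nfc("/mnt/sda1/Music"):
#         print(path)
```

If the directories Finder shows as empty are the same ones this reports, that would point to normalization rather than to the locale or the smb.conf charset settings.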

Everything works perfectly when I rsync to and then mount my wife's Mac via samba.

Anyone, anywhere, have any idea what could be the problem?

Eric
03-31-2009, 05:22 AM
I had the same problem, but I mount the shares with AFP (running netatalk on a RedHat 5.3 server). If I use the --iconv=UTF8-MAC flag, it works correctly for filenames with Swedish characters in them (å, ä, ö); without this flag I get either empty folders or "dancing icons" when I mount the share in Finder.

So your solution might be to run rsync with the --iconv=UTF8-MAC flag and to mount the volumes with AFP instead of SMB.

I tried to mount the same volume with SMB and it looks fine. Make sure Samba is using unix charset = UTF8.

Flags I used for rsync: /usr/local/bin/rsync -Pavb --iconv=UTF8-MAC,UTF8

Good luck!
Eric
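For a backup that was already synced without --iconv, the manual rename workaround from the first post could in principle be scripted on the Linux side. This is only a sketch, assuming the affected names are decomposed Unicode; normalize_tree is a hypothetical helper, and it defaults to a dry run that changes nothing.

```python
import os
import unicodedata


def normalize_tree(root, dry_run=True):
    """Rename entries under root to NFC form; return (old, new) path pairs.

    Walks bottom-up so children are renamed before their parent directory.
    With dry_run=True (the default) nothing is changed on disk.
    """
    planned = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            fixed = unicodedata.normalize("NFC", name)
            if fixed == name:
                continue  # already precomposed
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, fixed)
            if os.path.lexists(new):
                # Both forms exist side by side; leave them for a human to sort out.
                continue
            planned.append((old, new))
            if not dry_run:
                os.rename(old, new)
    return planned
```

Reviewing the returned pairs from a dry run first, and only then calling it with dry_run=False, keeps the rename from becoming another way to lose files.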

balthisar
03-31-2009, 07:02 PM
Thank you! I'll certainly try it! I hadn't gotten any response on this thread until now, so I'm glad you saw it in passing.

chadvonnau
03-18-2013, 09:36 PM
Thank you, balthisar and Eric. I have my files backing up cleanly thanks to you. I used --iconv=UTF8,UTF8-MAC since I am going the other way (backing up from Linux to Mac over NFS). Seems like this should be automatic with Mac rsync, but at least now I know. Thanks again.
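The two orderings used in this thread follow from rsync taking the charsets as --iconv=LOCAL,REMOTE, where "local" is the machine running the rsync command. A small sketch of that rule, with hypothetical helper names and the charset spellings used in the posts above:

```python
def iconv_option(local_is_mac, remote_is_mac):
    """Return an rsync --iconv option in LOCAL,REMOTE order, or None if not needed."""
    if local_is_mac == remote_is_mac:
        return None  # same filename convention on both ends
    local = "UTF8-MAC" if local_is_mac else "UTF8"
    remote = "UTF8-MAC" if remote_is_mac else "UTF8"
    return "--iconv={},{}".format(local, remote)


def rsync_command(src, dest, local_is_mac, remote_is_mac):
    """Build an rsync argument list; -av is just an illustrative baseline."""
    cmd = ["rsync", "-av"]
    option = iconv_option(local_is_mac, remote_is_mac)
    if option:
        cmd.append(option)
    return cmd + [src, dest]


# Pushing from a Mac to the Linux box, as in the first post:
print(rsync_command("/Users/Shared/Music/Test/", "traseron:/mnt/sda1/Music/",
                    local_is_mac=True, remote_is_mac=False))
# ['rsync', '-av', '--iconv=UTF8-MAC,UTF8', '/Users/Shared/Music/Test/', 'traseron:/mnt/sda1/Music/']
```

Under this reading, UTF8,UTF8-MAC is consistent with rsync treating the Linux side as local and the Mac side as remote.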