rsync update - get a different byte count
Hello,
I recently began using rsync to make incremental backups of a 10 GB archive. However, I have observed that after the update is finished, I get a different byte count on the target than on the source. When I just do a brute-force overwrite using the Finder, I always get the same byte count. This is true whether I check using du or the Finder info window. Comparing the archives with diff shows no difference between the two. Does anyone know why I get the different byte counts?
Have you done a diff, md5 or sha1 comparison of the source and destination of the sync yet? My guess is meta data is different or not preserved, but that is a guess.
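For what it's worth, here is a rough sketch of the checksum comparison suggested above. It only looks at the data in regular files, so it says nothing about metadata or resource forks. The function names tree_digest and same_content are made up for this example, and cksum is used only because it is available everywhere; md5 or shasum would work the same way.

```sh
# Print "checksum size path" for every regular file under "$1", sorted by path.
tree_digest() {
  ( cd "$1" && find . -type f -exec cksum {} + | sort -k 3 )
}

# Succeeds only if both trees hold the same files with the same contents.
same_content() {
  [ "$(tree_digest "$1")" = "$(tree_digest "$2")" ]
}

# Usage (hypothetical paths):
#   same_content /Volumes/Source/archive /Volumes/Backup/archive && echo "contents match"
```

If this reports a match while du still disagrees, the difference is in something other than file contents.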
Any multi-linked files in there?
Quote:
I don't really understand multi-linked files well, but I wonder: if I were to set the "preserve hard links" flag, would the count be the same?
Quote:
How would you check the metadata?
There is a binary at /usr/bin/GetFileInfo if you install the developer tools on your Mac. That is probably the best way to compare metadata between files.
Also, rsync should have a switch to preserve the extended attributes of a file. I am not sure offhand what exactly HFS+ does at the inode level; I would have to look it up.
Quote:
Thanks, your input has been very helpful. I found this article that is helpful on the subject in general: http://www.mactech.com/articles/mact...ata/index.html
Quote:
The slight differences are probably inconsequential - at best they are due to block-size differences, at worst to the loss of some metadata that you'll never miss.
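To separate the block-size effect from everything else, it can help to add up the actual bytes in the files rather than the blocks allocated for them, which is what du counts. A minimal sketch, with logical_bytes being a name invented here:

```sh
# Total number of bytes of file data under "$1", ignoring how many
# disk blocks the filesystem allocated to hold them.
logical_bytes() {
  find "$1" -type f -exec cat {} + | wc -c | tr -d ' '
}

# Usage (hypothetical paths):
#   logical_bytes /Volumes/Source/archive
#   logical_bytes /Volumes/Backup/archive
#   du -sk /Volumes/Source/archive /Volumes/Backup/archive   # allocated size in KB, for comparison
```

If the two logical_bytes totals agree while du differs, block allocation is the likely cause. Note that this reads only the data forks, so it will not notice anything lost from resource forks or other metadata.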
Quote:
I am wondering if it is necessary to back up the ._ files? I guess they are resource forks, but I don't know much about what those are yet. I observed that rsync wants to update them every time I do a backup. Are those files really changing all the time? Also, how do you view ._ files? They don't show up using ls -a. I never knew they existed until I started seeing them in the rsync report. What are they for?
*EDIT: It appears that rsync is creating ._ files, not copying them, i.e., >f+++++++
The Macintosh uses all sorts of file attributes that are not supported elsewhere. The resource fork is one, but there's also file type, creator, assorted Finder info, and so forth. All of these (except the resource fork) are stored directly in the catalog on MFS (Macintosh File System, very obsolete), HFS (also obsolete) and HFS+. To support the ability to store Macintosh files on "foreign" filesystems as transparently as possible, the Mac uses a format called "AppleDouble", where one Mac file is stored as two. One of the files keeps the original's name, and holds only the data fork. The other has a name formed by prepending ._ to the original name, and contains all the other metadata.
When a Macintosh OS opens a file on one of these "foreign" filesystems, it automatically looks for and opens the matching ._ file. If it's not found, the data fork is still intact, but the resource fork and other metadata are lost. When such a pair of files is copied to an HFS-flavored filesystem, the two files are normally re-merged into one. When any other OS opens a file, it has no clue that it should look for the ._ file, but it doesn't matter because the ._ file doesn't contain any information the other OS would understand anyway.
Usually, you only see the ._ files on a Mac filesystem when files are copied back using some utility that doesn't understand the convention. It sees them as two unrelated files and copies them that way. (And then, if you copy the files back to the "foreign" filesystem, you get a duplicate file error on the ._ files: the one that's generated as part of AppleDouble, and the one that shouldn't have been there but was and got copied.) To see what to do about that (if you're curious), consult the man page for dot_clean(1).
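As an illustration of the naming convention described above, here is a small sketch that lists ._ files whose companion data file is missing, i.e. sidecars that have nothing left to be merged with. The function name orphan_appledouble is invented for this example, and it only inspects names; it does not try to parse the AppleDouble contents.

```sh
# List every ._name file under "$1" that has no matching "name" next to it.
orphan_appledouble() {
  find "$1" -type f -name '._*' | while IFS= read -r side; do
    dir=$(dirname "$side")
    base=$(basename "$side")
    companion="$dir/${base#._}"   # strip the leading "._" to get the data file's name
    [ -e "$companion" ] || printf '%s\n' "$side"
  done
}

# Usage (hypothetical path):
#   orphan_appledouble /Volumes/Backup/archive
```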
Quote:
Code:
find -x /path/to/folder -type f -links +1
For an example of a folder with multi-linked files, try ~/Library/Application\ Support
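Here is a small wrapper around the same -links +1 test, in case it helps to see it as a function. The name list_multilinked is made up for this sketch. It leaves out -x so that it also runs with a non-BSD find; add it back if you want to stay on one volume.

```sh
# List regular files under "$1" that have more than one hard link.
# Each such file is stored once on the source, but a copy that does not
# preserve hard links writes it once per name, which inflates the
# byte count on the target.
list_multilinked() {
  find "$1" -type f -links +1 | sort
}

# Usage (hypothetical path):
#   list_multilinked /Volumes/Source/archive
```

If this prints anything for the source tree, running rsync with -H (--hard-links) is worth a try before comparing byte counts again.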
Thanks much for your response, ganbustein.
Okay, I've been doing some reading on Mac resource forks. Now let me get this straight:
On MFS, HFS, or HFS+, each file may have a resource fork. But on those systems, the data fork and the resource fork are not in two separate files. I am not clear on whether the resource fork is part of the same file that the data fork is in, or if the resource fork is information that is stored on another part of the filesystem, maybe something like metadata or an inode.
When rsync is transferring the files, it takes the resource fork information and puts it into its own file (._ prepended). Since I am transferring locally to another HFS drive, the resource information is then "recombined" with the data file by the Mac OS(?) either by placing it in some metadata/inode-like portion of the filesystem that associates it with the data file, or combining it with the data file itself - someone please tell me which. This accounts for why I cannot view any ._ files on my filesystem -- because they aren't there, even though rsync reports transferring them.
If it were transferring to another type of filesystem, it would simply write them to the disk as two separate files, one being the ._ file (that aggravates so many MS users), since this filesystem does not have a way of dealing with this extended metadata in the way HFS et al. do. Does this sound correct?
One of the things that has me confused is the terminology, "fork". When I hear the word "fork", I think of a process, not a file, and I am not able to draw an analogy between the two. It sounds to me like the resource fork is just another method of storing information. If my understanding of this is correct, the term "extended metadata" is more descriptive.
I think my question in my last post regarding resource forks was a bit cryptic and ill-informed. I would like to state it more plainly:
Basically, I am wondering what exactly a resource fork is, and how it is stored on a Mac filesystem. I have read much information about how it is used and what it is for, how to access it, edit it, etc., but I can't seem to get a handle on what it is exactly, and how it is stored on the filesystem. NOTE: I made the apparently incorrect analogy in that post about metadata being something like an inode, but I have just read that metadata is part of the data file itself.
Quote:
But no, they're completely unrelated concepts inaptly tied to the same English word. Let's start at the beginning.
A filesystem is a method for keeping track of files. A file is some data, with some accompanying meta-information, the most important of which is the name of the file. A typical filesystem is implemented as a huge database (called a catalog) containing catalog entries, each of which contains all the information about a file, except its data. Instead of putting the data in the file's catalog entry, the catalog contains only a pointer to where the data can be found. (This pointer can be quite complex, especially in the case of a file whose data has been fragmented, but conceptually all you need for an unfragmented file is the starting sector number for its data. The length of the data would be stored in a separate field of the catalog entry.) In addition to the pointer to the data, the length of the data, and the file's name, the catalog entry would typically contain other metadata, such as creation date, last modify date, permissions, and so forth.
The Macintosh filesystems, starting with MFS and continuing thru HFS+, differ from the traditional model in that each catalog entry actually has two pointers-to-data and two length-of-data fields. It's like having two files in one. They're a single file, in that they share the same catalog entry and all the other metadata, but they're two files, in that each has its own data. The terminology we use is that it's a single file with two forks. The metaphor is a kitchen fork, with a single handle (the catalog entry) and multiple (in this case two) tines.
One fork is called the data fork, and contains an unstructured sequence of bytes, as would typically be stored in a file in any other filesystem. The other fork is called the resource fork, and its content is traditionally formatted as a database in a particular format.
The catalog entry also contains some Mac-specific file attributes, such as type/creator, in what's called Finder info. The GetFileInfo application will show you these Mac-specific attributes. A user cannot add extra metadata to the catalog entry, whose storage layout is frozen, but within reason, you can add as many arbitrary "file attributes" of your own choosing as you like as database entries in the resource fork.
The ._ file contains everything about a Mac file except the data fork and the filename. That includes not only the resource fork, but also all the Mac-specific catalog fields.
To complicate matters, some filesystems support more than the two forks. These are called "named forks". Apple has an implementation for a third fork, the "extended attributes" fork. This fork was partially supported in Tiger, and fully supported and extensively used in Leopard. I don't know if these extended attributes make their way into the ._ files, but the structure of AppleDouble is extensible enough to support them.
You can get a list of all the extended attributes of a file using the command:
Code:
xattr -l /path/to/file
In addition to all the attributes stored in the extended attributes fork, the command will also synthesize two artificial attributes: the entire resource fork, if the file has one, will be listed as one (huge) attribute named com.apple.ResourceFork, and the Mac-specific attributes from the catalog entry (type/creator, bundle, bozo, invisible, etc.), if they differ from the default, will be collected into a single com.apple.FinderInfo attribute. Output is in hex, with no pretty printing to speak of.
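If you only want to know how big a file's resource fork is, rather than dump it in hex, one more trick may help. This sketch assumes the special path suffix /..namedfork/rsrc, which Mac OS X provides for reaching a file's resource fork from the shell; on any system or filesystem without that facility the function simply reports 0. The name rsrc_bytes is invented for this example.

```sh
# Print the size in bytes of the resource fork of "$1".
# Prints 0 when the file has no resource fork, or when the filesystem
# has no notion of named forks at all.
rsrc_bytes() {
  fork="$1/..namedfork/rsrc"   # assumption: Mac OS X named-fork path
  if [ -r "$fork" ]; then
    wc -c < "$fork" | tr -d ' '
  else
    echo 0
  fi
}

# Usage (hypothetical path):
#   rsrc_bytes ~/Documents/file_1.html
```

Comparing this number for the same file on the source and the target is a quick way to see whether a resource fork survived the copy.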
Let me throw out the lowbrow explanation. Think of it as yogurt: In the US, certain brands of yogurt come with little side packets of extra goodness - fruit or flavoring or whatnot that you add to your yogurt when you open it. In Greece, that is (or at least was) entirely unheard of. So, an unfortunate Greek who found himself with the wrong kind of American yogurt would have his yogurt plus this little extra packet of completely incomprehensible stuff that he has no idea what to do with.
The data fork - what we conventionally think of as the file - is the yogurt. The resource fork is that little packet of extra Mac goodness. A ._ file is basically Windows and Unix systems holding up that little extra packet like a Greek peasant, saying "WTF?"
This explanation brought to you by the letter ∏ and the number 1011011001.
Quote:
The fork thing makes sense to me now too. I had already started experimenting with xattr, and did this little experiment. Consider this: I have two files, both HTML files.
file_1.html - created in Smultron (text editor)
file_2.html - imported from somewhere.
In the Finder, file_1.html has a Smultron icon, and will open in Smultron when double-clicked. file_2.html has a Firefox icon (the default for all my html files) and will open in Firefox. Okay... so I run xattr -l on each of them:
Code:
Code:
NOW... I go into the Finder and, using the info window, change file_2.html to open in Smultron. So you would think I would now get the same results as for file_1.html? Nada. Again, running:
Code:
Code:
now running: Code:
BTW, what is "bozo"?
Quote:
I don't get the "letter ∏ and the number 1011011001" part though...
Quote:
The ∏ recalls the Greek yoghurt story. I don't think there's anything special about the number 1011011001 - maybe he meant to refer to 11011000001
Quote:
ls -l@ /path/to/whatever
Quote:
The original use of these was:
It still does not ignore icon resources or 'usro' resources. When you choose "open with..." and do not say "Apply to all documents of this type", a 'usro'(0) resource is added to the document, creating a resource fork to hold it if it's not already there. ('usro' = User Open).
Because type/creator are now being overridden, they can't be relied on to determine the correct icon. To preserve the icon, the file was given a custom icon, probably as an 'icns' resource. The 'has a custom icon' Finder flag also had to be turned on. (FinderInfo starts with 4 bytes of type, 4 bytes of creator, and 2 bytes of Finder flags. 0400 is the bit in Finder flags that indicates a custom icon.) 'icns' resources are huge.
If you want to see what's in the resource fork, you can use:
Code:
# Just list the resources and their sizes.
RezDet -list file
# List the resources and their values.
DeRez file
Both commands accept a -useDF parameter to say the file's resources are in its data fork instead of in its resource fork, but still in the format expected in a resource fork. (Such a file would typically have a .rsrc filename extension.) This is to allow resources on filesystems that don't support resource forks, at the expense of having to store data and resources in separate files.
Each resource is uniquely identified by a 4-byte "resource type" and a 16-bit "resource id", conventionally written as 'type'(id). Each has an optional "resource name" and 8 bits worth of "resource attributes", which may be written between the id and the closing paren, as in 'type'(id,"name",attributes).
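To make the FinderInfo layout a bit more concrete, here is a rough sketch that decodes the first ten bytes of a com.apple.FinderInfo hex dump. It assumes the field order of the classic FileInfo structure: 4 bytes of file type, then 4 bytes of creator, then 2 bytes of Finder flags, with 0x0400 as the custom-icon bit. The function names and the sample hex string in the usage comment are made up for illustration; paste in the hex that xattr prints for your own file.

```sh
# Turn a string of hex digit pairs into the characters they encode.
hex_to_ascii() {
  h=$1
  while [ -n "$h" ]; do
    byte=$(printf '%s' "$h" | cut -c1-2)
    printf "\\$(printf '%03o' "0x$byte")"   # emit the byte via an octal escape
    h=$(printf '%s' "$h" | cut -c3-)
  done
}

# Decode type, creator and the custom-icon flag from a FinderInfo hex dump.
describe_finderinfo() {
  hex=$(printf '%s' "$1" | tr -d ' \n')                       # drop spacing
  ftype=$(hex_to_ascii "$(printf '%s' "$hex" | cut -c1-8)")   # bytes 0-3 (assumed: type)
  creator=$(hex_to_ascii "$(printf '%s' "$hex" | cut -c9-16)") # bytes 4-7 (assumed: creator)
  flags=$(printf '%s' "$hex" | cut -c17-20)                   # bytes 8-9: Finder flags
  if [ $(( 0x$flags & 0x0400 )) -ne 0 ]; then icon=yes; else icon=no; fi
  echo "type=$ftype creator=$creator custom_icon=$icon"
}

# Usage (made-up sample value):
#   describe_finderinfo "54455854 74747874 0400"
```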