The macosxhints Forums


Allasso 06-23-2011 07:48 AM

rsync update - get a different byte count
 
Hello,

I recently began using rsync to make incremental backups of a 10 GB archive. However, I have observed that after the update is finished, I get a different byte count on the target than on the source. When I just do a brute-force overwrite using Finder, I always get the same byte count.

This is true whether I check using du or the Finder info window.

Comparing the archives with diff shows no difference between the two.

Anyone know why I get the different reports on byte counts?

tlarkin 06-23-2011 12:02 PM

Have you done a diff, md5 or sha1 comparison of the source and destination of the sync yet? My guess is meta data is different or not preserved, but that is a guess.
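
A scripted version of that comparison might look like the minimal Python sketch below. The function names (sha1_of_file, tree_digests, compare_trees) are made up for illustration. It hashes only each file's ordinary contents, so a clean result here says nothing about metadata, which is consistent with tlarkin's guess.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file's ordinary contents."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so a 10 GB archive does not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its SHA-1 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full):
                digests[os.path.relpath(full, root)] = sha1_of_file(full)
    return digests


def compare_trees(source, target):
    """Return (only_in_source, only_in_target, differing) as sorted lists."""
    src, dst = tree_digests(source), tree_digests(target)
    only_src = sorted(set(src) - set(dst))
    only_dst = sorted(set(dst) - set(src))
    differing = sorted(p for p in set(src) & set(dst) if src[p] != dst[p])
    return only_src, only_dst, differing
```

Calling compare_trees("/path/to/source", "/path/to/target") and getting three empty lists would mean the file contents match, even if du still disagrees.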

Hal Itosis 06-23-2011 01:57 PM

Any multi-linked files in there?

Allasso 06-23-2011 02:35 PM

Quote:

Originally Posted by Hal Itosis (Post 627514)
Any multi-linked files in there?

how would I tell that?

I don't really understand multi-linking files well, but I wonder if I were to flag "preserve hard links" if the count would be the same?

Allasso 06-23-2011 02:37 PM

Quote:

Originally Posted by tlarkin (Post 627487)
Have you done a diff, md5 or sha1 comparison of the source and destination of the sync yet?

the answer is in the op.

how would you check the meta-data?

Allasso 06-23-2011 02:42 PM

Quote:

Originally Posted by Allasso (Post 627518)
I don't really understand multi-linking files well, but I wonder if I were to flag "preserve hard links" if the count would be the same?

Well I just tried -H (preserve hard links) and it doesn't make any difference - still get different counts

tlarkin 06-23-2011 03:28 PM

There is a binary in /usr/bin/GetFileInfo if you install developer tools on your Mac. That is probably the best way to compare meta data from files.

Also, rsync should have a switch to preserve extended attributes of a file. I am not really sure what all HFS+ does at the inode level off the top of my head; I would have to look it up.

Allasso 06-23-2011 05:23 PM

Quote:

Originally Posted by tlarkin (Post 627525)
My guess is meta data is different or not preserved, but that is a guess.

I compared outputs from GetFileInfo and you are correct, different meta data for some of the files (probably the ones updated?), namely, it dropped the type and creator.


Quote:

Originally Posted by tlarkin (Post 627525)
There is a binary in /usr/bin/GetFileInfo if you install developer tools on your Mac...

I found it in /Developer/Tools on my machine.

Thanks, your input has been very helpful.

I found this article that is helpful on the subject in general:

http://www.mactech.com/articles/mact...ata/index.html

Allasso 06-23-2011 06:43 PM

Quote:

Originally Posted by tlarkin (Post 627525)
Also, rsync should have a switch to preserve extended attributes of a file. I am not really sure what all HFS+ does at the inode level off the top of my head; I would have to look it up.

The switch is -E. I ran it, and the byte count is much closer now, the difference being +164 now, compared to -264,103 without -E (the synced archive was smaller, now it is slightly larger). This is for a ~ 10GB archive. I don't understand the slight difference though.

tw 06-23-2011 07:02 PM

Quote:

Originally Posted by Allasso (Post 627539)
The switch is -E. I ran it, and the byte count is much closer now, the difference being +164 now, compared to -264,103 without -E (the synced archive was smaller, now it is slightly larger). This is for a ~ 10GB archive. I don't understand the slight difference though.

If you download and install rsync 3.0.x it supposedly handles resource forks better than the 2.6.7 that ships with OS X. Carefully check over your rsync commands against the new man page, though - there were some significant changes to the options (e.g., they switched metadata preservation from -E to -X).

The slight differences are probably inconsequential - at best they are due to block-size differences, at worst to the loss of some metadata that you'll never miss.
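
To illustrate the block-size point, here is a small Python sketch. The 4096-byte block size is only an assumed default (the real allocation block size depends on the volume), and the function names are hypothetical.

```python
import os


def allocated_size(logical_size, block_size=4096):
    """Round a byte count up to whole allocation blocks (block_size is assumed)."""
    blocks = -(-logical_size // block_size)  # ceiling division
    return blocks * block_size


def logical_and_on_disk(path):
    """Return (logical bytes, bytes actually allocated) for one file."""
    info = os.stat(path)
    # st_blocks counts 512-byte units on macOS and Linux.
    return info.st_size, info.st_blocks * 512
```

Two volumes with different block sizes can therefore report different on-disk totals for identical contents, while the logical sizes stay equal.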

Allasso 06-23-2011 11:41 PM

Quote:

Originally Posted by tw (Post 627542)
If you download and install rsync 3.0.x it supposedly handles resource forks better than the 2.6.7 that ships with OS X. Carefully check over your rsync commands against the new man page, though - there were some significant changes to the options (e.g., they switched metadata preservation from -E to -X).

The slight differences are probably inconsequential - at best they are due to block-size differences, at worst to the loss of some metadata that you'll never miss.

I found out that the big disparity in byte count came from all the ._ files, which weren't being copied without the -E switch; the other metadata was pretty inconsequential to this. Turning on the -E switch copies/updates* all those files.

I am wondering if it is necessary to back up the ._ files? I guess they are resource forks? - but I don't know much about what that is yet. I observed that it wants to update those every time I do a backup. Are those files really changing all the time?

Also, I am wondering how you view ._ files. They don't show up using ls -a. I never knew they existed until I started seeing them in the rsync report. What are they for?

*EDIT: It appears that rsync is creating ._ files, not copying them, i.e., >f+++++++

ganbustein 06-24-2011 12:29 AM

The Macintosh uses all sorts of file attributes that are not supported elsewhere. The resource fork is one, but there's also file type, creator, assorted Finder info, and so forth. All of these (except the resource fork) are stored directly in the catalog on MFS (Macintosh File System, very obsolete), HFS (also obsolete) and HFS+. To support the ability to store Macintosh files on "foreign" filesystems as transparently as possible, the Mac uses a format called "AppleDouble", where one Mac file is stored as two. One of the files keeps the original's name, and holds only the data fork. The other has a name formed by prepending ._ to the original name, and contains all the other metadata.

When a Macintosh OS opens a file on one of these "foreign" filesystems, it automatically looks for and opens the matching ._ file. If it's not found, the data fork is still intact, but the resource fork and other metadata are lost. When such a pair of files is copied to an HFS-flavored filesystem, the two files are normally re-merged into one.

When any other OS opens a file, it has no clue that it should look for the ._ file, but it doesn't matter because the ._ file doesn't contain any information the other OS would understand anyway.

Usually, you only see the ._ files on a Mac filesystem when files are copied back using some utility that doesn't understand the convention. It sees them as two unrelated files and copies them that way. (And then, if you copy the files back to the "foreign" filesystem, you get a duplicate file error on the ._ files; the one that's generated as part of AppleDouble, and the one that shouldn't have been there but was and got copied.)

To see what to do about that (if you're curious), consult the man page for dot_clean(1).
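
For the curious, a ._ file can be picked apart with a few lines of Python. This is a minimal sketch that assumes the published AppleDouble layout (a magic number, a version, 16 filler bytes, an entry count, then 12-byte entry descriptors). The function name is hypothetical, and only the two entry IDs relevant to this thread are named.

```python
import struct

APPLEDOUBLE_MAGIC = 0x00051607  # AppleSingle uses 0x00051600 instead
ENTRY_NAMES = {2: "resource fork", 9: "Finder info"}


def parse_appledouble(blob):
    """Return (version, {entry_id: payload bytes}) for the bytes of a ._ file."""
    if len(blob) < 26:
        raise ValueError("too short to be an AppleDouble file")
    magic, version = struct.unpack(">II", blob[:8])
    if magic != APPLEDOUBLE_MAGIC:
        raise ValueError("not an AppleDouble header file")
    # Bytes 8..23 are filler; the entry count follows at offset 24.
    (count,) = struct.unpack(">H", blob[24:26])
    entries = {}
    for index in range(count):
        start = 26 + index * 12
        entry_id, offset, length = struct.unpack(">III", blob[start:start + 12])
        entries[entry_id] = blob[offset:offset + length]
    return version, entries
```

Reading a ._ file in binary mode and looking up ENTRY_NAMES for each key in the returned dictionary shows which pieces of metadata it carries.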

Hal Itosis 06-24-2011 09:58 AM

Quote:

Originally Posted by Allasso (Post 627518)
how would I tell that?


find -x /path/to/folder -type f -links +1


For an example of a folder with multi-linked files, try ~/Library/Application\ Support
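
The same check can be sketched in Python by looking at each file's link count. The function name is hypothetical, and unlike find -x this version does not stop at filesystem boundaries.

```python
import os
import stat


def multi_linked_files(root):
    """Return regular files under root that have more than one hard link."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            info = os.lstat(full)  # lstat so symlinks are not followed
            if stat.S_ISREG(info.st_mode) and info.st_nlink > 1:
                found.append(full)
    return sorted(found)
```

Every name that points at a multi-linked file is reported, so one file with two links shows up twice, once per path.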

Allasso 06-24-2011 05:22 PM

Thanks much for your response, ganbustein.

Okay, I've been doing some reading on Mac resource forks. Now let me get this straight:

On MFS, HFS, or HFS+, each file may have a resource fork. But on those systems, the data fork and the resource fork are not in two separate files. I am not clear on whether the resource fork is part of the same file that the data fork is in, or if the resource fork is information that is stored on another part of the filesystem, maybe something like metadata or an inode.

When rsync is transferring the files, it takes the resource fork information and puts it into its own file (._ prepended). Since I am transferring locally to another HFS drive, the resource information is then "recombined" with the data file by the Mac OS(?) either by placing it in some metadata/inode-like portion of the filesystem that associates it with the data file, or combining it with the data file itself - someone please tell me which. This accounts for why I cannot view any ._ files on my filesystem -- because they aren't there, even though rsync reports transferring them.

If it were transferring to another type of filesystem, it would simply write them to the disk as two separate files, one being the ._ file (that aggravates so many MS users) since this filesystem does not have a way of dealing with this extended metadata in the way HFS et al. do.

Does this sound correct?

One of the things that has me confused is the terminology, "fork". When I hear the word "fork", I think of a process, not a file, and I am not able to draw an analogy between the two. It sounds to me like the resource fork is just another method of storing information. If my understanding of this is correct, the term "extended metadata" is more descriptive.

Allasso 06-24-2011 05:59 PM

Quote:

Originally Posted by Hal Itosis (Post 627629)

find -x /path/to/folder -type f -links +1


For an example of a folder with multi-linked files, try ~/Library/Application\ Support

Thank you. Incidentally, I ran the command on ~/Library/Application\ Support, and it didn't bring up any, but I read the man page on find, and I get the idea.

Allasso 06-24-2011 06:28 PM

I think my question in my last post regarding resource forks was a bit cryptic, and ill informed. I would like to state it more plainly:

Basically, I am wondering what exactly is a resource fork, and how is it stored on a Mac filesystem. I have read much information about how it is used and what it is for, how to access it, edit it, etc, but I can't seem to get a handle on what it is exactly, and how it is stored on the filesystem.

NOTE: I made the apparently incorrect analogy in that post about metadata being something like an inode, but I have just read that metadata is part of the data file itself.

ganbustein 06-24-2011 07:12 PM

Quote:

Originally Posted by Allasso (Post 627678)
One of the things that has me confused is the terminology, "fork". When I hear the word "fork", I think of a process, not a file, and I am not able to draw an analogy between the two.

Darn, now I'm going to have that mental image every time I hear the word!

But no, they're completely unrelated concepts inaptly tied to the same English word.

Let's start at the beginning. A filesystem is a method for keeping track of files. A file is some data, with some accompanying meta-information, the most important of which is the name of the file.

A typical filesystem is implemented as a huge database (called a catalog) containing catalog entries, each of which contains all the information about a file, except its data. Instead of putting the data in the file's catalog entry, the catalog contains only a pointer to where the data can be found. (This pointer can be quite complex, especially in the case of a file whose data has been fragmented, but conceptually all you need for an unfragmented file is the starting sector number for its data. The length of the data would be stored in a separate field of the catalog entry.)

In addition to the pointer to the data, the length of the data, and the file's name, the catalog entry would typically contain other metadata, such as creation date, last modify date, permissions, and so forth.

The Macintosh filesystems, starting with MFS and continuing thru HFS+, differ from the traditional model in that each catalog entry actually has two pointers-to-data and two length-of-data fields. It's like having two files in one. They're a single file, in that they share the same catalog entry and all the other metadata, but they're two files, in that each has its own data. The terminology we use is that it's a single file with two forks. The metaphor is a kitchen fork, with a single handle (the catalog entry) and multiple (in this case two) tines.

One fork is called the data fork, and contains an unstructured sequence of bytes, as would typically be stored in a file in any other filesystem. The other fork is called the resource fork, and its content is traditionally formatted as a database in a particular format.
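
As a rough mental model only (not the real on-disk HFS+ structures, which track fragmented data with extents), the idea of one catalog entry holding two pointer/length pairs can be sketched in Python. All class and field names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Fork:
    """One 'tine': where the bytes start and how many there are."""
    start_block: int = 0
    length: int = 0  # logical length in bytes


@dataclass
class CatalogEntry:
    """One 'handle': shared name and metadata, plus two forks."""
    name: str
    finder_info: bytes = bytes(32)
    data_fork: Fork = field(default_factory=Fork)
    resource_fork: Fork = field(default_factory=Fork)

    def data_only_bytes(self):
        """What a copy that ignores the resource fork would transfer."""
        return self.data_fork.length

    def total_bytes(self):
        """Both forks together."""
        return self.data_fork.length + self.resource_fork.length
```

In this model, a copy that carries only the data fork moves data_only_bytes() rather than total_bytes(), which is one way a backup can come out smaller than its source.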

The catalog entry also contains some Mac-specific file attributes, such as type/creator, in what's called Finder info. The GetFileInfo application will show you these Mac-specific attributes. A user cannot add extra metadata to the catalog entry, whose storage layout is frozen, but within reason, you can add as many arbitrary "file attributes" of your own choosing as you like as database entries in the resource fork.

The ._ file contains everything about a Mac file except the data fork and the filename. That includes not only the resource fork, but also all the Mac-specific catalog fields.

To complicate matters, some filesystems support more than the two forks. These are called "named forks". Apple has an implementation for a third fork, the "extended attributes" fork. This fork was partially supported in Tiger, and fully supported and extensively used in Leopard. I don't know if these extended attributes make their way into the ._ files, but the structure of AppleDouble is extensible enough to support them.

You can get a list of all the extended attributes of a file using the command:

xattr -l /path/to/file

In addition to all the attributes stored in the extended attributes fork, the command will also synthesize two artificial attributes: the entire resource fork, if the file has one, will be listed as one (huge) attribute named com.apple.ResourceFork, and the Mac-specific attributes from the catalog entry (type/creator, bundle, bozo, invisible, etc.), if they differ from the default, will be collected into a single com.apple.FinderInfo attribute. Output is in hex, with no pretty printing to speak of.

tw 06-24-2011 07:57 PM

Let me throw out the lowbrow explanation. Think of it as yogurt: In the US, certain brands of yogurt come with little side packets of extra goodness - fruit or flavoring or whatnot that you add to your yogurt when you open it. In Greece, that is (or at least was) entirely unheard of. So, an unfortunate Greek who found himself with the wrong kind of American yogurt would have his yogurt plus this little extra packet of completely incomprehensible stuff that he has no idea what to do with.

The data fork - what we conventionally think of as the file - is the yogurt. The resource fork is that little packet of extra Mac goodness. A ._ file is basically Windows and unix systems holding up that little extra packet like a Greek peasant, saying "WTF?"

This explanation brought to you by the letter ∏ and the number 1011011001.

Allasso 06-24-2011 09:25 PM

Quote:

Originally Posted by ganbustein (Post 627686)
Darn, now I'm going to have that mental image every time I hear the word!

But no, they're completely unrelated concepts inaptly tied to the same English word.

Let's start at the beginning...

Wow, thanks. You concisely put together all the vague clues I was getting from a whole lot of googling. It is very clear now.

The fork thing makes sense to me now too.

I had already started experimenting with xattr, and did this little experiment. Consider this:

I have two files, both are html files.

file_1.html - created in Smultron (text editor)
file_2.html - imported from somewhere.

In the Finder, file_1.html has a Smultron icon, and will open in Smultron when double-clicked.

file_2.html has a Firefox icon (the default for all my html files) and will open in Firefox.

okay...

so I run xattr -l on each of them:

Code:


xattr -l file_1.html
file_1.html
        com.apple.FinderInfo    SMLdSMUL    # type and creator

xattr -l file_2.html
file_2.html                  # apparently no attributes

I also ran:

Code:


ls -l file_1.html/..namedfork/rsrc
-rw-r--r--  1 allasso  allasso  0 Jun 24 18:50 file_1.html/..namedfork/rsrc

ls -l file_2.html/..namedfork/rsrc
-rw-r--r--  1 allasso  allasso  0 Jun 24 18:50 file_2.html/..namedfork/rsrc

note the file sizes on each of these - zero

NOW...

I go into finder, and using the info window, change file_2.html to open up in Smultron. So you would think I would now get the same results as file_1.html? Nada.

Again, running:

Code:


xattr -l file_1.html

I instead get:

Code:


file_1.html
        com.apple.FinderInfo  ^D
        com.apple.ResourceFork  <0000, 0000, 0x01, 0000, 0000,...

        ...plus about 1.2 MB more of dump

it is interesting that the type and creator is not even stored in com.apple.FinderInfo now, it seems to have all been put in the resource fork.

now running:

Code:


ls -l file_2.html/..namedfork/rsrc
-rw-r--r--  1 allasso  allasso  191273 Jun 24 18:50 file_2.html/..namedfork/rsrc

200 kB of extra stored data for a 10 kB file, to accomplish what the first file (apparently) did with about 20 bytes? I wonder how many files on my system are like this? Seems like a big waste of disk space.

BTW, what is "bozo"?

Allasso 06-24-2011 09:28 PM

Quote:

Originally Posted by tw (Post 627692)
Let me throw out the lowbrow explanation...

This explanation brought to you by the letter ∏ and the number 1011011001.

Thanks :-)

I don't get the "letter ∏ and the number 1011011001" part though...

hayne 06-24-2011 10:15 PM

Quote:

Originally Posted by Allasso (Post 627702)
I don't get the "letter ∏ and the number 1011011001" part though...

It's a reference to Sesame Street: http://muppet.wikia.com/wiki/Brought..._Sesame_Street!

The ∏ recalls the Greek yoghurt story.
I don't think there's anything special about the number 1011011001 - maybe he meant to refer to 11011000001

tw 06-24-2011 10:39 PM

Quote:

Originally Posted by hayne (Post 627704)
It's a reference to Sesame Street: http://muppet.wikia.com/wiki/Brought..._Sesame_Street!

The ∏ recalls the Greek yoghurt story.
I don't think there's anything special about the number 1011011001 - maybe he meant to refer to 11011000001

I'm always impressed by how much smarter I seem when people make the effort to interpret my totally random cr@p. :rolleyes:

Hal Itosis 06-25-2011 02:15 AM

Quote:

Originally Posted by Allasso (Post 627680)
Incidentally, I ran the command on ~/Library/Application\ Support, and it didn't bring up any, but I read the man page on find, and I get the idea.

Guess you're not using any sync services then (MobileMe or using an iPhone/iPad/iPod).



Quote:

Originally Posted by Allasso (Post 627701)
it is interesting that the type and creator is not even stored in com.apple.FinderInfo now, it seems to have all been put in the resource fork.

Yeah, a Snow Leopard "improvement"... not (IMO). A 'usro' resource gets added.



Quote:

Originally Posted by Allasso (Post 627701)
now running:

Code:


ls -l file_2.html/..namedfork/rsrc
-rw-r--r--  1 allasso  allasso  191273 Jun 24 18:50 file_2.html/..namedfork/rsrc


Note that using the -@ option with ls is a more convenient way of getting the sizes of files plus attributes:

ls -l@ /path/to/whatever



Quote:

Originally Posted by Allasso (Post 627701)
200 kB of extra stored data for a 10 kB file, to accomplish what the first file (apparently) did with about 20 bytes? I wonder how many files on my system are like this? Seems like a big waste of disk space.

I agree, it's more wasteful than the former com.apple.FinderInfo location (only 32 bytes). As far as how many, I think it's just the ones on which the user tweaks Finder's "Open With..." settings.



Quote:

Originally Posted by Allasso (Post 627701)
BTW, what is "bozo"?

bozo bit

ganbustein 06-25-2011 05:34 PM

Quote:

Originally Posted by Allasso (Post 627701)
I have two files, both are html files.

file_1.html - created in Smultron (text editor)
file_2.html - imported from somewhere.

In the Finder, file_1.html has a Smultron icon, and will open in Smultron when double-clicked.

file_2.html has a Firefox icon (the default for all my html files) and will open in Firefox.

I don't recall if you mentioned which version of the OS you're using, but I can tell from this (and the evidence that neither file has a non-empty resource fork) that it's not Snow Leopard. The only difference between them is type/creator, which Snow Leopard completely ignores.

The original use of these was:
  • A "creator" code is a 32-bit value that is intended to uniquely identify an application. The 32 bits are usually meant to be interpreted as 4 printable bytes. To guarantee uniqueness, the application developer would register their code with Apple before shipping the application.
  • A "type" code is a 32-bit value that identifies the format of the file. Type codes are not required to be unique, and in fact if the format is one that multiple applications can understand, they should strive to use the same type code. Like creator codes, a type code is usually thought of as 4 printable bytes.
  • Certain type codes are "well known". For example, the type code for a plain text file is "TEXT", and for an application "APPL". I won't bore you with the full list of well-known type codes.
  • Each application contains a "bundle", in which it identifies its registered creator code, a list of type codes for the formats it knows how to read, and an icon associated with each of those.
  • When Finder wants to show an icon, it looks for the application with the same creator code, and looks in that application's bundle for the icon that matches the type code. If it can't find the application, or if the application doesn't define an icon for that type, then it uses a generic icon for that type.
  • Except that a document can contain optional icon resources, that override the icon from the application.
  • When you double-click a document, the application with the same creator code is the one that opens it. Applications typically put their own creator code in all documents they create (but can put in another application's creator code if they want that application to assume ownership of the file).
  • Except that a document can contain an optional 'usro'(0) resource, whose value is an alias record pointing to an application. That overrides the creator code in determining which application to use.
  • When you drag a document onto an application's icon, the application will accept the drag only if the type is one that the application's bundle says it can read. (There is a wild-card type that can be used to say the application can open any file type.)

OS X has a much more sophisticated way to accomplish all these objectives, and type/creator have only been grudgingly considered. Snow Leopard finally cut the apron strings, and completely ignores type/creator.

It still does not ignore icon resources or 'usro' resources. When you choose "open with..." and do not say "Apply to all documents of this type", a 'usro'(0) resource is added to the document, creating a resource fork to hold it if it's not already there. ('usro'=User Open).
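
The precedence rules in the list above can be summarized in a toy Python sketch. This is purely illustrative: the function names and the dictionary-based lookup tables are hypothetical stand-ins, not how Finder is implemented.

```python
def choose_application(creator, apps_by_creator, usro_app=None):
    """Pick the application that opens a document in the classic scheme."""
    if usro_app is not None:
        # A 'usro' resource overrides the creator code.
        return usro_app
    # Otherwise the application registered for the creator code wins
    # (None if no installed application claims that code).
    return apps_by_creator.get(creator)


def choose_icon(file_type, creator, icons_by_app, custom_icon=None, generic="generic"):
    """Pick the icon shown for a document in the classic scheme."""
    if custom_icon is not None:
        # Icon resources stored in the document override the application's icon.
        return custom_icon
    # Look in the owning application's bundle for an icon matching the type.
    return icons_by_app.get(creator, {}).get(file_type, generic)
```

With apps_by_creator = {b"SMUL": "Smultron"}, a document with creator b"SMUL" opens in Smultron unless a usro_app value is supplied, in which case that application is chosen instead.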

Quote:

Originally Posted by Allasso (Post 627701)
I go into finder, and using the info window, change file_2.html to open up in Smultron. So you would think I would now get the same results as file_1.html? Nada.

...

I instead get:

Code:


file_1.html
        com.apple.FinderInfo  ^D
        com.apple.ResourceFork  <0000, 0000, 0x01, 0000, 0000,...

        ...plus about 1.2 MB more of dump

it is interesting that the type and creator is not even stored in com.apple.FinderInfo now, it seems to have all been put in the resource fork.

There's no "moving" of type/creator to the resource fork. They have a hard-coded position in the catalog entry itself. The file never had a (non-zero) type/creator, and still doesn't need one. The 'usro' resource overrides creator, and any icon resources will override the type/creator combination. An OS X application is expected to be able to figure out the type of a file without needing an explicit type code (by, for example, looking at the filename extension).

Because type/creator are now being overridden, they can't be relied on to determine the correct icon. To preserve the icon, the file was given a custom icon, probably as an 'icns' resource. The 'has a custom icon' Finder flag also had to be turned on. (FinderInfo starts with 4 bytes of type, 4 bytes of creator, and 2 bytes of Finder flags. 0x0400 is the bit in Finder flags that indicates a custom icon.) 'icns' resources are huge.
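
Decoding those first ten bytes of FinderInfo can be sketched in Python as follows. This assumes the type-then-creator order that Allasso's earlier SMLdSMUL output (labelled "type and creator") suggests, and big-endian Finder flags. The function and constant names are hypothetical.

```python
import struct

HAS_CUSTOM_ICON = 0x0400  # Finder flag bit for a custom icon


def parse_finder_info(raw):
    """Decode type, creator and Finder flags from a 32-byte FinderInfo blob."""
    if len(raw) < 10:
        raise ValueError("FinderInfo needs at least 10 bytes")
    file_type, creator, flags = struct.unpack(">4s4sH", raw[:10])
    return {
        "type": file_type.decode("mac_roman"),
        "creator": creator.decode("mac_roman"),
        "flags": flags,
        "has_custom_icon": bool(flags & HAS_CUSTOM_ICON),
    }
```

A blob that is all zeros except for a 0x04 in byte 8 would match the ^D that xattr printed above: no type, no creator, custom-icon flag set.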

If you want to see what's in the resource fork, you can use:

# Just list the resources and their sizes.
RezDet -list file

# List the resources and their values
DeRez file

Both commands accept a -useDF parameter to say the file's resources are in its data fork instead of in its resource fork, but still in the format expected in a resource fork. (Such a file would typically have a .rsrc filename extension.) This is to allow resources on filesystems that don't support resource forks, at the expense of having to store data and resources in separate files.

Each resource is uniquely identified by a 4-byte "resource type" and a 16-bit "resource id", conventionally written as 'type'(id). Each has an optional "resource name" and 8 bits worth of "resource attributes", which may be written between the id and the closing paren, as in 'type'(id,"name",attributes).
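
On a machine without RezDet, a rough equivalent of listing the resources can be sketched in Python. This is not what RezDet does internally; it is a sketch that assumes the classic resource fork layout (a 16-byte header, then a resource map holding a type list and per-type reference lists), skips resource names, and does no bounds checking. The function name is hypothetical.

```python
import struct


def list_resources(fork):
    """Return (type, id, attributes, size) tuples from raw resource fork bytes."""
    data_offset, map_offset, _data_len, _map_len = struct.unpack(">IIII", fork[:16])
    # The map starts with 24 reserved/attribute bytes, then two 16-bit offsets.
    type_list_rel, _name_list_rel = struct.unpack(
        ">HH", fork[map_offset + 24:map_offset + 28])
    type_list = map_offset + type_list_rel
    (types_minus_1,) = struct.unpack(">H", fork[type_list:type_list + 2])
    type_count = (types_minus_1 + 1) & 0xFFFF  # 0xFFFF means "no types"
    resources = []
    for t in range(type_count):
        entry = type_list + 2 + t * 8
        res_type, count_minus_1, ref_rel = struct.unpack(">4sHH", fork[entry:entry + 8])
        for r in range(count_minus_1 + 1):
            ref = type_list + ref_rel + r * 12
            res_id, _name_rel, packed = struct.unpack(">hhI", fork[ref:ref + 8])
            attributes = packed >> 24  # top byte: the 8 attribute bits
            data_at = data_offset + (packed & 0xFFFFFF)
            # Each resource's data is preceded by its 4-byte length.
            (size,) = struct.unpack(">I", fork[data_at:data_at + 4])
            resources.append((res_type.decode("mac_roman"), res_id, attributes, size))
    return resources
```

On a Mac, the bytes could be read from the file/..namedfork/rsrc path used earlier in the thread; a 'usro'(0) entry in the result would correspond to the "Open With" override described above.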


All times are GMT -5.