The macosxhints Forums

The macosxhints Forums (http://hintsforums.macworld.com/index.php)
-   UNIX - General (http://hintsforums.macworld.com/forumdisplay.php?f=16)
-   -   tar: backing up a 240GB takes too long (http://hintsforums.macworld.com/showthread.php?t=81298)

cocotu 11-14-2007 12:13 PM

tar: backing up a 240GB takes too long
 
I'm backing up a 240GB folder to an external HD with 232GB of available space. I'm using this in my crontab:

00 03 * * * /usr/local/bin/tar cvzf /Volumes/backupdisk/projects.bak/projects.tar.gz /Volumes/DataHD/projects 2>&1 | tee /Users/bhadmin/projects.log

I upgraded to the latest tar. It's 12:10 PM now and it's still running. Is there anything I'm doing wrong? Is there a way to make it better? Or is this the way it should be? I saw a backup script here:
http://www.faqs.org/docs/securing/chap29sec306.html

but not sure if it would work for me. thanks..

dzurn 11-14-2007 03:19 PM

Have you done this backup before? USB or Firewire connection? Drive specs?

Is it the tar taking forever, or the copying? Or are you just imagining it "should" be faster?

tlarkin 11-14-2007 03:22 PM

ever think of using rsync?

cocotu 11-14-2007 05:58 PM

can rsync compress things? if yes, what is the option for that? can you illustrate with a sample? thanks...

dzurn, to answer your questions:

1. Yes I have done this backup before.
2. Firewire connection.
3. LaCie external HD (250GB) if you need the model # let me know.
4. The copying is taking a long time. Not sure what you mean by "Is it the tar taking forever".
5. I wrote this post for the obvious reason: to find out if this behavior is normal or not.

thanks.

tlarkin 11-14-2007 06:07 PM

there is an archive option with rsync, the -a option. So you could do something like this

Code:

rsync -r -u -a /path/to/directory /path/to/network/backup
http://www.ss64.com/osx/rsync.html

acme.mail.order 11-14-2007 07:05 PM

How long is it taking? You say it's "12:10PM now and its still running" but you didn't mention the start time. I would expect it to take several hours, but you probably want to know whether it's frozen. Run `ls -l` on the tarfile every few seconds and watch the file size. If it's increasing, keep waiting :)

You've also got the Verbose (-v) flag set, so it's dumping every single filename into the output buffer and your logfile. Remove the flag and let it run - you can list the tarfile after you're done to check it.
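That size check can be scripted too; a minimal sketch (the temp file below stands in for the real tarball, whose path you'd substitute):

```shell
#!/bin/sh
# Sketch: compare the archive's size across two moments; if it grew,
# the backup is still making progress.
filesize() {
  wc -c < "$1" | tr -d ' '   # byte count, portable across OS X / Linux
}

TARBALL=$(mktemp)            # stand-in for the growing tar.gz
printf 'some archive data' > "$TARBALL"
before=$(filesize "$TARBALL")
printf 'more data appended' >> "$TARBALL"
after=$(filesize "$TARBALL")

if [ "$after" -gt "$before" ]; then
  echo "still growing: keep waiting"
else
  echo "size unchanged: the job may be stuck"
fi
```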

cocotu 11-15-2007 12:09 AM

Yes acme, the thing was running. If you look at my post above (crontab), it starts at 3AM. It was 12:10 when I posted, so it had been running for 9 hours. It finally finished by 2PM. I will try rsync to see what happens, and also acme.mail.order's suggestion about removing the -v. Thanks.

hayne 11-15-2007 12:14 AM

You might want to start off by measuring the basic speed of copying bytes over your network (between the source and destination drives). You could do this with my 'testFileCopy' script:
http://hayne.net/MacDev/Perl/testFileCopy

If you have a 100 mbps Ethernet network, my calculations show that it would take more than 6 hours to copy 240 GB.
So your 9 hours doesn't sound too bad.

cocotu 11-15-2007 11:30 AM

I'm NOT doing this over the network. This is done to an external firewire HD. thanks..

cwtnospam 11-15-2007 12:17 PM

It's just a guess, but I think you've got two minor problems that make each other worse than they should be.

First, you're trying to fit 240 GB in a 232 GB space. That in itself shouldn't slow things down, especially after compression, but next, you're running this script just before the nightly maintenance routines begin at 3:15 AM. I think maybe there's enough going on that it's making your backup larger than it should be, and more difficult to process.

How big is the final archived file?

cocotu 11-15-2007 12:38 PM

the final archived file is 180GB. So do you mean that I should pick another time for the cronjob? thanks

cwtnospam 11-15-2007 12:53 PM

It couldn't hurt. I know my system takes about 15 minutes to run them, but I'm using a little less than 100GB of my drive. They're disk intensive, so if they've got to share the read/write heads with other processes, they're bound to slow each other down. How much, I don't know.

cocotu 11-16-2007 12:44 PM

thanks cwtnospam, I set the cronjob to start at 1AM instead of 3AM. It just finished at 12:30PM. Not much of a difference. So, in conclusion, does this mean that tar is supposed to take many hours (approx. 9) to back up a 240GB folder? thanks all...

cwtnospam 11-16-2007 12:56 PM

Quote:

Originally Posted by cocotu (Post 425721)
thanks cwtnospam, I set the cronjob to start at 1AM instead of 3AM. It just finished at 12:30PM. Not too much of a difference. So, in conclusion this means that tar is supposed to take many hours(9 hour approx.) to back up a 240GB folder? thanks all...

Let's assume that it should take 6 hours instead of nearly 12. If you start it at 1:00 instead of 3:00, it's only about a third of the way done when the maintenance utilities start, which means they could potentially be slowing each other down for two-thirds of the cronjob. I don't know if that's the cause of the problem, but if you could start it in the morning around 4:00 a.m., it might finish well before 3:00 p.m. That would give you a better idea. Before picking a start time, you might want to run this in the Terminal (from an administrator account - it's what gets run at 3:15) and time it:

sudo periodic daily weekly monthly

Add the time this takes to 3:15 and start the cronjob after that.

blb 11-16-2007 02:53 PM

The daily should be pretty quick, it's the weekly that'll take some time; weekly will rebuild the locate database, and hence scan over the entire filesystem.

As for why this takes so long: have you watched top while it's running? Maybe the compression is slowing it down more than the I/O. If that isn't it, watch the I/O numbers with
Code:

iostat 2
(to see it refresh every two seconds) and see if one drive is performing slower than expected.

fracai 11-16-2007 03:02 PM

Quote:

Originally Posted by tlarkin (Post 425041)
there is an archive option with rsync, the -a option. So you could do something like this

Code:

rsync -r -u -a /path/to/directory /path/to/network/backup

"-a" has nothing to do with compression. The archive option is a substitute for "-rlptgoD". In other words it preserves links, permissions, time, owner and group IDs, devices, and is recursive. Note that "-r" is therefore redundant. The "-u" simply tells rsync to update files instead of overwriting newer files at the destination.

The only compression present for rsync is during transfer to speed up the transmission over a network.


My thought is that the reason this backup takes so long is simply that it's a lot of data, probably including a large amount of poorly compressible data (audio or video). Though if the resulting file fits in less than half the size, you're getting something decent in compression.

Try compressing and writing to /dev/null. That should test the raw speed of the compression and might indicate whether the issue is in the transfer to the external drive.

time tar czf - /Volumes/DataHD/projects > /dev/null

I also omit the verbose printing, as the extra output can add to the slowdown (do you need every file printed to the log?).

Do you have a reason to expect that compressing 240GB should take less than 12 hours?
A quick test (compressing 300MB of random data) predicts that 240GB could take approximately 11.5 hours. Add in the writing to the log and the overhead of actually writing the data, and I don't really see why 240GB should be any faster. Compressing 300MB of zeros was 5 times faster, so of course the estimate depends on how compressible the data is, but I wouldn't expect anything less than 8 hours even without writing to disk.
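That quick test is easy to reproduce at a smaller scale; a hedged sketch (sample size scaled down so it finishes in seconds, and 245760 is just 240GB expressed in MB):

```shell
#!/bin/sh
# Benchmark sketch: gzip a small sample of incompressible (random) data
# and extrapolate to the full 240GB. The sample is deliberately tiny.
SAMPLE_MB=8
dd if=/dev/urandom of=/tmp/sample.bin bs=1048576 count=$SAMPLE_MB 2>/dev/null

start=$(date +%s)
gzip -c /tmp/sample.bin > /dev/null
end=$(date +%s)
secs=$((end - start))
[ "$secs" -eq 0 ] && secs=1   # round sub-second runs up to 1s

# 240GB = 245760 MB; scale the sample time up to the full data set
est=$((secs * 245760 / SAMPLE_MB))
echo "rough estimate for 240GB: ${est} seconds"
rm -f /tmp/sample.bin
```

The estimate is crude (it ignores disk transfer entirely), but it puts a lower bound on the compression time alone.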

cocotu 11-20-2007 04:47 PM

thanks fracai. there is a lot of audio and video, so that may be another reason it takes so long. is there a way to compress data using rsync, the way tar can compress to tar.gz? thanks

fracai 11-21-2007 08:56 AM

Nope, rsync is for transferring data. The best you could do is compress, then transfer. If there is a lot of audio and video, you could set up a hierarchy that lets you compress the non-media data (text files, etc.) and then copy both the compressed data and the media (audio / video).

On another note, drives are cheap. You can get an external 500GB (pre-assembled) for $150 or even less. You can get them cheaper if you're willing to buy the drive and enclosure separately.

Especially if you use something like Time Machine (or even rsync to do incremental backups), the time will be nowhere near 11 hours once the first full backup is done. I bring up incremental backups because Time Machine and rsync will examine the files at the destination and only transfer changed files.

cocotu 11-21-2007 11:15 AM

a month ago I was trying iBackup, which allowed me to zip all folders inside the directory I'm backing up. It also took a while, but I think that's normal after reading all your posts. iBackup uses cpio. There must be a way of doing the same at the command line. thanks for your help

fracai 11-21-2007 01:02 PM

I assume you mean that iBackup would zip each folder and output it to your destination. This is pretty much what tar with gzip does. cpio is just another archiving / file copy program. I believe it has included support for resource forks for quite a while, which is why iBackup uses it. I don't believe rsync has a method of reading data from standard input, or at least I haven't found one, so any compressed data must be compressed outside of the rsync run.

If your goal is to back up media and compressible files, to a disk that is not large enough to hold them uncompressed, in a reasonable time frame, then you need to either back up to a larger destination or compress only the data that is compressible and hope the result is still small enough to fit.

I'd solve the second option by splitting the media and the compressible data into separate folder structures. You can then compress the compressible data with your backup drive as the destination, and copy the media in a separate step.

I suppose you could also iterate over all the files in the source and selectively compress or plain-copy each one as appropriate (based on file location or extension).
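One way to sketch that iteration (the file names and the extension list here are purely illustrative):

```shell
#!/bin/sh
# Sketch: compress text-like files, plain-copy media, chosen by extension.
SRC=$(mktemp -d)
DST=$(mktemp -d)
echo "report text" > "$SRC/notes.txt"
printf 'fake video bytes' > "$SRC/clip.mov"

for f in "$SRC"/*; do
  case "$f" in
    *.mov|*.avi|*.mp3|*.aif)   # media: gzip gains little, just copy
      cp "$f" "$DST/" ;;
    *)                         # everything else: compress
      gzip -c "$f" > "$DST/$(basename "$f").gz" ;;
  esac
done
ls "$DST"
```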

cocotu 11-21-2007 02:50 PM

let me illustrate better:

We have a big directory (240GB) called proj. Within /proj there are around 20 directories:

/proj
    /dir1
    /dir2
    /dir3
    and so on.....

When I use tar it gives me a single proj.tar.gz(180GB) file at the destination external HD.

When I use iBackup I get:

/dir1.zip
/dir2.zip
/dir3.zip
and so on......

This makes it easier to retrieve the information.
Thanks fracai!!

baf 11-21-2007 03:07 PM

One thing I don't think you have mentioned: is this a dual-core/processor machine?
If so, it could (no guarantee) be faster running it as several jobs. If so, how many cores?

fracai 11-21-2007 03:16 PM

Code:

#!/bin/sh

# Make sure the destination for the zip files exists
mkdir -p backup/proj

for DIR in proj/*/
do
    echo "${DIR}"
    BASE=$(basename "${DIR}")
    zip -r9 backup/proj/"${BASE}".zip "${DIR}"
done

That will compress each directory found in "proj/" and put the compressed files at "backup/proj/"

The way the script is written now you'd have to execute the script from within the directory above "proj". You should probably modify the script to use absolute paths for both the source and destination values.

cocotu 11-26-2007 03:33 PM

I wasn't able to find how many cores. This is the info I have:

Mac OS X Server 10.4.6
2Ghz Power PC XServe G5
2 GB DDR SDRAM
machine model:RackMac3.1
machine name: XserveG5
CPU type: PowerPC G5 (3.0)
Number of CPUs: 1
CPU Speed: 2Ghz
L2 Cache (per CPU): 512KB
Memory: 2GB
Bus Speed: 1GHz
Boot ROM Version: 5.17f2

I have another concern. I know I'm able to view the contents of the tar.gz file, but it takes forever. Is there an application like WinRAR that would let me view the contents of this tar.gz file? I just want to verify the folders inside proj/.
Thanks..

tlarkin 11-26-2007 04:03 PM

Quote:

Originally Posted by fracai (Post 425790)
"-a" has nothing to do with compression. The archive option is a substitute for "-rlptgoD". In other words it preserves links, permissions, time, owner and group IDs, devices, and is recursive. Note that "-r" is therefore redundant. The "-u" simply tells rsync to update files instead of overwriting newer files at the destination.

Hmm, okay thanks for clearing that up. I was under the impression that -E did all that by preserving all the extended attributes... and you are right.

There is also a -z option, which does in fact say it compresses data. I am not sure how it compresses the data, though. You may want to look into it.

baf 11-26-2007 04:08 PM

To find core info, try (in the Terminal):

Code:

system_profiler SPHardwareDataType
And you'll get back something like:

Hardware:

Hardware Overview:

Model Name: MacBook
Model Identifier: MacBook2,1
Processor Name: Intel Core 2 Duo
Processor Speed: 2.16 GHz
Number Of Processors: 1
Total Number Of Cores: 2
L2 Cache (per processor): 4 MB
Memory: 2 GB
Bus Speed: 667 MHz
Boot ROM Version: MB21.00A5.B07
SMC Version: 1.17f0
Serial Number: W87222FBYA8
Sudden Motion Sensor:
State: Enabled


And if it doesn't say anything about cores, then you have a single-core system.

acme.mail.order 11-26-2007 07:44 PM

Quote:

Originally Posted by tlarkin (Post 428763)
There is also an option for -z which does in fact say compress data. I am not sure how it compresses data though.

tar -czf tarball.tgz files is basically the same as tar -cf - files | gzip > tarball.tar.gz
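That equivalence is easy to check on a scratch directory (temp paths here are placeholders):

```shell
#!/bin/sh
# Both forms produce a gzip-compressed tar stream of the same files.
WORK=$(mktemp -d)
echo hello > "$WORK/f"
( cd "$WORK" && tar czf a.tgz f )                  # -z does the gzip
( cd "$WORK" && tar cf - f | gzip > b.tar.gz )     # explicit pipe to gzip
# Both archives list identical contents:
tar tzf "$WORK/a.tgz"
tar tzf "$WORK/b.tar.gz"
```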

Quote:

Originally Posted by cocotu (Post 428755)

I have another concern. I know I'm able to view the contents of the tar.gz file, but this takes forever. Is there an application like Winrar that would enable me to view the contents of this tar.gz file? I just want to verify the folders inside the proj/.
Thanks..

This is the monster 200GB tarfile? tar stores its index throughout the file, so it's going to take a while to scan through that much data, uncompressing internally as it goes. Be happy you're not dumping directly to tape, the original use for tar (Tape ARchiver).
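For what it's worth, the listing needs nothing beyond tar itself; a sketch on a scratch archive (on the real 180GB file this still reads every block, so it stays slow):

```shell
#!/bin/sh
# Sketch: verify just the top-level folders inside a tar.gz
# without extracting anything.
WORK=$(mktemp -d)
mkdir -p "$WORK/proj/dir1" "$WORK/proj/dir2"
echo data1 > "$WORK/proj/dir1/f1"
echo data2 > "$WORK/proj/dir2/f2"
( cd "$WORK" && tar czf proj.tar.gz proj )

# Collapse the full listing down to unique top-level entries
dirs=$(tar tzf "$WORK/proj.tar.gz" | cut -d/ -f1-2 | sort -u)
echo "$dirs"
```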

cocotu 11-28-2007 11:50 AM

I ran the command:
system_profiler SPHardwareDataType
and got:

Hardware Overview:

Machine Name: Xserve G5
Machine Model: RackMac3,1
CPU Type: PowerPC G5 (3.0)
Number Of CPUs: 1
CPU Speed: 2 GHz
L2 Cache (per CPU): 512 KB
Memory: 2 GB
Bus Speed: 1 GHz
Boot ROM Version: 5.1.7f2
Serial Number: QP53901KSLX

So it must be a single-core system. acme.mail.order, I think tlarkin is referring to -z with rsync, NOT tar. thanks for all the help!

fracai 11-28-2007 02:43 PM

Quote:

Originally Posted by tlarkin (Post 428763)
Hmm, okay thanks for clearing that up. I was under the impression that -E did all that by preserving all the extended attributes... and you are right.

There is also an option for -z which does in fact say compress data. I am not sure how it compresses data though. You may want to look into it

Quote:

Originally Posted by man rsync
-z, --compress compress file data during the transfer

Note the phrase "during the transfer": rsync compresses the data while transferring it and uncompresses it at the destination. This is useful if your computer and the destination computer are fast, your data is easily compressible, and the network is slow: you get an overall speed boost because the data is compressed before it crosses the network, a smaller amount is transferred, and it's decompressed on the other side.

rsync will not create a compressed archive for you, however nice that feature would be.

cocotu 11-29-2007 03:34 PM

can we pipe it to tar?

rsync <whateverfile> | tar <whateverfile>

I don't think it would work, because the destination HD is smaller than the source. It would be great if rsync could do such a thing! Can cpio do this? iBackup uses cpio and it zips all the directories, as I mentioned before. thanks!

fracai 11-30-2007 12:40 PM

rsync doesn't pipe its output so this wouldn't work.

cpio, zip, or even tar can pipe their output, but the main benefit of rsync for making backups smaller is its ability to reference a backup that is already in place and only back up changed data (i.e. the backup is incremental).

the bottom line is:
compressing large media files is slow and typically not very beneficial.
the only way to fit a large backup in a small space is compression.
incremental backups are faster than full backups.
large drives are cheap.

The comment about the number of cores is on point: running parallel jobs could be faster, but with the media files you're compressing I'm not sure how much benefit you'd see; you might just be limited by disk speed at that point. You can check your processor load by opening "Activity Monitor" while running your backup script. If you have more than one core or CPU, you'll see more than one graph charting your CPU usage. If your usage is already maxed out, parallel jobs won't provide a benefit. If each chart is around 50%, you could look into running parallel jobs in your script.
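For completeness, here is a sketch of the parallel-jobs idea using plain shell job control (the temp directories are placeholders; on the single-core G5 this likely won't help):

```shell
#!/bin/sh
# Sketch: compress two directories as concurrent background jobs.
SRC=$(mktemp -d)
DST=$(mktemp -d)
mkdir "$SRC/dir1" "$SRC/dir2"
echo one > "$SRC/dir1/a"
echo two > "$SRC/dir2/b"

( cd "$SRC" && tar czf "$DST/dir1.tar.gz" dir1 ) &   # job 1
( cd "$SRC" && tar czf "$DST/dir2.tar.gz" dir2 ) &   # job 2
wait    # block until both background jobs have finished
ls "$DST"
```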



Powered by vBulletin® Version 3.8.7
Copyright ©2000 - 2014, vBulletin Solutions, Inc.
Site design © IDG Consumer & SMB; individuals retain copyright of their postings
but consent to the possible use of their material in other areas of IDG Consumer & SMB.