tar: backing up a 240GB folder takes too long
I'm backing up a 240GB folder to an external HD with 232GB of available space. I'm using this in my crontab:
Code:
00 03 * * * /usr/local/bin/tar cvzf /Volumes/backupdisk/projects.bak/projects.tar.gz /Volumes/DataHD/projects 2>&1 | tee /Users/bhadmin/projects.log

I upgraded to the latest tar. It's 12:10 PM now and it's still running. Is there anything I'm doing wrong? Is there a way to make it better? Or is this the way it should be? I saw a backup script here: http://www.faqs.org/docs/securing/chap29sec306.html but I'm not sure if it would work for me. Thanks. |
Have you done this backup before? USB or Firewire connection? Drive specs?
Is it the tar taking forever, or the copying? Or are you just imagining it "should" be faster? |
Ever think of using rsync?
|
Can rsync compress things? If yes, what is the option for that? Can you illustrate with a sample? Thanks...
dzurn, to answer your questions: 1. Yes, I have done this backup before. 2. FireWire connection. 3. LaCie external HD (250GB); if you need the model # let me know. 4. The copying is taking a long time. Not sure what you mean by "Is it the tar taking forever". 5. I wrote this post for the obvious reason: to find out if this behavior is normal or not. Thanks. |
There is an archive option with rsync, the -a option (which already implies -r). So you could do something like this
Code:
rsync -u -a /path/to/directory /path/to/network/backup |
How long is it taking? You say it's "12:10PM now and its still running" but you didn't mention the start time. I would expect it to take several hours, but you probably want to know whether it's frozen or not. Run the `ls -l` command on the output file every few seconds and watch the file size. If it's increasing, keep waiting :)
You've also got the Verbose (-v) flag set, so it's dumping every single filename into the output buffer and your logfile. Remove the flag and let it run - you can list the tarfile after you're done to check it. |
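A minimal sketch of that suggestion: build the archive without -v, then list the tarball afterwards to verify what went in. The demo directories below are throwaways standing in for the real /Volumes paths from the crontab entry.

```shell
# Demo stand-ins for /Volumes/DataHD and /Volumes/backupdisk
SRC=$(mktemp -d) && DEST=$(mktemp -d)
mkdir -p "$SRC/projects" && echo data > "$SRC/projects/file.txt"

# c = create, z = gzip, f = output file; no -v, so nothing floods the log
tar czf "$DEST/projects.tar.gz" -C "$SRC" projects

# t = list the archive's contents after the fact to check it
tar tzf "$DEST/projects.tar.gz"
```

In real use you would just drop the `v` from the crontab command and run the `tar tzf` listing the next morning.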
Yes acme, the thing was running. If you look at my post above (the crontab), it starts at 3 AM. It was 12:10 PM when I posted, so that means it had been running for 9 hours. It finally finished by 2 PM. Thanks. I will try rsync to see what happens, and also acme.mail.order's suggestion about removing the -v. Thanks.
|
You might want to start off by measuring the basic speed of copying bytes over your network (between the source and destination drives). You could do this with my 'testFileCopy' script:
http://hayne.net/MacDev/Perl/testFileCopy

If you have a 100 Mbps Ethernet network, my calculations show that it would take more than 6 hours to copy 240 GB. So your 9 hours doesn't sound too bad. |
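A quick local alternative to a copy-speed script is timing dd as it pushes a known amount of data to the destination drive. The sketch below writes to a throwaway temp file; in real use you would point the output at the FireWire volume instead (e.g. of=/Volumes/backupdisk/speedtest.tmp) to measure that drive's raw write speed.

```shell
# Time writing 100 MB of zeros; dividing 100 MB by the elapsed time gives
# an upper bound on what tar can achieve when writing to the same place.
OUT=$(mktemp)
time dd if=/dev/zero of="$OUT" bs=1024k count=100
rm -f "$OUT"
```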
I'm NOT doing this over the network. This is done to an external FireWire HD. Thanks.
|
It's just a guess, but I think you've got two minor problems that make each other worse than they should be.
First, you're trying to fit 240 GB in a 232 GB space. That in itself shouldn't slow things down, especially after compression, but next, you're running this script just before the nightly maintenance routines begin at 3:15 AM. I think maybe there's enough going on that it's making your backup larger than it should be, and more difficult to process. How big is the final archived file? |
The final archived file is 180GB. So do you mean that I should pick another time for the cron job? Thanks
|
It couldn't hurt. I know my system takes about 15 minutes to run them, but I'm using a little less than 100GB of my drive. They're disk intensive, so if they've got to share the read/write heads with other processes, they're bound to slow each other down. How much, I don't know.
|
Thanks cwtnospam, I set the cron job to start at 1 AM instead of 3 AM. It just finished at 12:30 PM. Not too much of a difference. So, in conclusion, this means that tar is supposed to take many hours (approx. 9) to back up a 240GB folder? Thanks all...
|
Code:
sudo periodic daily weekly monthly

Add the time this takes to 3:15 and start the cron job after that. |
The daily should be pretty quick, it's the weekly that'll take some time; weekly will rebuild the locate database, and hence scan over the entire filesystem.
As far as why this takes so long, have you watched top while it's running? Maybe the compression is slowing it down more than the IO. If that isn't it, watch the IO numbers: Code:
iostat 2 |
The only compression present in rsync happens during transfer, to speed up transmission over a network.

My thought is that the reason this backup takes so long is simply that it's a lot of data, and probably a large amount of poorly compressible data (audio or video). Though if the resulting file fits in less than half the size, you're getting something decent in compression.

Try compressing and writing to /dev/null. That should test the raw speed of the compression, and might indicate whether the issue is in the transfer to the external drive.
Code:
time tar czf - /Volumes/DataHD/projects > /dev/null

I also omit the verbose printing, as extra printing can add to the slowdown (do you need every file to be printed to the log?).

Do you have a reason to expect that compressing 240GB should take less than 12 hours? A quick test (compressing 300MB of random data) predicts that 240GB could take approximately 11.5 hours. Add in the writing to the log and the overhead of actually writing the data, and I don't really see why 240GB should be any faster. Compressing 300MB of zeros was 5 times faster, so of course the calculation will depend on how compressible the data is, but I wouldn't expect anything less than 8 hours even without writing to disk. |
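The "quick test" described above can be sketched like this: gzip a chunk of random data (a stand-in for incompressible media) and the same amount of zeros, discarding the output, and compare wall-clock times. File names and sizes here are arbitrary.

```shell
# Generate 100 MB of random data (worst case) and 100 MB of zeros (best case)
RAND=$(mktemp) && ZERO=$(mktemp)
dd if=/dev/urandom of="$RAND" bs=1024k count=100 2>/dev/null
dd if=/dev/zero    of="$ZERO" bs=1024k count=100 2>/dev/null

# Time pure compression, no disk writes on the output side
time gzip -c "$RAND" > /dev/null   # incompressible, media-like data
time gzip -c "$ZERO" > /dev/null   # highly compressible data
rm -f "$RAND" "$ZERO"
```

Multiplying the random-data time out to 240GB gives a rough lower bound on how long the real backup's compression step alone should take.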
Thanks fracai. There is a lot of audio and video; that may be another reason for it to take so long. Is there a way to compress data using rsync, the way tar can compress to tar.gz? Thanks
|
Nope, rsync is for transferring data. The best you could do is to compress then transfer. If there is a lot of audio and video you could set a hierarchy that allows you to compress the non-media data (text files, etc) and then copy the compressed data and the media (audio / video).
On another note, drives are cheap. You can get an external 500GB (pre-assembled) for $150 or even less. You can get them cheaper if you're willing to buy the drive and enclosure separately. Especially if you use something like Time Machine (or even rsync to do incremental backups), the time will be nowhere near 11 hours to do a full backup. I bring up incremental backups because Time Machine and rsync will examine the files at the destination and only transfer changed files. |
A month ago I was trying iBackup, which allowed me to zip all folders inside the directory I'm backing up. It also took a while, but I think that's normal after reading all your posts. iBackup uses cpio. There must be a way of doing the same at the command line. Thanks for your help
|
I assume you mean that iBackup would zip the folder and output it to your destination. This is pretty much what tar with gzip does. cpio is just another archiving / file copy program. I believe it has included support for resource forks for quite a while, which is why iBackup uses it. I don't believe rsync has a method of reading data from standard input, or at least I haven't found a way, so any compressed data must be compressed outside of the rsync run.
If your goal is to back up media and compressible files to a disk that is not large enough to hold the uncompressed files, and in a reasonable time frame, you need to either back up to a larger destination or compress only the data that is compressible and hope that it's still small enough to fit. I'd solve the second option by splitting the media and the compressible data into separate folder structures. You can then compress the compressible data with a destination of your backup drive, and copy the media in separate steps. I suppose you could iterate over all the files in the source and selectively compress or just copy as appropriate (based on file location or extension). |
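A hypothetical sketch of that last idea: walk the tree with find, copy already-compressed media as-is, and gzip everything else. The demo directories below stand in for /Volumes/DataHD/projects and the backup volume, and the extension list is illustrative, not exhaustive.

```shell
# Throwaway demo tree; substitute your real source and backup paths
SRC=$(mktemp -d) && DEST=$(mktemp -d)
mkdir -p "$SRC/sub"
echo "notes" > "$SRC/sub/readme.txt"
echo "media" > "$SRC/sub/clip.mov"

find "$SRC" -type f | while read -r f; do
    rel=${f#"$SRC"/}                      # path relative to the source root
    mkdir -p "$DEST/$(dirname "$rel")"    # mirror the directory structure
    case "$f" in
        *.mov|*.mp3|*.mp4|*.avi|*.jpg|*.zip)
            cp "$f" "$DEST/$rel" ;;            # media: copy untouched
        *)
            gzip -c "$f" > "$DEST/$rel.gz" ;;  # the rest: compress
    esac
done
ls -R "$DEST"
```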
Let me illustrate better:
We have a big directory (240GB) called proj. Within /proj there are around 20 directories:
Code:
/proj
    /dir1
    /dir2
    /dir3
    and so on...

When I use tar, it gives me a single proj.tar.gz (180GB) file at the destination external HD. When I use iBackup I get:
Code:
dir1.zip
dir2.zip
dir3.zip
and so on...

This makes it easier to retrieve the information. Thanks fracai!! |
One thing I don't think you have mentioned: is this a dual core/processor machine?
If so, it could (no guarantee) be faster running it as several jobs. If so, how many cores? |
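A sketch of what "several jobs" could look like: archive two subfolders in separate background jobs (&) and wait for both. This only helps if gzip's CPU use, not the disk, is the bottleneck. The demo directories stand in for the real subfolders of /Volumes/DataHD/projects and the backup volume.

```shell
# Throwaway demo tree; substitute real source subfolders and backup paths
SRC=$(mktemp -d) && OUT=$(mktemp -d)
mkdir -p "$SRC/dir1" "$SRC/dir2"
echo one > "$SRC/dir1/a" && echo two > "$SRC/dir2/b"

tar czf "$OUT/part1.tar.gz" -C "$SRC" dir1 &   # first half, in background
tar czf "$OUT/part2.tar.gz" -C "$SRC" dir2 &   # second half, in background
wait                                           # block until both jobs finish
ls "$OUT"
```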
Code:
#!/bin/sh

The way the script is written now you'd have to execute the script from within the directory above "proj". You should probably modify the script to use absolute paths for both the source and destination values. |
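A hypothetical sketch of such a script (not the original poster's, which isn't fully shown above): absolute paths throughout and one gzip'd tarball per top-level folder, similar to iBackup's per-directory zips. SRC/DEST here are demo directories; in real use they would be /Volumes/DataHD/projects and /Volumes/backupdisk/projects.bak.

```shell
#!/bin/sh
# Demo stand-ins for the real absolute source and destination paths
SRC=$(mktemp -d); DEST=$(mktemp -d)
mkdir -p "$SRC/dir1" "$SRC/dir2"
touch "$SRC/dir1/f" "$SRC/dir2/g"

for d in "$SRC"/*/; do
    name=$(basename "$d")
    # -C makes tar change into SRC first, so the archive stores relative
    # paths and the script no longer depends on the current directory
    tar czf "$DEST/$name.tar.gz" -C "$SRC" "$name"
done
ls "$DEST"
```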
I wasn't able to find how many cores. This is the info I have:
Code:
Mac OS X Server 10.4.6
2 GHz PowerPC Xserve G5
2 GB DDR SDRAM
Machine Model: RackMac3,1
Machine Name: XserveG5
CPU Type: PowerPC G5 (3.0)
Number of CPUs: 1
CPU Speed: 2 GHz
L2 Cache (per CPU): 512 KB
Memory: 2 GB
Bus Speed: 1 GHz
Boot ROM Version: 5.1.7f2

I have another concern. I know I'm able to view the contents of the tar.gz file, but this takes forever. Is there an application like WinRAR that would enable me to view the contents of this tar.gz file? I just want to verify the folders inside proj/. Thanks. |
There is also an option for -z, which does in fact say compress data. I am not sure how it compresses data, though. You may want to look into it. |
To find core info, try (in Terminal)
Code:
system_profiler SPHardwareDataType

which outputs something like:
Code:
Hardware:
    Hardware Overview:
      Model Name: MacBook
      Model Identifier: MacBook2,1
      Processor Name: Intel Core 2 Duo
      Processor Speed: 2.16 GHz
      Number Of Processors: 1
      Total Number Of Cores: 2
      L2 Cache (per processor): 4 MB
      Memory: 2 GB
      Bus Speed: 667 MHz
      Boot ROM Version: MB21.00A5.B07
      SMC Version: 1.17f0
      Serial Number: W87222FBYA8
      Sudden Motion Sensor:
        State: Enabled

And if it doesn't say anything about cores, then you have a single core system. |
I ran the command:
Code:
system_profiler SPHardwareDataType
and got:
Code:
Hardware Overview:
    Machine Name: Xserve G5
    Machine Model: RackMac3,1
    CPU Type: PowerPC G5 (3.0)
    Number Of CPUs: 1
    CPU Speed: 2 GHz
    L2 Cache (per CPU): 512 KB
    Memory: 2 GB
    Bus Speed: 1 GHz
    Boot ROM Version: 5.1.7f2
    Serial Number: QP53901KSLX

So it must be a single core system. acme.mail.order, I think tlarkin is referring to -z with rsync, NOT tar. Thanks for all the help! |
rsync will not create a compressed archive for you, however nice that feature would be. |
Can we pipe it to tar?
Code:
rsync <whateverfile> | tar <whateverfile>
I don't think it would work, because the destination HD is smaller than the source. It would be great if rsync could do such a thing! Can cpio do this? iBackup uses cpio and it zips all the directories, as I mentioned before. Thanks!
rsync doesn't pipe its output so this wouldn't work.
cpio, zip, or even tar can pipe their output, but the main benefit of rsync in making backups smaller is the ability to reference a backup that is already in place and only back up changed data (i.e. the backup is incremental).

The bottom line is: compressing large media files is slow and typically not very beneficial. The only way to fit a large backup in a small space is compression. Incremental backups are faster than full backups. Large drives are cheap.

The comment about the number of cores you have is correct; running parallel jobs could be faster, but with the media files you're compressing I'm not sure how much benefit you'd see. You might even just be limited by disk speed at that point. You can check your processor strain by opening "Activity Monitor" while running your backup script. If you have more than 1 core or CPU you'll see more than 1 graph charting your CPU usage. If your usage is maxed out already, parallel jobs won't provide a benefit. If each chart is around 50%, you could look into running parallel jobs in your script. |