The macosxhints Forums

Twelve Motion 08-13-2007 03:26 AM

Why isn't everything always zipped?
 
I was doing some cleaning on my HD, and I found a folder called "School Stuff". In it was about 125 MB of .doc, .ppt, .pdf, and a bunch of other stuff from my schoolwork going as far back as 2000. I was going to just toss it, but then I figured, what if I wanted something someday? So I decided to archive it and keep it.

It went from 125 MB to 250 KB. I was shocked and amazed at how much smaller it got! I tried unzipping it, and it took like 2 seconds.

So here is my question/suggestion. Why isn't everything on my computer zipped when not in use? I was thinking how great it would be if everything was automatically zipped after, say, 7 days of not being used. It would save SO MUCH HD SPACE! If my machine would auto-zip stuff like Toast Titanium, Adobe Acrobat/Premiere/After Effects, and even iPhoto, it would save gigabytes of HD space. Sometimes I go months without using these programs, and the one time I use them, I can just unzip them. Of course I realize I can do it all manually, but it would be sweet if it was automated.

Is what I am saying totally crazy?

baf 08-13-2007 03:37 AM

Well, first of all, you don't usually save that much space. Word documents with drawn pictures are notorious space wasters; normal savings are about 50%. Second, a zipped file is much more vulnerable to disk errors: one missing sector and poof. Third, some filesystems (NTFS, for example) support compressed files that you don't have to zip and unzip yourself; it's completely automatic. And fourth, programs are among the hardest things to save space on; savings are usually only 10-15% or so.

Twelve Motion 08-13-2007 05:28 AM

Hmm, well, I suppose being at risk of disk errors is a serious concern. But if you back up regularly this shouldn't be much of an issue. Even if some applications don't save much space, I bet if you zipped your entire hard drive (for example's sake) you would probably save 15-25% space. I would say that would be worth it if the computer did it for me on its own.

acme.mail.order 08-13-2007 05:41 AM

Figuring out what to compress and what not to compress would be a major headache. Log files are typically compressed on a regular basis, but compressing pictures, video or music (today's BIG spacewasters) doesn't save you much space, and some compression routines will make them bigger!! Try zipping your Music, Movies and Pictures folders and see if it takes like 2 seconds.

Historically (i.e. before your time :D ) there were programs like Disk Doubler that did compress the entire drive, but with a noticeable performance penalty. I used to use one such program on a Windows 95 machine with a 500 MB drive. As disk space has increased faster than computer performance, the idea of compression to save space has largely disappeared. Why go to that much trouble when 250 GB costs less than a day's skiing (or these days, a tank of gas :rolleyes: )?

schneb 08-13-2007 01:47 PM

If a valuable file becomes corrupted, a zipped file would be useless. An original file could quite possibly be salvaged.
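
A rough way to see this point, as a sketch only: the snippet below uses Python's zlib module as a stand-in for a zip utility and generated text as a stand-in for a document (both are assumptions for illustration, not anything the posters ran). It inverts one byte in the plain data and one byte in the compressed stream, simulating a single bad spot on disk.

```python
import zlib

def flip_byte(data: bytes, index: int) -> bytes:
    """Return a copy of data with one byte inverted, simulating a bad sector."""
    damaged = bytearray(data)
    damaged[index] ^= 0xFF
    return bytes(damaged)

# Stand-in "document": varied lines of text (an assumption, not a real .doc file).
original = "".join(f"Line {i}: notes for week {i % 52}\n" for i in range(2000)).encode()
archive = zlib.compress(original)

# Damage the plain file: only the one byte differs, the rest is still readable.
damaged_plain = flip_byte(original, len(original) // 2)
bytes_lost_plain = sum(a != b for a, b in zip(original, damaged_plain))

# Damage the archive: the decompressor normally rejects the whole stream.
try:
    zlib.decompress(flip_byte(archive, len(archive) // 2))
    archive_readable = True
except zlib.error:
    archive_readable = False

print("bytes lost in plain file:", bytes_lost_plain)
print("damaged archive readable:", archive_readable)
```

With the plain data only the damaged byte is wrong. The damaged stream is normally rejected outright, because either the stream no longer parses or its checksum no longer matches. A real .zip compresses each file in it separately, so damage there would typically cost the affected member rather than every file in the archive.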

gsparks 08-13-2007 02:26 PM

Quote:

Originally Posted by acme.mail.order (Post 400658)
Historically (i.e. before your time :D ) there were programs like Disk Doubler that did compress the entire drive, but with a noticeable performance penalty.

Oh man, have I just dated myself... I used to use both Disk Doubler and RAM Doubler in the good ol' days!!! Talk about a name from the past...

Twelve Motion 08-13-2007 02:31 PM

I just tried zipping an mp3. It went from 4 MB to 4 MB.

Alright, my delusions are over.

styrafome 08-13-2007 03:15 PM

Yup, that's because an mp3 is already compressed. There is no more space to be gained. It is also true what was said earlier: Some compressed formats become larger if you apply additional, different compression to them.
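
The effect is easy to reproduce in a small experiment. This is a sketch under stated assumptions: seeded random bytes stand in for an MP3 (like compressed audio, they have no repeating patterns left to exploit), and Python's zlib stands in for a zip utility.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable

# Random bytes stand in for an already-compressed file such as an MP3.
dense = bytes(rng.getrandbits(8) for _ in range(100_000))

# Highly repetitive text stands in for an uncompressed document.
text = b"the quick brown fox jumps over the lazy dog\n" * 2_000

for label, data in (("text", text), ("mp3-like", dense)):
    packed = zlib.compress(data, 9)  # 9 = maximum compression effort
    print(f"{label}: {len(data)} -> {len(packed)} bytes")
```

The text shrinks dramatically, while the dense data comes out slightly larger than it went in, because the compressor still has to add its own header and bookkeeping bytes.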

When hard drives were small, there were utilities that did what you want, like Stacker and TimesTwo; they would compress everything in the background. Due to the issues everyone is listing, users eventually turned against these utilities and banished them from their machines. Microsoft built your feature request into Windows, but again, after a while nobody wanted the complications, and as a final blow, Microsoft removed support for it from later versions of Windows, leaving some users with data that could not be opened there.

Twelve Motion 08-13-2007 04:15 PM

How does compression work? I suppose I could just google it, but first I want to theorize. Assuming everything on a computer is binary (it is, right?), a file could look like this:

11000010010000101110

A compression program could group all like digits and remember how many of each there are, like:

120410210410130

So this is saying 1x2, 0x4, 1, 0x2, and so on. If you applied this kind of thing to an entire file, it would save quite a few bytes, wouldn't it?

Is that how it works?
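
The scheme described in this post is usually called run-length encoding. Here is a minimal sketch of it in Python; the function names are made up for illustration, and unlike the post's notation it always stores an explicit count, since leaving the count off single digits makes a string like "10" ambiguous when decoding.

```python
from itertools import groupby

def rle_encode(bits: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into a (symbol, count) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (symbol, count) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in pairs)

bits = "11000010010000101110"   # the example string from the post
pairs = rle_encode(bits)
print(pairs)
print(rle_decode(pairs) == bits)
```

For this 20-digit string the encoder produces 10 pairs, so nothing is saved; run-length encoding only pays off when the runs are long. ZIP itself uses DEFLATE, which combines back-references to earlier repeated data with Huffman coding rather than plain run lengths.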

ThreeDee 08-13-2007 05:03 PM

Compression algorithms are probably way over our heads. A simple example could be like this (I think):

Quote:

abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg abcdefg
Once compressed, it might look something like this, although not in English:
Quote:

write "abcdefg" 30 times
Of course, the algorithm would have to take into account some very complex data, not just repeating text.
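
To try the "abcdefg" example directly, here is a short sketch using Python's zlib module (an assumption: zlib stands in for whatever the zip utility does internally, and the text is rebuilt as 30 repeats with a trailing space each).

```python
import zlib

text = ("abcdefg " * 30).encode()   # roughly the quoted example: 240 bytes
packed = zlib.compress(text)

print(len(text), "bytes before,", len(packed), "bytes after")

# Decompressing gives back exactly the original text.
assert zlib.decompress(packed) == text
```

Internally the compressor stores the first "abcdefg " literally and then, in effect, an instruction to keep copying from 8 bytes back, which is close in spirit to "write abcdefg 30 times".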

wdympcf 08-13-2007 05:29 PM

Check out the following wikipedia article:

http://en.wikipedia.org/wiki/Data_compression#Theory

Twelve_Motion, notice that the worst offender for unnecessary waste of space is .doc files (they have the best compression ratio when "zipped"). I am guessing that your "School Stuff" directory was loaded with .doc files! It's interesting to note that you got a significantly better compression ratio than any of the utilities listed on this page though. I'm curious about that!

styrafome 08-13-2007 05:30 PM

Compression is very complex. For JPEGs, for instance, the algorithm exploits the way the human eye sees. One thing it does is save space by draining out colors that are not supposed to be human-perceptible. If you weren't a human, JPEG would look really wrong. Same with audio. MP3 and others don't save frequencies that humans are not supposed to hear. Of course, audiophiles say they can, so they're not satisfied.

Side effect, and why it's hard to do global compression: If you compress, you better use an algorithm matched/optimized to the content specifically. ZIP doesn't work for everything.

One last thing. Many of us don't realize that much of what's already on our disks is already compressed. MiniDV and ripped DVDs are highly compressed. MP3 and AAC are highly compressed. JPEG is highly compressed. Your OS X log file archives are compressed by OS X. So you're not going to get much more compression of those files without dropping quality.

wdympcf 08-13-2007 05:43 PM

styrafome's comments might lead to a little bit of confusion for the uninitiated in compression. JPEG, DV, MP3, and AAC are all lossy compression formats. Like styrafome said, they take advantage of human perception (audio, visual) and eliminate data that is at the threshold of perception. In the case of MP3, frequencies at the edge of human perception or in certain ranges that we are not particularly susceptible to are reduced or eliminated.

However, file compression formats such as ZIP are lossless. They can't be anything else. If data were lost, then your documents would contain spelling errors at the very least and likely wouldn't even be readable by the application that created them.
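
A small sketch of the difference, with loud caveats: the "lossy" step below is a toy (it just clears the low 4 bits of each value) and is not how MP3 or JPEG actually work; it only shows that a lossy step cannot be undone, while a lossless round trip returns every byte.

```python
import zlib

document = "Compression must not change a single character.".encode()

# Lossless: decompressing returns exactly the bytes that went in.
restored = zlib.decompress(zlib.compress(document))

# Toy "lossy" step (illustration only, not a real codec):
# clear the low 4 bits of every sample so there is less detail to store.
samples = bytes(range(0, 256, 3))
quantized = bytes(s & 0xF0 for s in samples)

print("lossless round trip exact:", restored == document)
print("lossy result equals input: ", quantized == samples)
```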

Comparing lossy and lossless algorithms in this regard is pointless, as they utilize very different logic and mechanisms.

Twelve Motion 08-13-2007 07:21 PM

Quote:

Originally Posted by wdympcf (Post 400791)
Check out the following wikipedia article:

http://en.wikipedia.org/wiki/Data_compression#Theory

Twelve_Motion, notice that the worst offender for unnecessary waste of space is .doc files (they have the best compression ratio when "zipped"). I am guessing that your "School Stuff" directory was loaded with .doc files! It's interesting to note that you got a significantly better compression ratio than any of the utilities listed on this page though. I'm curious about that!

Thanks for that link, I will check it out. It's true most of my school stuff was MS Word .doc files. Many of them had "Visual Aids" in the papers, so they had JPEGs in the pages of the .doc file. I bet that is why the folder seemed kinda big but the compressed archive was very small. I bet MS Word adds images to documents rather clumsily.

ThreeDee 08-13-2007 07:43 PM

Quote:

125 MB to 250 KB
Is that even possible? Lossless file compression with a ratio of 512:1?
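
For reference, the 512:1 figure follows if the sizes are read in binary units (1 MB = 1024 KB); with decimal units the same numbers give 500:1. A quick check, where the `ratio` helper is just an illustrative name:

```python
def ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Compression ratio expressed as original size divided by compressed size."""
    return original_bytes / compressed_bytes

KB = 1024
MB = 1024 * KB

print(ratio(125 * MB, 250 * KB))      # binary units
print(ratio(125_000_000, 250_000))    # decimal units
```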

ThreeDee 08-13-2007 08:08 PM

Just as a test, I copied and pasted the contents of this whole page into TextEdit 100 times and saved it as a text file. The file size was exactly 1,149,000 bytes, or about 1.1 MB according to the Finder. I did a simple Create Archive of the file in the Finder. The size of the zip was only 13,435 bytes, or about 13 KB, about 1% of the original file size.

I noticed that the zip file may be somewhat like how I described. In the zip file, opened in TextEdit, there was a ton of gibberish followed by this line repeating over and over:
Quote:

g¬ôp&ú»·H¬ôp&ú g¬ôp&ú g¬ôp&ú g¬ôp&ú g¬ôp&ú g¬ôp&ú g¬ôp&ú g¬ôp&ú g¬P!ú g¬ôp&ú
Doing a Find and Replace, I found there were 98 occurrences of the same string, which probably means the algorithm found that the same text was repeating over and over, and used the string as a kind of placeholder for the real text.

Twelve Motion 08-14-2007 04:40 AM

Hey awesome research!

cwtnospam 08-14-2007 09:08 AM

Quote:

Originally Posted by ThreeDee (Post 400828)
Is that even possible? Lossless file compression with a ratio of 512:1?

It depends on the data held in the file, and the file format. Widely varying degrees of compression when zipping different file formats (e.g., text file vs. MP3) indicate that some file types like MP3 are already compressed while others aren't. I think the real question is why some aren't. My guess is that early formats (that remain with us) were developed before processing power allowed for compression/decompression on the fly, so they tended to more closely match what the file would look like in RAM, where it would be editable.

acme.mail.order 08-14-2007 10:07 AM

512:1 is stretching things a bit. A highly repetitive (but possible) logfile compressed to 170:1; a VERY artificial file consisting of the line "some data" repeated 200,000 times compressed to 500:1. So it's possible but rather unlikely. Maybe Twelve Motion misread a decimal point?

hayne 08-14-2007 04:04 PM

Quote:

Originally Posted by acme.mail.order (Post 400952)
a VERY artificial file consisting of the line "some data" repeated 200,000 times compressed to 500:1

For those wanting to repeat acme.mail.order's experiments at home:
Code:

% jot -b "some data" 200000 > foo.txt

% ls -l foo*
-rw-r--r--  1 fred  fred  2000000 Aug 14 15:57 foo.txt

% gzip foo.txt

% ls -l foo*
-rw-r--r--  1 fred  fred  3932 Aug 14 15:57 foo.txt.gz

% echo "2000000/3932" | bc 
508

But note that further compression is possible if a different algorithm is used:
Code:

% compress foo.txt.gz

% ls -l foo*
-rw-r--r--  1 fred  fred  168 Aug 14 15:57 foo.txt.gz.Z

% echo "2000000/168" | bc 
11904


acme.mail.order 08-14-2007 08:11 PM

Logic error there, hayne: you didn't use a different algorithm on the original; you double-compressed the file (which normally has little effect).
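
To separate the two cases, here is a sketch that compresses the same artificial file once, compresses that output a second time, and also runs other algorithms on the original. Assumptions: Python's zlib stands in for gzip (both use DEFLATE), and bz2 and lzma are stand-ins for "a different algorithm", since Python's standard library has no equivalent of the LZW-based compress command.

```python
import bz2
import lzma
import zlib

original = b"some data\n" * 200_000   # 2,000,000 bytes, as produced by the jot command

once = zlib.compress(original)        # DEFLATE, the algorithm gzip uses
twice = zlib.compress(once)           # compressing the already-compressed output

results = {
    "zlib once": once,
    "zlib twice": twice,
    "bz2 on original": bz2.compress(original),
    "lzma on original": lzma.compress(original),
}
for label, blob in results.items():
    print(f"{label:>16}: {len(blob):>6} bytes  ({len(original) // len(blob)}:1)")
```

On input this artificial, the second pass does shrink the file further, because the first pass's output is itself highly repetitive. On ordinary files a second pass normally gains little or nothing.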


All times are GMT -5.