Why isn't everything always zipped?
I was doing some cleaning on my HD, and I found a folder called "School Stuff". In it was about 125 MB of .doc, .ppt, .pdf, and a bunch of other stuff from my schoolwork as far back as 2000. I was going to just toss it. But then I figured, what if I wanted something someday? So I decided to just archive it and keep it. So I did.
It went from 125 MB to 250 KB. I was shocked and amazed how much smaller that got! I tried unzipping it, and it took like 2 seconds. So here is my question/suggestion. Why isn't everything on my computer zipped when not in use? I was thinking how great it would be if everything was automatically zipped after say... 7 days of not being used. It would save SO MUCH HD SPACE! If my machine would auto zip stuff like Toast Titanium, and Adobe Acrobat/Premiere/After Effects and even iPhoto it would save gigabytes of HD space. Sometimes I go months without using these programs. And the one time I use them, I can just unzip them. Of course I realize I can do it all manually. But it would be sweet if it was automated. Is what I am saying totally crazy? |
Well, first of all, you usually don't save that much space. Word documents with drawn pictures are notorious space wasters. Normal savings are about 50%. Second, a zipped file is much more endangered by any disk error: one sector missing and poof. Third, in some filesystems (NTFS, e.g.) you can have compressed files that you don't have to zip/unzip; it's completely automatic. And fourth, programs are among the hardest things to save space on; usually savings are only 10-15% or so.
|
Hmm, well, I suppose being at risk of disk errors is a serious concern. But if you back up regularly this shouldn't be much of an issue. Even if some applications don't save much space, I bet if you zip your entire hard drive (for example's sake) you would probably save 15-25% space. I would say that would be worth it if the computer did it for me on its own.
|
Figuring out what to compress and what not to compress would be a major headache. Log files are typically compressed on a regular basis, but compressing pictures, video or music (today's BIG spacewasters) doesn't save you much space, and some compression routines will make them bigger!! Try zipping your Music, Movies and Pictures folders and see if it takes like 2 seconds.
Historically (i.e. before your time :D ) there were programs like Disk Doubler that did compress the entire drive, but with a noticeable performance penalty. I used to use one such program on a W95 machine with a 500 MB drive. As disk space has increased faster than computer performance, the idea of compression to save space has largely disappeared. Why go to that much trouble when 250 GB costs less than a day's skiing (or these days, a tank of gas :rolleyes: )? |
If a valuable file becomes corrupted, a zipped file would be useless. An original file could quite possibly be salvaged.
|
Quote:
|
I just tried zipping an mp3. It went from 4mb to 4mb.
Alright, my delusions are over. |
Yup, that's because an mp3 is already compressed. There is no more space to be gained. It is also true what was said earlier: Some compressed formats become larger if you apply additional, different compression to them.
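For anyone who wants to try this without digging up an mp3, here is a minimal Python sketch using the standard `zlib` module (DEFLATE, the same method zip normally uses). The sample text is made up, and the seeded pseudo-random bytes are only an assumed stand-in for already-compressed audio, which looks close to random to a general-purpose compressor.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Return the size in bytes of data after DEFLATE compression."""
    return len(zlib.compress(data, 9))

# Highly repetitive text, loosely like an old word-processor document.
repetitive = b"the quick brown fox jumps over the lazy dog\n" * 2000

# Assumption: seeded pseudo-random bytes stand in for mp3 data,
# because well-compressed audio has almost no patterns left to exploit.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(len(repetitive)))

print(len(repetitive), "->", compressed_size(repetitive))
print(len(noisy), "->", compressed_size(noisy))
```

The repetitive input should shrink dramatically, while the random-looking input comes out no smaller than it went in, because the compressor still has to add its own small header.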
When hard drives were small, there were utilities that used to do what you wanted; they would compress everything in the background, like Stacker and TimesTwo. Due to the issues everyone is listing, users turned against these utilities and eventually banished them from their machines. Microsoft built your feature request into Windows, but again, after a while nobody wanted the complications, and as a final blow, Microsoft removed the support for it from later Windows, leaving some users with data that could not be opened in later Windows. |
How does compression work? I suppose I could just google it, but first I want to theorize. Assuming everything on a computer is binary (it is, right?) a file could look like this:
11000010010000101110
A compression program could group all like digits, and remember how many of each there are, like:
120410210410130
So this is saying 1x2, 0x4, 1, 0x2, etc. If you apply this kind of stuff to an entire file it would save quite a few bytes, wouldn't it? Is that how it works? |
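The idea in the post above is usually called run-length encoding. Here is a minimal Python sketch of it; the function names `run_lengths` and `expand` are made up for illustration, and zip itself does not work this way (its usual DEFLATE method combines LZ77 back-references with Huffman coding).

```python
from itertools import groupby

def run_lengths(bits: str) -> list[tuple[str, int]]:
    """Group consecutive identical digits into (digit, count) pairs."""
    return [(digit, len(list(run))) for digit, run in groupby(bits)]

def expand(runs: list[tuple[str, int]]) -> str:
    """Rebuild the original digit string from (digit, count) pairs."""
    return "".join(digit * count for digit, count in runs)

bits = "11000010010000101110"  # the example string from the post
runs = run_lengths(bits)
print(runs)
print(expand(runs) == bits)
```

On a short string like this one the runs are too short to save anything; the scheme only pays off when the data contains long stretches of the same value.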
Compression algorithms are probably way over our heads. A simple example could be like this (I think):
Quote:
Quote:
|
Check out the following wikipedia article:
http://en.wikipedia.org/wiki/Data_compression#Theory
Twelve_Motion, notice that the worst offender for unnecessary waste of space is .doc files (they have the best compression ratio when "zipped"). I am guessing that your "School Stuff" directory was loaded with .doc files! It's interesting to note that you got a significantly better compression ratio than any of the utilities listed on this page, though. I'm curious about that! |
Compression is very complex. For JPEGs, for instance, the algorithm exploits the way the human eye sees. One thing it does is save space by draining out colors that are not supposed to be human-perceptible. If you weren't a human, JPEG would look really wrong. Same with audio. MP3 and others don't save frequencies that humans are not supposed to hear. Of course, audiophiles say they can, so they're not satisfied.
Side effect, and why it's hard to do global compression: If you compress, you better use an algorithm matched/optimized to the content specifically. ZIP doesn't work for everything. One last thing. Many of us don't realize that much of what's already on our disks is already compressed. MiniDV and ripped DVDs are highly compressed. MP3 and AAC are highly compressed. JPEG is highly compressed. Your OS X log file archives are compressed by OS X. So you're not going to get much more compression of those files without dropping quality. |
styrafome's comments might lead to a little bit of confusion for the uninitiated in compression. JPEG, DV, MP3, and AAC are all lossy compression formats. Like styrafome said, they take advantage of human perception (audio, visual) and eliminate data that is at the threshold of perception. In the case of MP3, frequencies at the edge of human perception or in certain ranges that we are not particularly sensitive to are reduced or eliminated.
However, file compression formats such as ZIP are lossless. They can't be anything else. If data were lost, then your documents would contain spelling errors at the very least and likely wouldn't even be readable by the application that created them. Comparing lossy and lossless algorithms in this regard is pointless, as they utilize very different logic and mechanisms. |
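As a rough illustration of the lossless/lossy distinction, here is a small Python sketch. The lossless half uses the standard `zlib` module. The lossy half is a toy quantization step invented for this example; it is an assumption for illustration only and far simpler than anything JPEG or MP3 actually does.

```python
import zlib

original = b"Spelling matters in a school essay."

# Lossless: decompressing returns exactly the bytes that went in.
restored = zlib.decompress(zlib.compress(original))
print(restored == original)

# Toy lossy step (illustrative assumption, not a real codec):
# round each 8-bit sample down to a multiple of 16, discarding fine detail.
samples = [3, 18, 47, 200, 255]
quantized = [(s // 16) * 16 for s in samples]
print(quantized)
```

The quantized values are close to the originals but can never be turned back into them, which is tolerable for a picture or a song and unacceptable for a document or a program.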
Just as a test, I copied and pasted the contents of this whole page into Text Edit 100 times, saved it as a text file. The file size was exactly 1,149,000 bytes, or about 1.1 MB according to the Finder. I did a simple Create Archive of the file in the Finder. The size of the zip was only 13435 bytes, or about 16 KB, about 1% of the original file size.
I noticed that the zip file may be somewhat like how I described. In the zip file, opened in TextEdit, there was a ton of gibberish followed by this line repeating over and over: Quote:
|
Hey awesome research!
|
Quote:
|
512:1 is stretching things a bit. A highly repetitive (but possible) logfile compressed to 170:1, and a VERY artificial file consisting of the line "some data" repeated 200,000 times compressed to 500:1. So it's possible but rather unlikely. Maybe TwelveMotion misread a decimal point?
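The artificial test can be approximated in Python with the standard `zlib` module. This is only a sketch: `zlib` writes a slightly different container than the gzip or zip tools, so the exact ratio it reports is not expected to match the figures quoted here.

```python
import zlib

# The same artificial content as the test described above:
# the line "some data" repeated 200,000 times.
data = b"some data\n" * 200_000

packed = zlib.compress(data, 9)
ratio = len(data) / len(packed)
print(f"{len(data)} bytes -> {len(packed)} bytes, about {ratio:.0f}:1")
```

Ratios in the hundreds do show up for input like this, but only because every line is identical, which real folders of documents are not.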
|
Quote:
Code:
% jot -b "some data" 200000 > foo.txt
Code:
% compress foo.txt.gz |
Logic error there, hayne: you didn't use a different algorithm on the original; you double-compressed the file (which normally has little effect).
|