|
|
#1 |
|
Prospect
Join Date: Jan 2006
Posts: 4
|
I've been reading the posts about the ability to use Terminal and a downloadable script file to eliminate duplicate loops. This is very attractive, as I have a PowerBook G4 and it's almost maxed out. I tried to click on the "download here" link, but another page comes up saying the site is suspended or some such. Is there anyone who has the script, and if so, are you interested in sharing it? I'd love to get back my HD space.
Thanks, Technorev |
|
|
|
|
|
#2 |
|
Site Admin
Join Date: Jan 2002
Location: Montreal
Posts: 32,473
|
Please tell us what thread or article you are referring to.
__________________
hayne.net/macosx.html |
|
|
|
|
|
#3 |
|
Prospect
Join Date: Jan 2006
Posts: 4
|
Here is the link
Thank you for asking. I'm new to this forum, but I think it wasn't a thread but another part of this site. Here is the address: http://www.macosxhints.com/article.p...40126045617781.
I hope you can help me. Thanks, Technorev
Last edited by hayne; 01-15-2006 at 11:07 PM. Reason: fixed link |
|
|
|
|
|
#4 |
|
Site Admin
Join Date: Jan 2002
Location: Montreal
Posts: 32,473
|
I see the problem. That article (on the main macosxhints site) links to a script on an external site, but that external site seems to no longer be operating. There is nothing we can do about that. Maybe someone who has saved a copy of the script will see this thread and respond. [edit] I have sent an email to the author of that article (Xeo) asking for an updated link for the script. Check back in a while to see if Xeo has added a comment to that article supplying a new link. [/edit]
__________________
hayne.net/macosx.html Last edited by hayne; 01-18-2006 at 03:59 AM. |
|
|
|
|
|
|
#5 |
|
All Star
Join Date: Aug 2004
Posts: 759
|
Actually, I just installed iLife '06 and the World Music Jam Pack, and I have a ton of duplicate loops I wish I could get rid of. Any ideas? The above hint isn't really relevant. Prior to installing iLife '06, I had iLife '04 installed, and also the four downloadable Jam Packs from .Mac.
|
|
|
|
|
|
#6 |
|
Site Admin
Join Date: Jan 2002
Location: Montreal
Posts: 32,473
|
The author of the "hint" article referred to above has replied and that article has been updated to link to a copy of the script on the macosxhints server.
So give it a try. But note (as I think was mentioned in the article) that the script just removes files by name - the ones that the author noticed were dupes - it doesn't search for duplicates or verify that the files it is removing are in fact duplicates. So be sure to have a backup of your loops before running the script, in case it removes something that you have only one copy of.
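For example, a minimal way to make such a backup in Terminal - the folder and file names below are just illustrations, substitute your own loops folder. (On OS X, `ditto` would also preserve resource forks, which plain `cp` may not.)

```shell
# Sketch: back up a loops folder before running the removal script.
# The folder and file names here are examples only.
cd "$(mktemp -d)"                      # stand-in for your real location
mkdir -p "Apple Loops"
printf 'audio data' > "Apple Loops/World Beat 01.aif"

# Copy the whole folder (on OS X, use `ditto` to keep resource forks)
cp -Rp "Apple Loops" "Apple Loops backup"

ls "Apple Loops backup"
```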
__________________
hayne.net/macosx.html |
|
|
|
|
|
#7 |
|
Site Admin
Join Date: Jan 2002
Location: Montreal
Posts: 32,473
|
The script referred to in the above "hint" uses hard-coded lists of duplicate files. The original author (Xeo) somehow figured out which files were duplicates and made lists of them, which are then incorporated into the script.
But that means that it doesn't help you if you have other loops that are not among those cataloged by Xeo. As a first step towards solving the general problem of removing duplicate loops, I wrote a script that will search for duplicates and list them automatically - i.e. it should be able to reproduce Xeo's lists. I supply the script below. To use it, you would need to do the usual things for running a script - see this Unix FAQ. If the script file is called "findDupeFiles" and it is in your current folder, then you could run it on the two folders "/Documents/Apple Loops for Soundtrack" and "/Library/Application Support/GarageBand" as follows:
Code:
./findDupeFiles '.aif|.aiff' "/Documents/Apple Loops for Soundtrack" "/Library/Application Support/GarageBand"
You need to have the folder paths (the last two arguments) in quotes because the paths contain spaces. Note that this command will typically take several minutes to finish and you won't see any output until just before the end.
The output (in the Terminal window) lists all the duplicates found in those folders (and sub-folders). Each set of duplicates is separated from the next in the output by a line like this:
-----------------------
The script is actually very general, so it could be used to search for duplicates of any type of file - e.g. MP3 files. What it does is compare the files based on their "MD5 digest", which is a sequence of characters that is computed from the content of the file. It is possible but extremely unlikely that two files with different content would have the same MD5 digest. The comparison does not look at the file names at all - just the content of the files.
Code:
#!/usr/bin/perl
use strict;
use warnings;
# findDupeFiles:
# This script attempts to identify which files might be duplicates.
# It searches specified directories for files with a given suffix
# and reports on files that have the same MD5 digest.
# The suffix or suffixes to be searched for are specified by the first
# command-line argument - each suffix separated from the next by a vertical bar.
# The subsequent command-line arguments specify the directories to be searched.
# If no directories are specified on the command-line,
# it searches the current directory.
# Files whose names start with "._" are ignored.
#
# Cameron Hayne (macdev@hayne.net) January 2006 (revised March 2006)
#
#
# Examples of use:
# ----------------
# findDupeFiles '.aif|.aiff' AAA BBB CCC
# would look for duplicates among all the files with ".aif" or ".aiff" suffixes
# under the directories AAA, BBB, and CCC
#
# findDupeFiles '.aif|.aiff'
# would look for duplicates among all the files with ".aif" or ".aiff" suffixes
# under the current directory
#
# findDupeFiles '' AAA BBB CCC
# would look for duplicates among all the files (no matter what suffix)
# under the directories AAA, BBB, and CCC
#
# findDupeFiles
# would look for duplicates among all the files (no matter what suffix)
# under the current directory
# -----------------------------------------------------------------------------
use File::Find;
use File::stat;
use Digest::MD5;
use Fcntl;
# The HFS+ filesystem used on OS X has resource forks as well as data forks
# By default this script checks the resource forks of files with duplicate data
# and issues a message if the resource forks are different.
# If you don't want to do this (e.g. on some other Unix system)
# then set the 'checkRsrc' variable to 0
my $checkRsrc = 1; # whether to check the resource forks
my $matchSomeSuffix; # reference to a subroutine for matching suffixes
if (defined($ARGV[0]))
{
# the list of desired suffixes is supplied in $ARGV[0]
# separated by vertical bars - e.g. ".mp3|.aiff"
# Note that if $ARGV[0] is '', then all files will be looked at
my @suffixes = split(/\|/, $ARGV[0]);
if (scalar(@suffixes) > 0)
{
# create an efficient matching subroutine using the Friedl technique
my $matchExpr = join('||', map {"m/\$suffixes[$_]\$/io"} 0..$#suffixes);
$matchSomeSuffix = eval "sub {$matchExpr}";
}
shift @ARGV;
}
# if no dirs supplied as command-line args, we search the current directory
my @searchDirs = @ARGV ? @ARGV : ".";
# verify that these are in fact directories
foreach my $dir (@searchDirs)
{
die "\"$dir\" is not a directory\n" unless -d "$dir";
}
my %filesByDataLength; # global variable holding hash of arrays of fileInfo's
# calcMd5: returns the MD5 digest of the given file
sub calcMd5($)
{
my ($filename) = @_;
if (-d $filename)
{
# doing MD5 on a directory is not supported
return "unsupported"; # we need to return something
}
# We use 'sysopen' instead of just 'open' in order to be able to handle
# filenames with leading whitespace or leading "-"
# The usual trick to protect against leading whitespace or "-" is to do
# $filename =~ s#^(\s)#./$1#; open(FILE, "< $filename\0")
# but that fails if the filename is something like "- foo"
# (i.e. if there is an initial "-" followed by whitespace)
sysopen(FILE, $filename, O_RDONLY)
or die "Unable to open file \"$filename\": $!\n";
binmode(FILE); # just in case we're on Windows!
my $md5 = Digest::MD5->new->addfile(*FILE)->hexdigest;
close(FILE);
return $md5;
}
# hashByMd5: passed a ref to an array of fileInfo's
# Returns a ref to a hash by md5 of the fileInfo's
sub hashByMd5($)
{
my ($fileInfoListRef) = @_;
my %filesByMd5;
foreach my $fileInfo (@{$fileInfoListRef})
{
my $dirname = $fileInfo->{dirname};
my $filename = $fileInfo->{filename};
my $md5 = calcMd5("$dirname/$filename");
push(@{$filesByMd5{$md5}}, $fileInfo);
}
return \%filesByMd5;
}
# checkFile: invoked from the 'find' routine on each file or directory in turn
sub checkFile()
{
return unless -f $_; # only interested in files, not directories
my $filename = $_;
my $dirname = $File::Find::dir;
return if $filename =~ /^\._/; # ignore files whose names start with "._"
if (defined($matchSomeSuffix))
{
return unless &$matchSomeSuffix;
}
my $statInfo = stat($filename)
or warn "Can't stat file \"$dirname/$filename\": $!\n" and return;
my $size = $statInfo->size;
my $fileInfo = {
'dirname' => $dirname,
'filename' => $filename,
};
push(@{$filesByDataLength{$size}}, $fileInfo);
}
MAIN:
{
# traverse the directories and collate the files by data length
# in the global variable %filesByDataLength
find(\&checkFile, @searchDirs);
my $numDupes = 0;
my $numDupeBytes = 0;
# process the files by size, starting with the largest
foreach my $size (sort {$b<=>$a} keys %filesByDataLength)
{
my $numSameSize = scalar(@{$filesByDataLength{$size}});
next unless $numSameSize > 1;
#print "size: $size numSameSize: $numSameSize\n";
my $filesByMd5Ref = hashByMd5($filesByDataLength{$size});
my %filesByMd5 = %{$filesByMd5Ref};
foreach my $md5 (keys %filesByMd5)
{
my @sameMd5List = @{$filesByMd5{$md5}};
my $numSameMd5 = scalar(@sameMd5List);
next unless $numSameMd5 > 1;
# for each set of dupes, print the full path to the files
my $rsrcMd5;
foreach my $fileInfo (@sameMd5List)
{
my $dirname = $fileInfo->{dirname};
my $filename = $fileInfo->{filename};
my $filepath = "$dirname/$filename";
print "$filepath\n";
if ($checkRsrc)
{
my $rsrcFilepath = "$filepath/..namedfork/rsrc";
if (!defined($rsrcMd5))
{
$rsrcMd5 = calcMd5($rsrcFilepath);
}
elsif ($rsrcMd5 ne calcMd5($rsrcFilepath))
{
print "Resource fork differs\n";
}
}
}
print "----------\n";
$numDupes += ($numSameMd5 - 1);
$numDupeBytes += ($size * ($numSameMd5 - 1));
}
}
my $numDupeMegabytes = sprintf("%.1f", $numDupeBytes / (1024 * 1024));
print "Number of duplicate files: $numDupes\n";
print "Megabytes duplicated: $numDupeMegabytes\n";
}
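To see the idea behind the MD5 comparison in action, here is a small Terminal sketch (the file names and contents are made up). Note that on OS X the command is `md5`; the `md5sum` used below is the equivalent on most other Unix systems.

```shell
# Two files with identical content but different names get the same
# MD5 digest - this is how the script decides they are duplicates.
cd "$(mktemp -d)"
printf 'loop audio data' > "Loop One.aif"
printf 'loop audio data' > "Loop One copy.aiff"
printf 'other data'      > "Unique.aif"

d1=$(md5sum "Loop One.aif"       | cut -d' ' -f1)
d2=$(md5sum "Loop One copy.aiff" | cut -d' ' -f1)
d3=$(md5sum "Unique.aif"         | cut -d' ' -f1)

echo "$d1"
echo "$d2"
echo "$d3"
```

The first two digests come out identical even though the file names differ; the third does not.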
__________________
hayne.net/macosx.html
Last edited by hayne; 03-02-2006 at 08:22 PM. Reason: use sysopen instead of open in order to handle filenames that start with "- "; made faster by first collating by size; now reports if resource forks differ |
|
|
|
|
|
#8 |
|
Registered User
Join Date: Feb 2006
Posts: 1
|
Issues solved re script execution
Thanks Cameron,
That solved it! I had copied and pasted the script into BBEdit, but neglected to set the line breaks to Unix (from the default traditional Macintosh) before chmod'ing and executing.
Thanks again!
John
PS: Sorry for the dupe emails - Earthlink webmail was misbehaving.

>On 27-Feb-06, at 9:09 PM, John wrote:
>
>> I found your findDupeFiles perl script at http://forums.macosxhints.com/showthread.php?p=264200
>> and had a problem with it in that I consistently get an error:
>>
>> john-pilgrims-powerbook:/Library/Audio/Apple Loops johnpilgrim$ ./findDupeFiles.pl '.aif|.aiff' "/Apple Loops for GarageBand" "/Apple Loops for Soundtrack Pro"
>> use: bad interpreter: No such file or directory
>> john-pilgrims-powerbook:/Library/Audio/Apple Loops johnpilgrim$ ./findDupeFiles.pl '.aif|.aiff'
>> use: bad interpreter: No such file or directory
>>
>> I'm not familiar enough with Perl to debug it myself. The comparison directories exist, /usr/bin/perl exists, and the script had been chmod'ed, but I don't know what the "bad interpreter" error means - I consistently get it whether or not I specify the search directories.
>>
>> Thanks in advance for your assistance,
>> John
>
>Hi John
>
>Please ask these sorts of questions on the forums - e.g. in the thread that you refer to. That way future readers can benefit from the answer.
>But I believe your problem is that the script file is not using the correct (Unix-style) line endings. This issue is discussed in the Unix FAQ that I referred to in that post.
>
>--
>Cameron Hayne
>macdev@hayne.net |
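For anyone hitting the same "bad interpreter" error: besides setting BBEdit's line breaks to Unix, the endings can be converted in Terminal. The sketch below uses a throw-away file to show the idea - old-Mac CR line endings corrupt the `#!` line, and `tr` converts them to Unix LF.

```shell
# Simulate a script saved with traditional Mac (CR) line endings,
# which makes the kernel misread the #! line ("bad interpreter").
cd "$(mktemp -d)"
printf '#!/usr/bin/perl\rprint "hello";\r' > findDupeFiles.cr

# Convert CR to LF and make the result executable
tr '\r' '\n' < findDupeFiles.cr > findDupeFiles
chmod +x findDupeFiles

head -1 findDupeFiles   # the shebang line is now intact
```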
|
|
|
|
|
#9 |
|
Site Admin
Join Date: Jan 2002
Location: Montreal
Posts: 32,473
|
I modified the above script in response to comments made on the article about this script on the main macosxhints site (http://www.macosxhints.com/article.p...06030205235028)
- it now collates files by data-fork size and only uses MD5 to distinguish files that have the same data size. This makes it run about twice as fast as before in my tests on the AIFF loop folders
- it now reports "Resource fork differs" when the resource forks of duplicate data files differ
- it now handles unusual filenames better (e.g. "- foo" with an initial "-" followed by whitespace)
__________________
hayne.net/macosx.html |
|
|
|
|
|
#10 |
|
Triple-A Player
Join Date: Nov 2005
Posts: 71
|
So, I'm confused... is the post linked above the corrected version of the script, or is the script in this thread the corrected version? [please note the required answer isn't "yes" or "no" unless the above sentence is parsed]
Also - the script tells me that "~/Allen's\ WorkFiles/AFM/Data/" is not a directory. I'm not super great in the unix world... but why wouldn't it be a directory? My thought is that either the ~ or the "\ " is tripping up the script?
Thanks!! -Allen |
|
|
|
|
|
#11 |
|
Moderator
Join Date: Jun 2003
Location: Boulder, CO USA
Posts: 19,853
|
Assuming that the path to the directory in question is / > Users > YourUsername > Allen's WorkFiles > AFM > Data, then the way to refer to that directory would be:
~/Allen\'s\ WorkFiles/AFM/Data/
In other words, the ' character (the single quote or apostrophe character) needs to be escaped with a \ as well. Your version above does not escape the single quote, so your computer interprets it differently than you expect. There's no problem with the \ and ~ characters that you do have, assuming my guess as to the path is correct.
Trevor
Last edited by trevor; 05-16-2008 at 04:24 PM. |
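To illustrate the escaping point, here is a sketch using a throw-away folder of the same shape (the path is made up):

```shell
# A folder whose name contains both an apostrophe and a space.
cd "$(mktemp -d)"
mkdir -p "Allen's WorkFiles/AFM/Data"

# Escape the apostrophe and the space with backslashes...
ls -d Allen\'s\ WorkFiles/AFM/Data

# ...or simply double-quote the whole path.
ls -d "Allen's WorkFiles/AFM/Data"
```

Both commands refer to the same directory; double-quoting is usually the less error-prone of the two.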
|
|
|
|
|
|
#12 |
|
Site Admin
Join Date: Jan 2002
Location: Montreal
Posts: 32,473
|
The script supplied in this forums thread is the better version. Alternatively, you can get it from my web site: http://hayne.net/MacDev/Perl/
Trevor has explained how to handle the quote and the space in that folder name - but an alternative (that would make things easier) would be just to rename that folder to have a more Unix-friendly name - e.g.: AllenWorkFiles
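The rename can be done in Terminal as well as in the Finder - sketched here with a throw-away folder (the name is made up):

```shell
# Rename a folder with awkward characters to a Unix-friendly name.
cd "$(mktemp -d)"
mkdir "Allen's WorkFiles"
mv "Allen's WorkFiles" AllenWorkFiles
ls -d AllenWorkFiles
```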
__________________
hayne.net/macosx.html |
|
|
|
|
|
|
#13 |
|
Triple-A Player
Join Date: Nov 2005
Posts: 71
|
Thanks guys! I'll try this as soon as I get home. I agree with the more unix friendly directories... actually, the problem with the directory path was created by a snagpath cm plugin.
Guess they forgot to \' all ' characters.
Thank you for your help!! -Allen |
|
|
|
|
|
|