11-16-2002, 12:07 PM   #1
kerim
Major Leaguer
Join Date: Jan 2002
Posts: 311
ScanMagick

This is the result of two other threads I started, here and here. I wanted to consolidate my findings in one place so that they might be more useful to others who need to do the same thing.

The goal: Batch scan multi-page documents and convert them to single PDF files.


The tools:
  • A scanner with an automatic document feeder (ADF). I have chosen the Epson Perfection 1640SU because (a) it is cheap, and (b) Epson has already released a TWAIN driver for it (still in beta), showing better OS X support than other manufacturers.
  • VueScan. Even though the Epson TWAIN driver works great for importing images into Photoshop, GraphicConverter, etc., no software really supports batch scanning with an ADF through it. VueScan, unlike a TWAIN driver, is a stand-alone program and will let you save your scans automatically to a folder on the desktop. I will give more details below.
  • ImageMagick. Because there is no software available for what I want to do, I have relied on this very powerful UNIX tool, which I run from the terminal. Below is a script to automate the process, with instructions. (GraphicConverter and OmniPage Pro X both have the ability to do this, but they currently screw things up or crash in the process. And WorkingPapers has not released an OS X version.) I think I installed it using FINK, but there are probably other ways to install it as well.


The process:

First, you need to get your scans to the desktop. I have found the fastest way to do this, and the one least likely to crash any software, is to use VueScan in "batch" mode, saving all the files as raw TIFF files. Here is my adapted version of VueScan's "Advanced Workflow" page from the user manual:

Quote:
1) File Menu --> Default Options
2) Set "Device|Option types" to "Advanced"
3) Set "Device|Scan From" to "Perfection 1640"
4) Set "Device|Scan Mode" to "Document Feeder"
5) Set "Device|Media type" to "text"
6) Set "Device|Bits per pixel" to "8 bit grey"
7) Set "Device|Batch Scan" to "All"
8) Check "Device|Lock Exposure" [Note: This will work best if you first turn off batch scanning and preview one of your pages to lock in the best exposure.]
9) Un-check "Crop|Crop Auto Position"
10) Set "Crop|Crop Size" to "Maximum"
11) Set "Color|Black point (%)" and "Color|White point (%)" both to 1.5. [Note: Again, this is something that will depend on your documents, but this works for me.]
12) Check "Files|Save Raw File" [Note: Make sure that "save as Tif" is un-checked!]
13) Set "Files|Save file name" to a folder named after your document, inside a folder called "LandscapeScans", which is itself inside a folder called "Scans" on your desktop: ~/Desktop/Scans/LandscapeScans/DocumentName. Your final PDF file will be in the top level of the Scans folder and will have the same name as the document folder. Also, set the file name to scan000+.tif (that should be the default). [Note: The path can be changed, but then you will also need to change the script I wrote below! And if you are doing a portrait scan, save it in a folder called "PortraitScans" in the same "Scans" folder!]
14) Set "Files|Raw file type" to "8 bit grey"
15) Set "Files|Raw save with" to "Preview" [Note: This is necessary for use with the ADF, and is different from what the user manual describes, because that is for use with a film scanner. I found this out from the developer.]
16) To speed things up, you should also make sure that "Prefs|Refresh Fast" is checked and "Prefs|Refresh Each Scan" is un-checked.

If you set the "Device|Lock exposure" option and clear "Crop|Crop auto position", then the "Scan" button won't first do a preview scan. This can save time when batch scanning.

Once you have done all that, you can save these settings. Make sure you set the extension to ".ini" manually, or VueScan won't be able to read them!


Second, place your document in the ADF, and press "scan"!!!

A note about double-sided documents: If you are doing a double-sided document, I suggest first scanning one side to one folder and the other side to another folder, then using the shareware utility "A Better Finder Rename" to renumber the scans in each folder in increments of 2 (one folder starting with 01, 03, 05, and so on, the other starting with 02, 04, 06, etc.). Then merge the two folders before continuing to the third step.
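
For anyone who would rather do the renumbering from the Terminal, the interleaving step can be sketched in plain Bourne shell (sh, not the csh used for the main script). This is only a sketch: the function name interleave_sides and the scanNNN.tif output names are made up for illustration, and it assumes both folders already list their pages in reading order when sorted by name. If the back sides went through the ADF as a flipped stack, they may come out in reverse order and would need to be reversed first.

```shell
#!/bin/sh
# Hypothetical helper: merge front-side and back-side scans into one folder.
# Fronts are numbered 001, 003, 005, ... and backs 002, 004, 006, ...
# Files are copied, so the two original folders are left untouched.
interleave_sides() {
    front_dir=$1
    back_dir=$2
    out_dir=$3
    mkdir -p "$out_dir" || return 1

    n=1
    for f in "$front_dir"/*.tif; do
        [ -e "$f" ] || continue      # pattern matched nothing
        cp "$f" "$out_dir/$(printf 'scan%03d.tif' "$n")"
        n=$((n + 2))
    done

    n=2
    for f in "$back_dir"/*.tif; do
        [ -e "$f" ] || continue
        cp "$f" "$out_dir/$(printf 'scan%03d.tif' "$n")"
        n=$((n + 2))
    done
}
```

Usage would look something like: interleave_sides Fronts Backs ~/Desktop/Scans/PortraitScans/DocumentName (the folder names Fronts and Backs are placeholders).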

Third, I made a shell script (see below), which you can save as a plain text file on your drive (make sure it uses UNIX line endings!!!). I called it "scanmagick.command". In "Get Info" you can then set all files ending in ".command" to "open with" the Terminal, so that you can just double-click on the file!!! But first, you will have to change the file permissions to 755. In the terminal you can "cd" to the directory with the file and type: "chmod 755 scanmagick.command". Or you can use a host of utilities to change file permissions....
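
The line-ending and permission steps can also be done from the Terminal in one go. A minimal sketch in Bourne shell, assuming the script was saved as scanmagick.command; the helper name prepare_command_file is hypothetical. It turns both DOS (CR LF) and classic Mac (CR only) line endings into UNIX (LF) endings and then sets the permissions to 755.

```shell
#!/bin/sh
# Hypothetical helper: give a script UNIX line endings and make it executable.
prepare_command_file() {
    file=$1
    # awk splits its input on LF. Drop a CR left at the end of a line
    # (DOS files), then turn any remaining CRs (classic Mac files) into LFs.
    awk '{ sub(/\r$/, ""); gsub(/\r/, "\n"); print }' "$file" > "$file.tmp" || return 1
    mv "$file.tmp" "$file" || return 1
    chmod 755 "$file"
}
```

After running prepare_command_file scanmagick.command, the file should be double-clickable once ".command" files are set to open with the Terminal.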

This script was written because ImageMagick runs out of memory if it tries to do too much at one time. There seems to be a problem with how it uses temporary memory on OS X. The script breaks the job down into its individual steps and does them one at a time. I am still working on it and will probably post a newer version here later. I have found that it can still run out of memory during the PDF conversion if it is working with a large number of files, so I might add a command to clear out the system's virtual memory between converting to TIFF and processing the PDF. These problems only occur with documents that have more than 40 or so pages (I think). Another solution might be to break up writing the PDF. I think it might be possible to write only 10 files to the PDF at a time or something; I have to look into it. In order to understand the commands used in this script, see the two links to other threads listed at the top.

Code:
#! /bin/csh

cd /Users/kerim/Desktop/Scans/LandscapeScans
echo "Converting LandscapeScans Folder" 

foreach dir (*)
cd $dir
echo "Now working on:" $dir
echo "Cropping, Rotating, and Filtering TIF to PBM!"
foreach file (*.tif)
nice +10 convert \
-gravity South \
-crop 1700x2200+0+0 \
-rotate "+90" \
-level 10000,1,50000 \
-unsharp 6x1+100+0.05 \
$file pbm:`basename $file .tif`.pbm
echo $file "done"
end

echo "Converting all files to compressed TIFF!"
foreach file (*.pbm)
nice +10 convert \
-compress zip \
$file tif:`basename $file .pbm`.tiff
echo $file "done"
end

echo "Converting all TIFF Files into a single landscape PDF!"
nice +10 convert \
-compress zip \
-page 792x612 \
-adjoin *.tiff pdf:../../$dir.pdf

echo "All Done with directory: " $dir
cd ../
end

cd /Users/kerim/Desktop/Scans/PortraitScans
echo "Converting PortraitScans Folder" 

foreach dir (*)
cd $dir
echo "Now working on:" $dir
echo "Cropping and Filtering TIF to PBM!"
foreach file (*.tif)
nice +10 convert \
-gravity South \
-crop 1700x2200+0+0 \
-level 10000,1,50000 \
-unsharp 6x1+100+0.05 \
$file pbm:`basename $file .tif`.pbm
echo $file "done"
end

echo "Converting all files to compressed TIFF!"
foreach file (*.pbm)
nice +10 convert \
-compress zip \
$file tif:`basename $file .pbm`.tiff
echo $file "done"
end

echo "Converting all TIFF Files into a single portrait PDF!"
nice +10 convert \
-compress zip \
-page letter \
-adjoin *.tiff pdf:../../$dir.pdf

echo "All Done with directory: " $dir
cd ../
end

echo "Finished"
(Thanks to MT in the following post - I've gone back and fixed this original post so the code reads more easily.)
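
The idea mentioned before the script, writing only 10 files to the PDF at a time, could be sketched as a small batching helper in Bourne shell. The names run_in_batches and make_part_pdf are hypothetical, the convert options are simply copied from the script above, and the sketch assumes file names without spaces (true for VueScan's scan000+.tif names). It has not been tested against the memory problem described here, and the partial PDFs would still have to be merged with some other tool.

```shell
#!/bin/sh
# run_in_batches SIZE CMD FILE...
# Calls "CMD BATCH_NUMBER FILE..." once for each group of at most SIZE files.
# Assumes file names contain no whitespace.
run_in_batches() {
    size=$1
    cmd=$2
    shift 2
    batch=1
    group=""
    count=0
    for f in "$@"; do
        group="$group $f"
        count=$((count + 1))
        if [ "$count" -eq "$size" ]; then
            $cmd "$batch" $group     # unquoted on purpose: split into names
            batch=$((batch + 1))
            group=""
            count=0
        fi
    done
    # Flush the last, possibly shorter, batch.
    if [ "$count" -gt 0 ]; then
        $cmd "$batch" $group
    fi
}

# Example callback (hypothetical): write one partial PDF per batch.
make_part_pdf() {
    part=$1
    shift
    nice convert -compress zip -page letter -adjoin "$@" \
        "pdf:$(printf 'part%03d.pdf' "$part")"
}
```

For example, run_in_batches 10 make_part_pdf *.tiff would write part001.pdf, part002.pdf, and so on into the current folder.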

Last edited by kerim; 11-16-2002 at 01:00 PM.

11-16-2002, 12:41 PM   #2
mervTormel
League Commissioner
Join Date: Jan 2002
Posts: 5,536
re: [NB]

re: [NB] -- just use the command line continuation symbol \

Code:
nice +10 convert \
    -gravity South \
    -crop 1700x2200+0+0 \
    -rotate "+90" \
    -level 10000,1,50000 \
    -unsharp 6x1+100+0.05 \
    $file pbm:`basename $file .tif`.pbm
that will work both in a script and on the command line
__________________
On a clear disk, you can seek forever.

11-17-2002, 12:10 PM   #3
kerim
Won't work ...

ImageMagick just can't handle this job, but I see I could first use ImageMagick to convert to EPS or PS or JPG, and then use GhostScript or jpgtoPDF to finish the job. Because this is a different topic than the scripting of the process itself, I have started a new thread:

here

If you have any ideas, please pop over to that thread and give suggestions. Once I figure it out I will post a revised script on this thread!

11-18-2002, 12:31 AM   #4
kerim
New Script!

OK, thanks to the above-mentioned thread, I have learned a new way to do the TIFF to PDF conversions that puts almost no burden on the system in terms of memory and goes VERY fast!!!

It requires that you install TWO additional pieces of software, both available from FINK:
  • ghostscript
  • libtiff

These include two tools that we will use: "tiff2ps" and "ps2pdf". The first allows fast conversion of all the TIFF files to PostScript, and the second converts the single PostScript file to a PDF. It is amazing how fast they both work compared to ImageMagick. But this means that I had to change the ImageMagick step to produce TIFF instead of PBM files. So, to convert from greyscale to black and white I now use the "-type bilevel" option. I had problems with this option before, and in my tests it still occasionally causes problems, but I suspect this may just be because of a corrupted TIFF file.

One other change I made is to move each document folder after all its scans have been converted. To do this you will need another directory called "ToDelete" at the same level as the other two directories. The final PDF file is now saved at the same level as these directories. This way you can make new scans without having to move the old folders, but you can still go back to the folders if something goes wrong with the scan-conversion process.

Enough talk. Here is the new "scanmagick.command" code:

Code:
#! /bin/csh

cd ~/Desktop/Scans/LandscapeScans
echo "Converting LandscapeScans Folder" 

foreach dir (*)
cd $dir
echo "Now working on:" $dir
echo "Crop, Rotate, Unsharp, Convert to B&W, & Compress TIF (as TIFF)!"
foreach file (*.tif)
nice +10 convert \
-gravity South \
-crop 1700x2200+0+0 \
-rotate "+90" \
-level 10000,1,50000 \
-unsharp 6x1+100+0.05 \
-compress zip \
-type bilevel \
$file tiff:`basename $file .tif`.tiff
echo $file "done"
end

echo "Converting all TIFF Files into a single landscape PS File!"
tiff2ps -h 8.5 -w 11 *.tiff > temp.ps

echo "Converting temp.ps to pdf"
ps2pdfwr -g7920x6120 temp.ps

echo "Moving temp.pdf to " $dir".pdf"
mv temp.pdf ../../$dir.pdf

echo "All Done with directory: " $dir
cd ../
echo "Moving" $dir "To the ToDelete Folder"
mv $dir ../ToDelete
end

cd ~/Desktop/Scans/PortraitScans
echo "Converting PortraitScans Folder" 

foreach dir (*)
cd $dir
echo "Now working on:" $dir
echo "Crop, Unsharp, Convert to B&W, & Compress TIF (as TIFF)!"
foreach file (*.tif)
nice +10 convert \
-gravity South \
-crop 1700x2200+0+0 \
-level 10000,1,50000 \
-unsharp 6x1+100+0.05 \
-compress zip \
-type bilevel \
$file tiff:`basename $file .tif`.tiff
echo $file "done"
end

echo "Converting all TIFF Files into a single portrait PS File!"
tiff2ps -h 11 -w 8.5 *.tiff > temp.ps
echo "Converting temp.ps to pdf"
ps2pdfwr temp.ps 

echo "Moving temp.pdf to " $dir".pdf"
mv temp.pdf ../../$dir.pdf

echo "All Done with directory: " $dir
cd ../
echo "Moving" $dir "To the ToDelete Folder"
mv $dir ../ToDelete
end

echo "Finished"
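
A note on the page-size numbers in the script, as a hedged worked example. tiff2ps takes -h and -w in inches, while the -g option that ps2pdfwr passes on to Ghostscript is in device pixels; 7920x6120 matches an 11 x 8.5 inch page at 720 dpi, which is Ghostscript's usual default resolution for PDF output. The 1700x2200 crop likewise matches a letter page if the scans were made at 200 dpi, which is an assumption, since the scan resolution is not stated in this thread. The helper name page_geometry is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: WIDTH_INCHES HEIGHT_INCHES DPI -> "WxH" in pixels.
page_geometry() {
    awk -v w="$1" -v h="$2" -v dpi="$3" \
        'BEGIN { printf "%.0fx%.0f\n", w * dpi, h * dpi }'
}

page_geometry 11 8.5 720    # landscape letter at 720 dpi: prints 7920x6120
page_geometry 8.5 11 200    # portrait letter at 200 dpi:  prints 1700x2200
```

If the scans were made at a different resolution, the -crop value would need to change to match, while the -g value depends only on the page size and Ghostscript's output resolution.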

11-18-2002, 01:06 AM   #5
kerim
Bug

OK. I found my first bug in the script. If there are no items in the first folder, "LandscapeScans", it doesn't just go on to "PortraitScans"; it logs out and exits. How do I force it to go on?

TIA!

11-18-2002, 01:37 AM   #6
osxpez
Major Leaguer
Join Date: May 2002
Location: Sweden
Posts: 282
Do you know where it exits? Well, it doesn't matter. You could check for any .tiff files and make the PDF conversion conditional on finding some. I don't know how to do it in csh though (or rather, I'd rather not learn; I'm not a big fan of csh scripting).
__________________
/PEZ

11-18-2002, 10:01 AM   #7
kerim
Yes, it exits on the "foreach" command after it reaches the landscape folder, so I guess it must be the "foreach dir" command. That means it actually needs to check for folders rather than for TIFF files. There must be some way of writing an "if...then..." command to say "if there are no folders, then go on to the next folder", but I have no programming experience other than this script and some AppleScripts I've written...

11-18-2002, 10:14 AM   #8
kerim
Bug #2

I found another bug with the script. This one I should be able to fix on my own - it seems that the ImageMagick convert command for "landscape" files screws the image up, while the one for "portrait" works fine. I suspect there is something about the way that page size is identified, but I'm not exactly sure what I'm doing wrong and will have to do some tests.

But the good news is that everything seems to work fine except for these two bugs!

11-18-2002, 10:27 AM   #9
kerim
Strange Behavior!!!

I'm working on Bug #2 (no idea what to do about #1), and I've discovered something very strange!

It seems that the landscape "tiff" files created by the ImageMagick command look fine in GraphicConverter, but appear all twisted and strange in TiffSight!!!

This looks like it might be a little harder to solve than I thought!

11-18-2002, 12:35 PM   #10
kerim
Temporary Solution to Bug #2

It seems that there are some bugs in ImageMagick 5.4.x in dealing with "bilevel" images, so until FINK upgrades to 5.5.x we will probably have to stick with this workaround. Basically, I add another step. Instead of converting directly to bilevel, I save as a Portable Bitmap (PBM) and then convert those files back to TIFF. It slows things down, but at least it works! I only do this for the LandscapeScans folder, as it doesn't seem necessary for the portrait scans.

Here is the revised Script. Still haven't figured out how to solve bug #1, so I suggest simply sticking a folder with one image in your Landscape folder if you need to just do Portrait Scans. Hope to have a better solution soon!

Code:
#! /bin/csh

cd ~/Desktop/Scans/LandscapeScans
echo "Converting LandscapeScans Folder" 

foreach dir (*)
cd $dir
echo "Now working on:" $dir
echo "Crop, Rotate, Unsharp, Convert to B&W, PBM!"
foreach file (*.tif)
nice +10 convert \
-gravity South \
-crop 1700x2200+0+0 \
-rotate "+90" \
-level 10000,1,50000 \
-unsharp 6x1+100+0.05 \
$file pbm:`basename $file .tif`.pbm
echo $file "done"
end

echo "Convert PBM to zip-compressed TIFF"
foreach file (*.pbm)
nice +10 convert \
-compress zip \
$file tiff:`basename $file .pbm`.tiff
echo $file "done"
end

echo "Converting all TIFF Files into a single landscape PS File!"
tiff2ps -h 8.5 -w 11 *.tiff > temp.ps

echo "Converting temp.ps to pdf"
ps2pdfwr -g7920x6120 temp.ps

echo "Moving temp.pdf to " $dir".pdf"
mv temp.pdf ../../$dir.pdf

echo "All Done with directory: " $dir
cd ../
echo "Moving" $dir "To the ToDelete Folder"
mv $dir ../ToDelete
end

cd ~/Desktop/Scans/PortraitScans
echo "Converting PortraitScans Folder" 

foreach dir (*)
cd $dir
echo "Now working on:" $dir
echo "Crop, Unsharp, Convert to B&W, & Compress TIF (as TIFF)!"
foreach file (*.tif)
nice +10 convert \
-gravity South \
-crop 1700x2200+0+0 \
-level 10000,1,50000 \
-unsharp 6x1+100+0.05 \
-compress zip \
-type bilevel \
$file tiff:`basename $file .tif`.tiff
echo $file "done"
end

echo "Converting all TIFF Files into a single portrait PS File!"
tiff2ps -h 11 -w 8.5 *.tiff > temp.ps
echo "Converting temp.ps to pdf"
ps2pdfwr temp.ps 

echo "Moving temp.pdf to " $dir".pdf"
mv temp.pdf ../../$dir.pdf

echo "All Done with directory: " $dir
cd ../
echo "Moving" $dir "To the ToDelete Folder"
mv $dir ../ToDelete
end

echo "Finished"

Last edited by kerim; 11-18-2002 at 01:32 PM.

11-18-2002, 02:23 PM   #11
kerim
A better way?

I ran the above script and it works great. The second "convert" command doesn't slow things down very much. The really long part is the first "convert" command with all the high-level filters (especially "unsharp"). However, I did have an idea. It might be best to leave the whole thing in portrait mode until the end, and then rotate with Ghostscript. Or even rotate in the middle with something from libtiff. But since it works I'm not going to mess with it.

It would be nice to know how to add an "if...then..." statement to fix Bug #1, though. Otherwise I'm now ready to start scanning in all my documents and converting them to PDF!!! Thanks to everyone who helped, and I hope this thread is useful for other people. Maybe someone will be inspired to write a document management program for OS X...
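
One possible way around Bug #1, sketched in Bourne shell (sh) rather than csh, so it is a port of the outer loop and not a drop-in patch for the script. In csh a wildcard that matches nothing is an error ("No match"), which is most likely why the script stops at an empty LandscapeScans folder. In sh an unmatched pattern is simply left as it is, so the loop can test each entry and skip anything that is not a folder. The function names list_scan_dirs and process_scans are hypothetical, and the per-document work is passed in as a command.

```shell
#!/bin/sh
# Print the name of each folder inside $1, one per line.
# Prints nothing (and does not fail) when $1 is empty or missing.
list_scan_dirs() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue      # unmatched pattern: nothing to do
        basename "$d"
    done
}

# process_scans SCANS_DIR CMD
# Runs "CMD KIND FOLDER" for every document folder under LandscapeScans and
# PortraitScans; an empty folder is skipped and the loop carries on.
process_scans() {
    scans=$1
    cmd=$2
    for kind in LandscapeScans PortraitScans; do
        list_scan_dirs "$scans/$kind" | while IFS= read -r name; do
            "$cmd" "$kind" "$scans/$kind/$name" < /dev/null
        done
    done
}
```

Here CMD would be a function holding the convert, tiff2ps and ps2pdfwr steps for one document folder; with an empty LandscapeScans folder it is simply never called for that folder.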

11-18-2002, 02:40 PM   #12
kerim
One more thing...

In addition to Bug #1, there is one other thing I'd like to do.

My document feeder keeps going after there is no more paper. This produces lots of 4k files with no information. I would like to automatically delete these before doing anything. But, and here would be the difficult part - only if they are the last files in the folder. So, if I have a folder of scans:

scan01.tif
scan02.tif
scan03.tif
scan04.tif

I would want to delete scan04.tif (and higher) if they are 4k, but if scan02.tif is 4k I would want to print an error message and skip the whole folder. This would mean I need to re-scan a document.

Right now I do this manually by looking at the file list, but it would be nice to automate it as well. It seems that if I'm going to be adding "if...then..." statements to fix Bug #1, I could do this at the same time!
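
The rule described here (delete blank scans only at the end of a folder, and refuse to touch a folder that has a blank scan in the middle) can be sketched in Bourne shell. The function name prune_trailing_blanks is hypothetical, and the 4096-byte default is an assumption based on the "4k" figure above; it should be checked against the real size of a blank scan. The sketch also assumes file names without spaces.

```shell
#!/bin/sh
# prune_trailing_blanks DIR [MAX_BLANK_BYTES]
# Deletes blank scans at the end of DIR. If a blank scan is followed by a
# real page, prints an error, deletes nothing and returns 1, so the caller
# can skip the folder and the document can be re-scanned.
prune_trailing_blanks() {
    dir=$1
    limit=${2:-4096}
    trailing=""                      # blank files seen since the last real page
    for f in "$dir"/*.tif; do
        [ -e "$f" ] || continue      # no scans in this folder
        size=$(($(wc -c < "$f")))    # arithmetic strips wc's padding
        if [ "$size" -le "$limit" ]; then
            trailing="$trailing $f"
        elif [ -n "$trailing" ]; then
            echo "Blank page before $f: please re-scan $dir" >&2
            return 1
        fi
    done
    [ -z "$trailing" ] || rm -f $trailing
}
```

In a per-folder loop this could be used as: prune_trailing_blanks "$dir" || continue, so that a folder with a blank page in the middle is left alone.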

12-15-2002, 06:27 PM   #13
kerim
I recently updated FINK, and as a result I ended up updating ImageMagick. This led to some things going wrong with my script. First, the "-gravity" command now works - so the fact that I had it WRONG now mattered! I removed it! Also, now that the bugs in "-type bilevel" have been fixed, I no longer need two steps for the bilevel conversion in landscape mode. Here is the revised script:

Code:
#! /bin/csh

cd ~/Desktop/Scans/LandscapeScans
echo "Converting LandscapeScans Folder" 

foreach dir (*)
cd $dir
echo "Now working on:" $dir
echo "Crop, Rotate, Unsharp, Convert to B&W, & Compress, TIFF!"
foreach file (*.tif)
nice +10 convert \
-crop 1700x2200+0+0 \
-rotate "+90" \
-level 10000,1,50000 \
-unsharp 6x1+100+0.05 \
-type bilevel \
-compress zip \
$file tiff:`basename $file .tif`.tiff
echo $file "done"
end

echo "Converting all TIFF Files into a single landscape PS File!"
tiff2ps -h 8.5 -w 11 *.tiff > temp.ps

echo "Converting temp.ps to pdf"
ps2pdfwr -g7920x6120 temp.ps

echo "Moving temp.pdf to " $dir".pdf"
mv temp.pdf ../../$dir.pdf

echo "All Done with directory: " $dir
cd ../
echo "Moving" $dir "To the ToDelete Folder"
mv $dir ../ToDelete
end

cd ~/Desktop/Scans/PortraitScans
echo "Converting PortraitScans Folder" 

foreach dir (*)
cd $dir
echo "Now working on:" $dir
echo "Crop, Unsharp, Convert to B&W, & Compress, TIFF!"
foreach file (*.tif)
nice +10 convert \
-crop 1700x2200+0+0 \
-level 10000,1,50000 \
-unsharp 6x1+100+0.05 \
-type bilevel \
-compress zip \
$file tiff:`basename $file .tif`.tiff
echo $file "done"
end

echo "Converting all TIFF Files into a single portrait PS File!"
tiff2ps -h 11 -w 8.5 *.tiff > temp.ps
echo "Converting temp.ps to pdf"
ps2pdfwr temp.ps 

echo "Moving temp.pdf to " $dir".pdf"
mv temp.pdf ../../$dir.pdf

echo "All Done with directory: " $dir
cd ../
echo "Moving" $dir "To the ToDelete Folder"
mv $dir ../ToDelete
end

echo "Finished"