|
|
#1 |
|
Major Leaguer
Join Date: Jan 2002
Posts: 311
|
This is the result of two other threads I started, here and here, but I wanted to consolidate my findings in one place so that they might be more useful to others who need to do the same thing.
The goal: Batch scan multi-page documents and convert each one to a single PDF file. The tools:
The process: First, you need to get your scans to the desktop. I have found the fastest way to do this, and the one least likely to crash any software, is to use VueScan in "batch" mode, saving all the files as raw TIFF files. Here is my adapted version of VueScan's "Advanced Workflow" page from the user manual:
Once you have done all that, you can save these settings. Make sure you set the extension to ".ini" manually, or VueScan won't be able to read them!
Second, place your document in the ADF and press "scan"!
A note about double-sided documents: I suggest scanning one side to one folder and the other side to a second folder, then using the shareware utility "A Better Finder Rename" to rename the scans in each folder in increments of 2 (one folder numbered 01, 03, 05, and so on, the other 02, 04, 06, etc.). Then merge the two folders before continuing to the third step.
Third, I made a shell script (see below) which you can save as a plain text file on your drive (make sure it uses UNIX line endings!). I called it "scanmagick.command". You can open the file in "Get Info" and set the system to "open with" the Terminal for all files ending in ".command"; then you can just double-click on the file. But first you will have to change the file permissions to 755. In the Terminal you can "cd" to the directory containing the file and type: "chmod 755 scanmagick.command". Or you can use any of a host of utilities to change file permissions.
This script was written because ImageMagick runs out of memory if it tries to do too much at one time. There seems to be a problem with how it uses temporary memory on OS X. The script breaks the job down into its individual steps and does them one at a time. I am still working on it and will probably post a newer version here later. I have found that it can still run out of memory during the PDF conversion when working with a large number of files, so I might add a command to clear out the system's virtual memory between converting to TIFF and writing the PDF. These problems only occur with documents that have more than 40 or so pages (I think). Another solution might be to break up writing the PDF: it might be possible to write only 10 files to the PDF at a time or something. I have to look into it.
In order to understand the commands used in this script, see the two links to other threads listed at the top. Code:
#! /bin/csh
# Landscape scans: crop, rotate and filter each TIF, then build one PDF per folder.
cd /Users/kerim/Desktop/Scans/LandscapeScans
echo "Converting LandscapeScans Folder"
foreach dir (*)
    cd $dir
    echo "Now working on:" $dir
    echo "Cropping, Rotating, and Filtering TIF to PBM!"
    foreach file (*.tif)
        nice +10 convert \
            -gravity South \
            -crop 1700x2200+0+0 \
            -rotate "+90" \
            -level 10000,1,50000 \
            -unsharp 6x1+100+0.05 \
            $file pbm:`basename $file .tif`.pbm
        echo $file "done"
    end
    echo "Converting all files to compressed TIFF!"
    foreach file (*.pbm)
        nice +10 convert \
            -compress zip \
            $file tif:`basename $file .pbm`.tiff
        echo $file "done"
    end
    echo "Converting all TIFF Files into a single landscape PDF!"
    nice +10 convert \
        -compress zip \
        -page 792x612 \
        -adjoin *.tiff pdf:../$dir.pdf
    echo "All Done with directory: " $dir
    cd ../
end
# Portrait scans: same steps without the rotation.
cd /Users/kerim/Desktop/Scans/PortraitScans
echo "Converting PortraitScans Folder"
foreach dir (*)
    cd $dir
    echo "Now working on:" $dir
    echo "Cropping and Filtering TIF to PBM!"
    foreach file (*.tif)
        nice +10 convert \
            -gravity South \
            -crop 1700x2200+0+0 \
            -level 10000,1,50000 \
            -unsharp 6x1+100+0.05 \
            $file pbm:`basename $file .tif`.pbm
        echo $file "done"
    end
    echo "Converting all files to compressed TIFF!"
    foreach file (*.pbm)
        nice +10 convert \
            -compress zip \
            $file tif:`basename $file .pbm`.tiff
        echo $file "done"
    end
    echo "Converting all TIFF Files into a single portrait PDF!"
    nice +10 convert \
        -compress zip \
        -page letter \
        -adjoin *.tiff pdf:../../$dir.pdf
    echo "All Done with directory: " $dir
    cd ../
end
echo "Finished"
Last edited by kerim; 11-16-2002 at 01:00 PM.
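For the double-sided case described above, the renaming and merging could probably also be done from the shell instead of with a separate utility. What follows is only a rough sketch in plain sh (not csh), under some assumptions that are not from the original post: the function name interleave, the folder arguments, and the output names page_0001.tif and so on are all made up for illustration, and both sides are assumed to have been scanned in the same page order. It copies rather than moves, so the original scans stay untouched.

```shell
#!/bin/sh
# interleave FRONT_DIR BACK_DIR OUT_DIR   (hypothetical helper, not part of the original script)
# Copies front-side scans to odd page numbers and back-side scans to even
# page numbers, so that a plain alphabetical listing of OUT_DIR is in page order.
interleave() {
    front=$1
    back=$2
    out=$3
    mkdir -p "$out" || return 1

    n=1                                   # front sides: 1, 3, 5, ...
    for f in "$front"/*.tif; do
        [ -e "$f" ] || continue           # pattern matched nothing
        cp "$f" "$out/$(printf 'page_%04d.tif' "$n")"
        n=$((n + 2))
    done

    n=2                                   # back sides: 2, 4, 6, ...
    for f in "$back"/*.tif; do
        [ -e "$f" ] || continue
        cp "$f" "$out/$(printf 'page_%04d.tif' "$n")"
        n=$((n + 2))
    done
    return 0
}
```

It would be called as something like: interleave Fronts Backs Merged. If the back sides come out of the feeder in reverse order, the second loop would need to count down instead, which this sketch does not handle.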
|
|
|
|
|
#2 |
|
League Commissioner
Join Date: Jan 2002
Posts: 5,536
|
re: [NB]
re: [NB] -- just use the command line continuation symbol \
Code:
nice +10 convert \
-gravity South \
-crop 1700x2200+0+0 \
-rotate "+90" \
-level 10000,1,50000 \
-unsharp 6x1+100+0.05 \
$file pbm:`basename $file .tif`.pbm
__________________
On a clear disk, you can seek forever. |
|
|
|
|
|
#3 |
|
Major Leaguer
Join Date: Jan 2002
Posts: 311
|
ImageMagick just can't handle this job, but I see I could first use ImageMagick to convert to EPS or PS or JPG, and then use GhostScript or jpgtoPDF to finish the job. Because this is a different topic than the scripting of the process itself, I have started a new thread here.
If you have any ideas, please pop over to that thread and give suggestions. Once I figure it out I will post a revised script on this thread!
|
|
|
|
|
#4 |
|
Major Leaguer
Join Date: Jan 2002
Posts: 311
|
OK, thanks to the above-mentioned thread, I have learned a new way to do the TIFF to PDF conversions that puts almost no burden on the system in terms of memory and goes VERY fast!
It requires that you install TWO additional pieces of software, both available from FINK:
These include two tools that we will use: "tiff2ps" and "ps2pdf". The first allows fast conversion of all the TIFF files to a single PostScript file, and the second converts that PostScript file to a PDF. It is amazing how fast they both work compared to ImageMagick.
But this means that I had to change the ImageMagick step to produce TIFF instead of PBM files. So, to convert from greyscale to black and white I now use the "-type bilevel" option. I had problems with this before, and in my tests it still occasionally produces problems, but I suspect this may just be because of a corrupted TIFF file.
One other change I made is to move each directory after all of its scans have been converted. To do this you will need another directory called "ToDelete" at the same level as the other two directories. The final PDF file is now also saved at the same level as these directories. This way you can make new scans without having to move the old folders, but you can still go back to the folders if something goes wrong with the scan-conversion process.
Enough talk. Here is the new "scanmagick.command" code: Code:
#! /bin/csh
# Landscape scans: one ImageMagick pass per page, then tiff2ps and ps2pdfwr per folder.
cd ~/Desktop/Scans/LandscapeScans
echo "Converting LandscapeScans Folder"
foreach dir (*)
    cd $dir
    echo "Now working on:" $dir
    echo "Crop, Rotate, Unsharp, Convert to B&W, & Compress TIF (as TIFF)!"
    foreach file (*.tif)
        nice +10 convert \
            -gravity South \
            -crop 1700x2200+0+0 \
            -rotate "+90" \
            -level 10000,1,50000 \
            -unsharp 6x1+100+0.05 \
            -compress zip \
            -type bilevel \
            $file tiff:`basename $file .tif`.tiff
        echo $file "done"
    end
    echo "Converting all TIFF Files into a single landscape PS File!"
    tiff2ps -h 8.5 -w 11 *.tiff > temp.ps
    echo "Converting temp.ps to pdf"
    ps2pdfwr -g7920x6120 temp.ps
    echo "Moving temp.pdf to " $dir".pdf"
    mv temp.pdf ../../$dir.pdf
    echo "All Done with directory: " $dir
    cd ../
    echo "Moving" $dir "To the ToDelete Folder"
    mv $dir ../ToDelete
end
# Portrait scans: same steps without the rotation.
cd ~/Desktop/Scans/PortraitScans
echo "Converting PortraitScans Folder"
foreach dir (*)
    cd $dir
    echo "Now working on:" $dir
    echo "Crop, Unsharp, Convert to B&W, & Compress TIF (as TIFF)!"
    foreach file (*.tif)
        nice +10 convert \
            -gravity South \
            -crop 1700x2200+0+0 \
            -level 10000,1,50000 \
            -unsharp 6x1+100+0.05 \
            -compress zip \
            -type bilevel \
            $file tiff:`basename $file .tif`.tiff
        echo $file "done"
    end
    echo "Converting all TIFF Files into a single portrait PS File!"
    tiff2ps -h 11 -w 8.5 *.tiff > temp.ps
    echo "Converting temp.ps to pdf"
    ps2pdfwr temp.ps
    echo "Moving temp.pdf to " $dir".pdf"
    mv temp.pdf ../../$dir.pdf
    echo "All Done with directory: " $dir
    cd ../
    echo "Moving" $dir "To the ToDelete Folder"
    mv $dir ../ToDelete
end
echo "Finished"
|
|
|
|
|
#5 |
|
Major Leaguer
Join Date: Jan 2002
Posts: 311
|
OK. I found my first bug in the script. If there are no items in the first folder, "LandscapeScans", it doesn't just go on to "PortraitScans"; it logs out and exits. How do I force it to go on?
TIA! |
|
|
|
|
|
#6 |
|
Major Leaguer
Join Date: May 2002
Location: Sweden
Posts: 282
|
Do you know where it exits? Well, it doesn't matter. You could check for any .tiff files and make the PDF conversion conditional on it. I don't know how to do it in csh though (or rather, I'd rather not learn; I'm not a big fan of csh scripting).
__________________
/PEZ |
|
|
|
|
|
#7 |
|
Major Leaguer
Join Date: Jan 2002
Posts: 311
|
Yes, it exits on the "foreach" command after it reaches the "LandscapeScans" folder, so I guess it must be the "foreach dir" command. So it actually needs to check for folders rather than actual TIFF files. There must be some way of writing an "if...then..." command to say "if there are no folders, then go on to the next folder", but I also have no programming experience other than this script and some AppleScripts I've written...
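One way the "if there are no folders, go on" check could look is sketched below. This is plain sh rather than csh, and the function name process_scan_root is made up for illustration. In csh the abort most likely comes from the "*" pattern matching nothing ("No match"); tcsh documents a "nonomatch" variable for that case, but that route is untested here. In sh an unmatched pattern is simply left as literal text, so a single test inside the loop is enough to skip it.

```shell
#!/bin/sh
# process_scan_root ROOT   (hypothetical helper, not part of the original script)
# Walks the subfolders of ROOT. An empty or missing ROOT produces no work
# instead of ending the whole run, so the caller can go on to the next root.
process_scan_root() {
    root=$1
    if [ ! -d "$root" ]; then
        echo "No such folder: $root" >&2
        return 0
    fi
    for dir in "$root"/*/; do
        # With no subfolders the pattern stays literal; skip it.
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        echo "Now working on: $name"
        # ...the convert / tiff2ps / ps2pdfwr steps for "$dir" would go here...
    done
    return 0
}
```

Calling process_scan_root once for LandscapeScans and once for PortraitScans would then run the second folder even when the first one is empty.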
|
|
|
|
|
|
#8 |
|
Major Leaguer
Join Date: Jan 2002
Posts: 311
|
I found another bug with the script. This one I should be able to fix on my own - it seems that the ImageMagick convert command for "landscape" files screws the image up, while the one for "portrait" works fine. I suspect there is something about the way that page size is identified, but I'm not exactly sure what I'm doing wrong and will have to do some tests.
But the good news is that everything seems to work fine except for these two bugs! |
|
|
|
|
|
#9 |
|
Major Leaguer
Join Date: Jan 2002
Posts: 311
|
Strange Behavior!!!
I'm working on Bug #2 (no idea what to do about #1), and I've discovered something very strange!
It seems that the landscape "tiff" files created by the ImageMagick command look fine in GraphicConverter, but appear all twisted and strange in TiffSight! This looks like it might be a little harder to solve than I thought!
|
|
|
|
|
#10 |
|
Major Leaguer
Join Date: Jan 2002
Posts: 311
|
It seems that there are some bugs in ImageMagick 5.4.x when dealing with "bilevel" images, so until FINK upgrades to 5.5.x we will probably have to stick with this workaround. Basically, I add another step. Instead of converting directly to bilevel, I save as a Portable Bitmap (PBM) file and then convert that back to TIFF. It slows things down, but at least it works! I only do this for the LandscapeScans folder, as it doesn't seem necessary for the portrait scans.
Here is the revised script. I still haven't figured out how to solve Bug #1, so I suggest simply sticking a folder with one image in your LandscapeScans folder if you only need to do portrait scans. Hope to have a better solution soon! Code:
#! /bin/csh
# Landscape scans: go through PBM to work around the bilevel bug, then tiff2ps and ps2pdfwr.
cd ~/Desktop/Scans/LandscapeScans
echo "Converting LandscapeScans Folder"
foreach dir (*)
    cd $dir
    echo "Now working on:" $dir
    echo "Crop, Rotate, Unsharp, Convert to B&W, PBM!"
    foreach file (*.tif)
        nice +10 convert \
            -gravity South \
            -crop 1700x2200+0+0 \
            -rotate "+90" \
            -level 10000,1,50000 \
            -unsharp 6x1+100+0.05 \
            $file pbm:`basename $file .tif`.pbm
        echo $file "done"
    end
    echo "Convert PBM to zip-compressed TIFF"
    foreach file (*.pbm)
        nice +10 convert \
            -compress zip \
            $file tiff:`basename $file .pbm`.tiff
        echo $file "done"
    end
    echo "Converting all TIFF Files into a single landscape PS File!"
    tiff2ps -h 8.5 -w 11 *.tiff > temp.ps
    echo "Converting temp.ps to pdf"
    ps2pdfwr -g7920x6120 temp.ps
    echo "Moving temp.pdf to " $dir".pdf"
    mv temp.pdf ../../$dir.pdf
    echo "All Done with directory: " $dir
    cd ../
    echo "Moving" $dir "To the ToDelete Folder"
    mv $dir ../ToDelete
end
# Portrait scans: direct bilevel conversion, no rotation.
cd ~/Desktop/Scans/PortraitScans
echo "Converting PortraitScans Folder"
foreach dir (*)
    cd $dir
    echo "Now working on:" $dir
    echo "Crop, Unsharp, Convert to B&W, & Compress TIF (as TIFF)!"
    foreach file (*.tif)
        nice +10 convert \
            -gravity South \
            -crop 1700x2200+0+0 \
            -level 10000,1,50000 \
            -unsharp 6x1+100+0.05 \
            -compress zip \
            -type bilevel \
            $file tiff:`basename $file .tif`.tiff
        echo $file "done"
    end
    echo "Converting all TIFF Files into a single portrait PS File!"
    tiff2ps -h 11 -w 8.5 *.tiff > temp.ps
    echo "Converting temp.ps to pdf"
    ps2pdfwr temp.ps
    echo "Moving temp.pdf to " $dir".pdf"
    mv temp.pdf ../../$dir.pdf
    echo "All Done with directory: " $dir
    cd ../
    echo "Moving" $dir "To the ToDelete Folder"
    mv $dir ../ToDelete
end
echo "Finished"
Last edited by kerim; 11-18-2002 at 01:32 PM.
|
|
|
|
|
#11 |
|
Major Leaguer
Join Date: Jan 2002
Posts: 311
|
A better way?
I ran the above script and it works great. The second "convert" command doesn't slow things down very much. The really long part is the first "convert" command with all the high-level filters (especially "unsharp"). However, I did have an idea. It might be best to leave the whole thing in portrait mode until the end, and then rotate with Ghostscript. Or even rotate in the middle with something from libtiff. But since it works I'm not going to mess with it.
It would be nice to know how to add an "if...then..." statement to fix Bug #1, though. Otherwise I'm now ready to start scanning in all my documents and converting them to PDF! Thanks to everyone who helped, and I hope this thread is useful for other people. Maybe someone will be inspired to write a document management program for OS X...
|
|
|
|
|
#12 |
|
Major Leaguer
Join Date: Jan 2002
Posts: 311
|
In addition to Bug #1, there is one other thing I'd like to do.
My document feeder keeps going after there is no more paper. This produces lots of 4k files with no information. I would like to delete these automatically before doing anything else, but (and here is the difficult part) only if they are the last files in the folder. So, if I have a folder of scans:
scan01.tif
scan02.tif
scan03.tif
scan04.tif
I would want to delete scan04.tif (and higher) if they are 4k, but if scan02.tif is 4k I would want to print an error message and skip the whole folder, since that would mean I need to re-scan the document. Right now I do this manually by looking at the file list, but it would be nice to automate it as well. It seems that if I'm going to be adding "if...then..." statements to fix Bug #1, I could do this at the same time!
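The rule described above (delete blank scans only at the end, refuse the folder if one turns up in the middle) could be sketched roughly as follows. Again this is plain sh, not csh, and several things are assumptions rather than anything from the original post: the function name trim_trailing_blanks, the default size limit of 4096 bytes standing in for "4k", and the idea that an alphabetical listing of the folder matches scan order.

```shell
#!/bin/sh
# trim_trailing_blanks DIR [LIMIT_BYTES]   (hypothetical helper; 4096 is an assumed cutoff)
# Returns 0 after deleting small files at the end of the listing.
# Returns 1, deleting nothing, if a small file is followed by a normal page.
trim_trailing_blanks() {
    dir=$1
    limit=${2:-4096}
    first_blank=""

    # Pass 1: a blank followed by a real page means a misfeed in the middle.
    for f in "$dir"/*.tif; do
        [ -e "$f" ] || continue
        size=$(( $(wc -c < "$f") ))
        if [ "$size" -le "$limit" ]; then
            [ -n "$first_blank" ] || first_blank=$f
        elif [ -n "$first_blank" ]; then
            echo "Blank scan in the middle of $dir ($first_blank): re-scan needed" >&2
            return 1
        fi
    done

    # Pass 2: any small file still present is at the end, so remove it.
    for f in "$dir"/*.tif; do
        [ -e "$f" ] || continue
        size=$(( $(wc -c < "$f") ))
        if [ "$size" -le "$limit" ]; then
            rm -- "$f"
        fi
    done
    return 0
}
```

The main loop could call this before the convert step and skip the folder whenever it returns non-zero. The right cutoff depends on what the scanner actually writes for an empty page, so it is worth checking a few real blank files first.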
|
|
|
|
|
#13 |
|
Major Leaguer
Join Date: Jan 2002
Posts: 311
|
I recently updated FINK, and as a result I ended up updating ImageMagick. This led to some things going wrong with my script. First, the "-gravity" command now works - so the fact that I had it WRONG now mattered! I removed it! Also, now that the bugs in "-type bilevel" have been fixed, I no longer need two steps for the bilevel conversion in landscape mode. Here is the revised script:
Code:
#! /bin/csh
# Landscape scans: single ImageMagick pass (no -gravity), then tiff2ps and ps2pdfwr.
cd ~/Desktop/Scans/LandscapeScans
echo "Converting LandscapeScans Folder"
foreach dir (*)
    cd $dir
    echo "Now working on:" $dir
    echo "Crop, Rotate, Unsharp, Convert to B&W, & Compress, TIFF!"
    foreach file (*.tif)
        nice +10 convert \
            -crop 1700x2200+0+0 \
            -rotate "+90" \
            -level 10000,1,50000 \
            -unsharp 6x1+100+0.05 \
            -type bilevel \
            -compress zip \
            $file tiff:`basename $file .tif`.tiff
        echo $file "done"
    end
    echo "Converting all TIFF Files into a single landscape PS File!"
    tiff2ps -h 8.5 -w 11 *.tiff > temp.ps
    echo "Converting temp.ps to pdf"
    ps2pdfwr -g7920x6120 temp.ps
    echo "Moving temp.pdf to " $dir".pdf"
    mv temp.pdf ../../$dir.pdf
    echo "All Done with directory: " $dir
    cd ../
    echo "Moving" $dir "To the ToDelete Folder"
    mv $dir ../ToDelete
end
# Portrait scans: same steps without the rotation.
cd ~/Desktop/Scans/PortraitScans
echo "Converting PortraitScans Folder"
foreach dir (*)
    cd $dir
    echo "Now working on:" $dir
    echo "Crop, Unsharp, Convert to B&W, & Compress, TIFF!"
    foreach file (*.tif)
        nice +10 convert \
            -crop 1700x2200+0+0 \
            -level 10000,1,50000 \
            -unsharp 6x1+100+0.05 \
            -type bilevel \
            -compress zip \
            $file tiff:`basename $file .tif`.tiff
        echo $file "done"
    end
    echo "Converting all TIFF Files into a single portrait PS File!"
    tiff2ps -h 11 -w 8.5 *.tiff > temp.ps
    echo "Converting temp.ps to pdf"
    ps2pdfwr temp.ps
    echo "Moving temp.pdf to " $dir".pdf"
    mv temp.pdf ../../$dir.pdf
    echo "All Done with directory: " $dir
    cd ../
    echo "Moving" $dir "To the ToDelete Folder"
    mv $dir ../ToDelete
end
echo "Finished"
|
|
|
|
|