The macosxhints Forums

sed question (http://hintsforums.macworld.com/showthread.php?t=15478)

mervTormel 09-29-2003 12:07 AM

Quote:

Originally posted by jbc
...Had to put your one liner in a shell script to get it to work as a standalone script. Is there a benefit to using a .pl file instead? Kept giving me syntax errors when I tried this...

of course, by now, you know, if we could see the syntax error(s) in context, we might be able to extrapolate and speculate your issues :D

i can never understand user reticence to post the actual error text, which is the single most important factor in diagnosing the issue.

jbc, can you humor us and illuminate us, psychologically?

jbc 09-29-2003 01:44 AM

Sorry, merv...long day. And it's just an "I'm a perl dummy" error pretty much. Since it's a one-liner, there's not much context.
Code:

Line 1:  String found where operator expected near \
"pe 's/^(--[^\n]*)\nContent\-Type:\s*text\/html.*$1/$1\n/msig'"
Line 1:  syntax error near \
"pe 's/^(--[^\n]*)\nContent\-Type:\s*text\/html.*$1/$1\n/msig'"

I think some of the parameters in Paul's example are command line switches for perl, so aren't being digested well. Just tried dropping the "perl" at the beginning of the line until I can get into my books to figure out how to specify these within the script correctly. Gives the same errors in both cases.

Maybe I should clarify that ultimately the script will be called from an MTA with the line "transport_filter \path\to\perlscript". An email message is sent to the script as stdin by the MTA, and it then replaces the original message with stdout from the script. So it has to be a standalone script of some sort, not a terminal command.

jbc 09-29-2003 02:38 AM

Dug out the llama; I'd forgotten about being able to specify option switches on the shebang line. Works fine now except for a "Can't emulate -e on #! line" error. Tracking that one down...

[edit: Uh, duh. It's getting late...deleted -e option....pl file is fine]

pmccann 09-29-2003 03:18 AM

Just kill the "-e" bit: you're right, that's just a command line flag that says "the stuff between the quotes is the *e*ntire script". Definitely not necessary for a script stored in a file.

In other news: you're right, more extensive (but not extensive enough!) testing shows that my purported solution needs some work. In particular I'd stupidly assumed that each MIME section had its own unique ID, rather than each *message* having such an ID. Oops. I'll have another go tonight and try to wrestle this thing to the ground.

Cheers,
Paul

jbc 09-29-2003 04:07 AM

Paul-

Think I found a perl finesse that was causing the problem...I'm not sure I understand it yet, but you probably will.

The original line you posted deleted everything from the first match to the end of the input. Looked ahead in the llama book and found "non-greedy" quantifiers, but ".*?" caused the script to delete only two lines for each match.

Then while reading about "memory parentheses", I noticed mention of "back references" ("\1") vs "memory variables" ("$1"). Tried changing the first occurrence of the memory variable to a back reference, and it works perfectly!
Code:

#!/opt/local/bin/perl -0 -p
s/^(--[^\n]*)\nContent\-Type:\s*text\/html.*\1/$1\n/msig

Truly impressive! You don't want to know how ugly the sed version was getting (don't want more exploding heads). One line of perl does everything and faster!

Need a few tweaks to be sure the script is working with whole lines, and it will be ideal for my needs. Thanks very much for getting me pointed in the right direction!

As far as MIME parts go, it seems to be the case that each part begins and ends with the same boundary identifier, although a multipart message may have different boundary identifiers for different parts. Basically the "part" consists of the starting boundary, the type/encoding/etc headers, the content, and the ending boundary, as near as I can tell. Sounds as if you had it right.

pmccann 09-29-2003 04:13 AM

Good timing: I'd been fooling around for about fifteen minutes trying to nail down the details, and was just about to post what I imagine (will check in a moment) is pretty much exactly what you've got: that is...

perl -p -0 -e 's/^(--[^\n]*)\nContent\-Type:\s*text\/html(.*?)(?=\1)//msig' filename

(((Whir, whir, whir...))) OK, very similar to yours: mine uses a little bit of fancy regexpness -- the "positive lookahead operator", which is the (?=...) piece of the above puzzle. That is, the substitution "peeks forward" and only matches when there's a copy of the MIME boundary hanging around on the end, but doesn't "consume" the boundary. Same end result as what you've done in substituting it back in, but perhaps a little more elegant. And almost certainly less efficient, but it's still pretty much instantaneous, so who cares?

Nice work, by the way, in hunting down the problems!

Cheers,
Paul

jbc 09-29-2003 04:33 AM

Ah, the "non-greedy" quantifiers need to be in parentheses to work here! Must've missed that somehow. Will definitely use them, since "non-greediness" is critical to not mangling the mail with this.

One more good pointer...thanks, Paul.

Brad

jbc 09-29-2003 05:03 AM

Paul, one final note...your last version was correct. My script failed in some cases where the boundary was shared between two parts that were to be removed, presumably because the boundary I put back in was not getting matched as the start of a new section.

The lookahead operator seems to solve this problem.

It's 2 AM...I'm off to bed.


All times are GMT -5.