The macosxhints Forums


Fotmasta 02-17-2006 11:49 AM

sed script conundrum...sigh
 
Source text-

Code:

Title: King KongTitle: Godzilla
Title: Mothra

I worked out the regex to
Code:

'\BTitle: '
to negate the word boundary. In sed though, the boundary is
Code:

\<
but I can't figure out how to negate that. The closest I got was:

Code:

sed 's/\BTitle:/Title:^M/' title.txt
This is totally incorrect because of the \B. I also tried ^\<(Title:)
The wildly different implementations of regex are really p****g me off!

hayne 02-17-2006 12:20 PM

If the "Title" you are looking for is at the start of the line, you can just use the '^' anchor to match it:

's/^Title:/Title:^M/'

Fotmasta 02-17-2006 12:28 PM

I should have included my desired result which is

Quote:

title: King Kong
title: Godzilla
title: Mothra

hayne 02-17-2006 12:32 PM

Quote:

Originally Posted by Fotmasta
I should have included my desired result which is

's/^Title/title/'

hayne 02-17-2006 01:03 PM

Oh - finally I understand.
I thought you were trying to avoid matching "Title:" except at the beginning of a line. Instead you are trying to break up lines that have "Title:" in the middle of them.

I'm not sure how to do this in 'sed'. I too find 'sed' a bit frustrating.
I would advise doing everything in Perl. It's more powerful and has fewer exceptions.

E.g.:
perl -p -e 's/^(.+)Title:/$1\nTitle:/g' title.txt

Fotmasta 02-17-2006 01:04 PM

With-

Code:

sed 's/^Title:/^MTitle:/' title.txt
I still end up with
Title: King KongTitle: Godzilla
Title: Mothra


But it should be:
Title: King Kong
Title: Godzilla
Title: Mothra

BTW- the ^M is a substitute for \r because the shell is choking on \r.

Fotmasta 02-17-2006 01:24 PM

I was preparing to ditch sed for perl when I was stopped abruptly by many posts telling me, "well if you're going to spend the time to learn Perl, you'd be better off learning Python instead." Or variations like that. And before I knew it, I was considering learning a major programming language to do some text cleanup. Ay caramba!

hayne 02-17-2006 01:38 PM

Quote:

Originally Posted by Fotmasta
I was preparing to ditch sed for perl when I was stopped abruptly by many posts telling me, "well if you're going to spend the time to learn Perl, you'd be better off learning Python instead." Or variations like that. And before I knew it, I was considering learning a major programming language to do some text cleanup. Ay caramba!

Well, various parts of Perl can be learned independent of each other. In particular, you can use incantations like what I showed in post #5 as a 'sed' replacement - it's what 'sed' would be like without all the exceptions & special cases.

NovaScotian 02-17-2006 05:51 PM

Quote:

Originally Posted by hayne
Well, various parts of Perl can be learned independent of each other. In particular, you can use incantations like what I showed in post #5 as a 'sed' replacement - it's what 'sed' would be like without all the exceptions & special cases.

But if you were just starting at Unix scripting, hayne, would you go for Perl or Python first? I'm fairly facile with AppleScript and have been told that Python is more like it than Perl. Opinion?

hayne 02-17-2006 06:00 PM

Quote:

Originally Posted by NovaScotian
But if you were just starting at Unix scripting, hayne, would you go for Perl or Python first? I'm fairly facile with AppleScript and have been told that Python is more like it than Perl. Opinion?

Sorry, I don't know Python more than superficially.
I've heard a lot of people like it and that it is growing in popularity.
I think there still is more Perl software (modules) available than there is for Python, but that may well change.

One big difference between Perl & Python is that white-space (e.g. indentation) is significant in Python. I don't like that, but maybe it's all about what you are used to.

Hal Itosis 02-18-2006 07:21 AM

A little awkward perhaps:
Code:

sed 's/Title/\
title/g' /path/to/source.txt | sed '/^$/d'

Since sed won't do "\n", use an escaped\
newline (literally). The pipe just deletes
blank lines.

Not a bona fide "one-liner" granted...
but it's pure sed (sick editor). ;)

-HI-
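
A note on the pipeline above: it puts a newline in front of every "Title", including the one that already starts a line, which is why the second sed is needed to delete the resulting blank lines. That second pass would also delete any blank lines that were in the source to begin with. Here is a sketch of a single-pass variant, using only a basic-regex capture group and the same escaped literal newline, so it should not depend on a newer sed. The function name split_titles_sed is hypothetical, and unlike the command above this sketch keeps the capital "T".

```shell
# Hypothetical helper (name is an assumption): reads text on stdin and
# breaks the line before every "Title:" that has a character in front of it.
split_titles_sed() {
  # \(.\) captures the character before "Title:" and \1 puts it back.
  # The backslash at the end of the first line escapes a literal newline,
  # so the replacement text continues at column 0 on the next line.
  sed 's/\(.\)Title:/\1\
Title:/g'
}

printf 'Title: King KongTitle: Godzilla\nTitle: Mothra\n' | split_titles_sed
```

Because a "Title:" at the very start of a line has no character in front of it, it is never matched, so no blank lines are created and no cleanup pass is needed.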

Fotmasta 02-20-2006 12:40 PM

Hal i tosis-


That was brilliant. thx.

Try a tongue scraper.

acme.mail.order 02-20-2006 11:16 PM

Quote:

Originally Posted by Hal Itosis
Since sed won't do "\n"

Sure it will. I use \n all the time. But you need sed >3.02 (which is now rather old) to do \n and \t. Hello Fink.

Fotmasta:
You are trying to do two things at once here: change case and add a newline. While in this simple case it is certainly possible with something like /[Tt]itle/\ntitle/ you can also use multiple -e sections and pipes, like Hal did:

sed -e 's/Title/title/g' -e 's/title/\ntitle/g' | sed -e '/^$/d'

Unlike hayne, I prefer sed over perl. I don't need the extra functions of perl and I don't have time to wade through piles of documentation to figure it out. It boils down to what you prefer, and learning time vs. coding time.
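
The two-step idea (lowercase first, then split) can be sketched so that it runs whether or not the installed sed understands \n in the replacement. This is only an illustration under assumptions: the function name split_titles_lower is invented here, and the `sed --version` probe simply treats any sed that rejects that option as one that needs the escaped-newline form.

```shell
# Hypothetical helper (name is an assumption): reads text on stdin,
# lowercases "Title:" and puts each mid-line "title:" on its own line.
split_titles_lower() {
  if sed --version </dev/null >/dev/null 2>&1; then
    # This sed accepts --version (GNU sed does), so assume \n works
    # in the replacement text.
    sed -e 's/Title:/title:/g' -e 's/\(.\)title:/\1\ntitle:/g'
  else
    # Fallback for a sed without --version: escaped literal newline.
    sed -e 's/Title:/title:/g' -e 's/\(.\)title:/\1\
title:/g'
  fi
}

printf 'Title: King KongTitle: Godzilla\nTitle: Mothra\n' | split_titles_lower
```

Capturing the character before "title:" means the first title on each line is left in place, so the extra pipe through sed '/^$/d' is not needed in this sketch.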

Fotmasta 02-21-2006 12:19 PM

I think I goofed on the original post. I inadvertently changed text.
Let's start with this mashed up text, which needs to be split apart.
Code:

START
title: King Kongtitle: Godzillatitle: Mothra

RESULT
title: King Kong
title: Godzilla
title: Mothra

The best news I've heard all week is that my sed is too old. There's no --version option to see what ships with Tiger. I'm going to check the release notes for the latest sed. My fingers are crossed that it has perl5 regex. I really could use the lookaround for some of my processing.

thx,
FM

NovaScotian 02-21-2006 01:59 PM

If you don't insist on sed as the means, this AppleScript will do the deed:

set input to "title: King Kongtitle: Godzillatitle: Mothra"
set T to "title: "
set R to return
set {TID, text item delimiters} to {text item delimiters, T}
set str to text items of input
set text item delimiters to TID
tell str to set out to T & item 2 & R & T & item 3 & R & T & item 4

Hal Itosis 02-21-2006 04:42 PM

Quote:

Originally Posted by Fotmasta
I think I goofed on the original post. I inadvertently changed text.
Let's start with this mashed up text, which needs to be split apart.

Don't matter much. Just change
the 'T' in my solution into a 't'...
:rolleyes: or even [Tt] for full coverage.

Quote:

AppleScript will do the deed
set input to "title: King Kongtitle: Godzillatitle: Mothra"
set T to "title: "
set R to return
set {TID, text item delimiters} to {text item delimiters, T}
set str to text items of input
set text item delimiters to TID
tell str to set out to T & item 2 & R & T & item 3 & R & T & item 4
Yikes... that makes sed look... almost easy.

acme.mail.order 02-21-2006 07:14 PM

Quote:

Originally Posted by Fotmasta
There's no --version option to see what ships with Tiger.

That's a big hint that it's OLD. I'm using 4.1 installed with fink.

Quote:

Originally Posted by hal_itosis
Yikes... that makes sed look... almost easy.

Perl: full-featured write-only programming language.
Applescript: full-featured read-only programming language.

NovaScotian 02-21-2006 08:20 PM

Quote:

Originally Posted by acme.mail.order
Perl: full-featured write-only programming language.
Applescript: full-featured read-only programming language.

Cute, and not unapt. Python manages to squeeze somewhere in between, I suppose.

Fotmasta 02-21-2006 08:49 PM

Quote:

Applescript: full-featured read-only programming language.
Wait, I write all of the time with Applescript. What specifically is read-only? text streams?

Oh, geez I just sent my own post off onto a tangent.

(I have to stall while I wait for our illustrious IT professionals to update gnu sed to 4.1.4. I am ready to hear- "install what now?")

...

acme.mail.order 02-21-2006 11:24 PM

Perl is often referred to as a write-only language because it can be nearly impossible to figure it out later unless the script is heavily documented. There is an annual "Perl Obfuscation Contest" where people try hard to write the most cryptic, unintelligible perl program that still does something useful.

Applescript is, in comparison, very verbose and easy to read. But writing it (especially for people used to terser languages) can be frustrating. "Set variable a to the text value of the property of the box with name "input" of the foreground window" vs. "$a = $_FORM["input"]"

Nothing to do with whether you CAN write it, but how easy it is to do.

If you have administrator access to your machine I can send you a binary. Try this in the terminal as a quick test:

mkdir /sw

If it succeeds, we're in business.

hayne 02-21-2006 11:47 PM

Quote:

Originally Posted by acme.mail.order
Perl is often referred to as a write-only language because it can be nearly impossible to figure it out later unless the script is heavily documented.

I note that this "write-only" epithet is usually applied by people who don't know Perl. Perl programs written by competent programmers are not usually hard to understand (assuming of course that you know Perl).
And note that the original Obfuscated Code contest was for C language programs: http://en.wikipedia.org/wiki/Obfuscated_code

NovaScotian 02-22-2006 08:59 AM

Quote:

Originally Posted by hayne
Perl programs written by competent programmers are not usually hard to understand (assuming of course that you know Perl).

And the same is true of any language. It is probably fairer to say that AppleScript is verbose, than to characterize it as read-only, but folks who think Perl is "write-only" have never had to try to decipher a poorly commented program in any language that supported goto.

acme.mail.order 02-22-2006 09:59 AM

Apple ][ BASIC wasn't too bad. But with <30kb usable memory it couldn't get too complicated. I think GOTO and GOSUB were the only loop options other than FOR-NEXT. I hope my school's CompSci teacher didn't need too much therapy after our class. (We drove the Gr.10 algebra teacher to a nervous breakdown).
6502 assembly code did goto, comments were maybe 20 characters each, but then no one expects machine code to be easy to read later. I'll stop now before I date myself too much :D

NovaScotian 02-22-2006 10:32 AM

The first machine I ever wrote code for was a Royal McBee LGP30 - a hard dating to beat, I suspect. Its memory was a rotating drum with 80 read heads on it, 4096 words in total.
http://www.users.nwark.com/~rcmahq/jclark/lgp30.jpg


All times are GMT -5.