does "sed" work differently on Darwin/tcsh?
I'm working with a nice list of sed one-liners, from:
http://sed.sourceforge.net/grabbag/t.../oneliners.txt

Some seem to work great on my 10.2 system, but some don't. The one I'm especially interested in is a one-liner to strip HTML from text files:

# remove most HTML tags (accommodates multiple-line tags)
sed -e :a -e 's/<[^<]*>/ /g;/</{N;s/\n/ /;ba;}'

This gives me an error message:

sed: 1: "s/<[^<]*>/ /g;/</{N;s/\ ...": unexpected EOF (pending }'s)

I don't know enough about sed to debug this. I assume the one-liner is basically correct and this may be an issue with tcsh? Any hints? |
It's not tcsh. GNU sed is very much like vanilla sed on steroids, and it might be that OS X doesn't come with GNU sed. Try installing sed with Fink and you should find that the one-liner works.
|
Thanks for the tip. If anyone already has GNU sed installed, can you try this?
I believe it is intended to work with the plain vanilla sed, and I try to avoid stuff with steroids when they're not necessary. :) I noticed warnings in the sed documentation that some examples don't work properly in bash or tcsh shells. The problems I saw mentioned (like using a "!") don't seem to affect this example but I think it's quite possible that the shell is "interpreting" some portion of the one-liner and not sending the right parameters to sed? |
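For what it's worth, a likely explanation of the "pending }'s" error: BSD-derived seds (the kind OS X ships) read everything after a `b` up to the end of the line as the label name, so `ba;}` looks like a branch to a label called `a;}` and the closing brace is never seen. Below is a sketch of the same one-liner with the branch and the brace in their own -e fragments, which both BSD and GNU sed should accept. It assumes an sh/bash-style shell (tcsh has no functions), and the sample HTML and the `strip_tags` name are made up for illustration.

```shell
# Made-up sample input: the <a ...> tag spans two lines.
html='<p>Hello <a
href="x">world</a></p>'

# Same logic as the one-liner, but "ba" ends its own -e fragment so that
# BSD sed does not read ";}" as part of the label name.
strip_tags() {
  sed -e :a -e 's/<[^<]*>/ /g' -e '/</{' -e N -e 's/\n/ /' -e ba -e '}'
}

out=$(printf '%s\n' "$html" | strip_tags)
printf '%s\n' "$out"
```

Each -e fragment is treated as its own line of the script, so the label, the branch and the closing brace are each terminated by a newline.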
steve_rothman,
Or install, also with Fink: ssed-3.58-1

Description: Super stream editor based on GNU sed 3.02.80, with several new features (including in-place editing of files, extended regular expression syntax and a few new commands).

Check it out here: http://sed.sourceforge.net/grabbag/ssed/

Cheers... |
another (right) tool for the job...
Code:
$ cat ~/bin/dhtml |
Well sed seems to be the right tool for the job as well. Using the one-liner above:
$ curl -s http://checkip.dyndns.org | sed -e :a -e 's/<[^<]*>/ /g;/</{N;s/\n/ /;ba;}'

Avoids bringing up a bloaty Perl interpreter. Then you have awk:

$ curl -s http://checkip.dyndns.org | awk -v RS='<[^<]*>' '{print}' |
steve_rothman,
I think you'd better follow osxpez's good advice. It also works well in tcsh: Code:
1 [sao @/Users/sao] : curl -s http://checkip.dyndns.org | sed -e :a -e 's/<[^<]*>/ /g;/</{N;s/\n/ /;ba;}'

Cheers... |
gawk
I don't know why Apple chooses not to use more GNU tools. But when I tried my own awk one-liner on my Mac I saw that it doesn't work! I haven't checked which version of awk comes with OS X, but it's neither mawk nor gawk, because then it would have worked. (Both treat RS as a regexp.) The fix was surprisingly simple:

$ sudo apt-get install gawk

'Surprisingly' because since Jaguar, few packages have been apt-get-able and I have gotten quite used to being forced to use fink. gawk seems to be in the exclusive set of tools that has a binary dist. |
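If installing gawk isn't an option, here is a rough sketch of a variant that does not rely on RS being treated as a regexp: collect the whole input into one string, then strip the tags with gsub in the END block. It only uses plain POSIX awk features; the sample HTML is made up, and newlines in the input are flattened to spaces.

```shell
# Made-up sample input: the <i ...> tag spans two lines.
html='<b>Current IP:</b> <i
class="x">1.2.3.4</i>'

# Join all lines into one buffer, then replace every tag with a space.
out=$(printf '%s\n' "$html" |
  awk '{ buf = buf $0 " " } END { gsub(/<[^<]*>/, " ", buf); print buf }')
printf '%s\n' "$out"
```

Because the whole file is held in memory, this is only a reasonable choice for modest inputs.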
osxpez,
There are several packages you could install with apt-get now, although I don't recommend it in Jaguar until they release Fink-0.5.0, which will probably be next week. Then quite a few binary packages will be available to install with apt-get and dselect. By the way, 'fink install gawk' takes exactly 3 minutes from start to finish on a 400 MHz iMac. Cheers... |
It took just a few seconds using apt-get. :) When you say several packages are available through apt-get, do you mean the 15 or so packages (most of them Fink stuff) listed by "dpkg -l", or can I reach other packages as well?
Fink 0.5.0 within a week! That's great news! Any idea of when we can have a kde 3 on Jaguar? One of the things I lack the most from 10.1 is Eterm. xterm sucks ass since it's slower than a pregnant cow swimming in maple syrup. Will I see Eterm in Fink 0.5.0 you think? Sorry for straying away with this thread. ... |
Sorry, but giggling seems highly appropriate at this stage!
Avoid bringing up a "bloaty" perl interpreter; instead use awk. No, not awk, but *sed*, you know, the new "default" sed. Not the sed that you might have. Or awk, but not *awk*, the *real* awk, which is gawk (or maybe you might need mawk, who knows? Hang on, maybe that's nawk? Now which version was it that changed the regular expression syntax to the way that I use it here?). Just download the appropriate version; takes hardly any time at all (compared to starting up that bloaty perl interpreter!).

Redux: if you're running a machine that's capable of "producing" Mac OS X then the time spent wondering whether to use a porcine perl or an antsy awk or a slender sed is a dozen times longer than it would take to run any of these utilities a dozen times.

If you want to wear a belt with a corkscrew, a knife, a gknife (the new standard -- of course!!) a hoof-cleaner (for wildebeest) and a pair of scissors attached then that's your business, and maybe that of the fashion police. Give me a swiss-army chainsaw any day. It'll be cranked into action in a fraction of the time that you spend sorting through your dinky dingly dangly dongles.

Reminds me of a very silly little piece of video: "why sumo is better than karate".... http://www.nivenspaws.com/whysumoisb...hankarate.mpeg

Cheers, Paul (guess I should add in a debitchifying smiley at this stage? ;) )

ps forgive me: it's been a really bad couple of days... |
ha! thanks, /usr/bin/paul -- i appreciate your perspective.
somewhere i saw 'snawk' -- new and improved awk! now with more sed! |
osxpez,
Yes, I would still tell you that the only way to install in Jaguar at the moment is from source, and to wait for Fink-0.5.0 to install binaries using apt-get. About KDE 3: in reality it already works very well on Jaguar, you just have to install it from source. Same for Eterm. I really don't know which binary packages will be immediately available with the new release. As you know, the number of packages available in binary depends a lot on the feedback given to the maintainers, so if there is positive feedback about using a package in unstable, they can move it to the stable branch. For the last month or so, the Fink maintainers have been calling for feedback from users, preparing for the release of 0.5.0. Cheers... |
paul: Sorry if I stepped on a sore Perl toe. :) I still think it's worthwhile to have the GNU versions of awk and sed. (And if you think it was about regexp syntax you should check again, the same expression was used through all seds/awks and even Perl.) By the way I've seen people using MS Word macros for tasks like stripping HTML. In that comparison Perl seems lightweight. (And awk and sed more like spit in the ocean.)
sao: Eterm and KDE 3 are not available via fink (as far as I can see) so there's no way to test them in unstable. What do you mean by installing them from source? Grab the tarballs from the respective sites and have a go? I might do that with Eterm. |
osxpez, I find eterm and KDE 3 stuff when I do a 'sudo fink list':
eterm       0.9-1        Color VT102 terminal emulator
----
kaboodle    3.0.7-3      KDE - simple media player
kalzium     3.0.7-3      KDE - periodic table
kaphorism   3.0.7-3      KDE - display proverbs and aphorisms
...
kxine       0.5-0.20020  KDE DVD and video player.
kxmlrpcd    3.0.7-3      KDE - inter-process communcation server
----

A 'sudo fink install eterm' should certainly work for a compile from source. KDE would take a long time but is also doable. I don't know if those are in unstable or not for your testing. I think those listed are more likely stable versions but I could be wrong. I guess that running the 'sudo fink selfupdate-cvs' and then a 'fink update-all' afterwards would get you the latest available though. |
osxpez,
I missed your last post above, sorry. (just read it) Like thatch said, if you run 'fink selfupdate-cvs' and 'fink update-all', they will come up sooner or later in your 'sudo fink list'. What's your fink version right now? (fink -V) Cheers... |
thatch, osxpez,
Actually, in Jaguar (10.2) you can install right now from source, with 'fink install packagename' the following KDE 3 packages: Code:
228 Sao @ ~ $ fink list -i --section=kde |
Continued from above post...
Code:
i korn 3.0.7-3 KDE - new mail notification

KDE was one of the first things I installed after upgrading Fink for 10.2 (Sorry the list was too long to put in just one post) Cheers... |
And the same goes for Eterm in Jaguar:
Code:
i aterm 0.4.2-2 Afterstep XVT - a VT102 emulator

Cheers... |
Thanks everyone!!!!
I just want to say thanks to everyone!
Getting the GNU version of sed did indeed solve my problem - it amazes me because the source for my one-liner is from 5 years ago, so it sounds to me like the version of sed included with Jaguar must be *really* old. Oh well. More than dealing with sed, though, I have learned a lot from all the comments on this thread. Thanks a ton, everyone. -Steve |
steve: It doesn't have to be *very* old. It's just not GNU sed. Again, I'm surprised that Apple chooses not to go more fully GNU. And the choice of tcsh over bash or zsh is strange as well. It's like Apple has somehow grasped that Unix is powerful, but missed that GNU is the organisation that takes it to a fuller potential.
|
Thanks for the info thatch and sao! I'll try just that when my "fink install 'gtk+'" finishes ... which could take quite some while it seems.
$ fink -V
Package manager version: 0.11.0
Distribution version: 0.4.9.cvs |
I did both "fink selfupdate-cvs" and "fink update-all" but I still can't find eterm:
$ fink list '*eterm*'
Information about 769 packages read in 1 seconds.

I noticed on sao's output from fink that you had about 1,000 more packages... What am I doing wrong? |
osxpez, did you use sudo for your self-update command? If so, something is not right because this is what I get:
Code:
fink list '*eterm*' |
osxpez,
You probably forgot to edit your fink.conf file: Code:
Trees: local/main stable/main stable/crypto local/bootstrap unstable/main unstable/crypto

Did you install Fink from scratch from here?: http://fink.sourceforge.net/news/jag-bootstrap.php Maybe you could show us the result of 'sudo fink list'. Cheers... |
sao: Thanks for being there! Yes, I installed Fink from scratch again using that URL. And I'm bloody sure I entered that "unstable" record into my fink.conf since the installation instructions told me to do so. But now when you suggested it again I had a check and it wasn't there. No wonder we have different "fink list" output. I'm running a "fink selfupdate-cvs" as of now and it surely seems to be working harder this time. :)
One more thing. You ask me if I used sudo for the fink update things. No, since fink seems to be using sudo itself (asking me for the password and all) I didn't think that was necessary. |
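A quick way to double-check the Trees line before kicking off a long selfupdate might look like the sketch below. It runs against a throwaway sample file so it can be tried anywhere; on a real install the file is usually /sw/etc/fink.conf (an assumption based on Fink's default prefix), and the sample contents here are made up.

```shell
# Throwaway stand-in for fink.conf with the unstable trees missing.
conf=$(mktemp)
printf '%s\n' 'Basepath: /sw' \
  'Trees: local/main stable/main stable/crypto local/bootstrap' > "$conf"

# Look for unstable/main on the Trees line.
if grep -q '^Trees:.*unstable/main' "$conf"; then
  status="unstable enabled"
else
  status="unstable missing"
fi
echo "$status"
rm -f "$conf"
```

Pointing `conf` at the real file instead of the temp file (and dropping the `rm`) gives the same check on a live install.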
osxpez,
Great, after your 'fink selfupdate-cvs' finishes, your list of available packages to install from source will increase by a lot. You are right about not needing sudo with 'fink selfupdate-cvs': fink will invoke it for you, asking you for a password, even when you didn't type 'sudo'. Cheers... |
Way off subject
Just to make everyone mad (or provide an alternate solution to using sed, perl, {g,m,a}wk):
% links -dump http://checkip.dyndns.org
Current IP Address: 12.34.56.789
Hostname: some.place.com |
:mad: grrrrrrrrr!eat tip, TiMan!
|
Yes, that could be the right tool for the job. But maybe a bit bloaty if you just want to strip the html tags from a file on local disk. You all probably know that life is entirely about saving CPU cycles? :)
|
The beauty of the forums...I learn something new every day!
Now, we can all choose whatever way we need it. Good tip, TiMan...! Cheers... |
Well for local files, I do:
cat foo.html | links -dump | less

but with all that piping and invoking of cat, links, and less, it might be a bit too bloaty for everyone to use ;) |
UUOC watch dog - That should probably be:
links -dump < foo.html | less |
i'm pretty sure this is a case of ĜUUOC...
$ links -dump < poop.html | less
URL expected after -dump

pez, don't make me come over there. |
Well, I don't have links installed, but that _must_ go for the "cat" version as well or links is a truly weird program. What about:
$ links -dump bar.html | less

Depends somewhat on what links regards as a URL. |
It's not the processor cycles, but the typing that gets to me.
links -dump bar.html | less

not only works, but there is less typing than with my original suggestion. Thanks! |
Great thread!
This one might deserve a rating it's so good, and even if you don't use links. Very educational.
BTW, I didn't realize that I didn't need sudo when doing a fink install. It's always been a habit for me while installing with fink. Kudos to that movie from Paul for its demonstration of 'size really does matter'; at least in that instance the chainsaw won hands down, literally. |
Re: Great thread!
Quote:
http://home.mindspring.com/~bduart/tobor.mov [ thanks, henry ] |
Quote:
I like the movie. Strange though, I can't seem to save it with the 'save as' in IE. It complains that it can't save because the disk may be too full. My drive is only 41% full. There are currently no other movies in my ~/Movies. With Paul's movie earlier today, I saved it and it played in a finder window from column view. But when I tried to play it again later on, it complained it couldn't find some certain file with a strange name consisting of numbers and letters, I think. |
hmm, odd moovees with odd problems. IE Save As... here gets the same disk full error. slurp it commando...
$ curl -O http://home.mindspring.com/~bduart/tobor.mov |
Thanks mT, curl worked great and same for Paul's movie too. I guess I ought to submit yet another bug report on IE.
|
Above examples won't work
The perl and awk lines given above won't work with multi-line HTML tags. Instead,

curl <url> | perl -we 'undef $/; $s = <>; $s =~s/<[^<]*>/ /g; print $s'

should do the job in one line. |
A bit late, but...
Just realized I could use "w3m" for local files also:

$ cat foo.html | w3m -dump -T text/html >foo.txt

works fast and well. Cheers... |
UUOC!
UUOC watchdog (me) says:
Save typing, resources, and nag-points by doing it like this instead:

w3m -dump -T text/html <foo.html >foo.txt |
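For anyone who wants to convince themselves that the redirect really is a drop-in replacement for the cat, here is a small sketch. Since not everyone has w3m or links installed, `tr` stands in as the filter (that substitution is mine), and the sample file is made up.

```shell
# Made-up sample file standing in for foo.html.
f=$(mktemp)
printf '%s\n' 'hello' 'world' > "$f"

# tr stands in for w3m here; any filter that reads stdin behaves the same.
with_cat=$(cat "$f" | tr 'a-z' 'A-Z')
with_redirect=$(tr 'a-z' 'A-Z' < "$f")

printf '%s\n' "$with_redirect"
rm -f "$f"
```

Both forms hand the filter the same bytes on stdin; the redirect just skips the extra process and the pipe.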
osxpez,
I knew I would win something: "Useless Use of Cat Award" ...Thanks! :D I will make sure I put the 'cat' away and use it as you suggested. Cheers... |
surfraw
I just found the most excellent use for "links -dump"!!!
I've not been using surfraw all that much since it insisted on firing up a UI-based browser. Not "raw" enough for me. But with these lines in my surfraw.conf:

def SURFRAW_graphical no
def SURFRAW_text_browser links
def SURFRAW_text_browser_args -dump

doing a command like this:

$ webster marshmallow

I get output like this: Code:
Merriam-Webster home IFrame [IMG] |
osxpez,
I tried it and it works very well, and it looks stormy in Singapore with:

$ wetandwild singapore

Code:
Edited

Cheers... |
Cool! But not very useful for a Swede. I don't understand Fahrenheit! =)
|