Simple grep question
What is the difference between these two commands?
grep 'zz*' filename
grep z filename |
Operationally, there is no difference. The * means that the preceding character can be matched zero or more times. So any occurrence of "z" will be flagged by both queries. If you want a "zz" specifically, leave off the *.
|
No difference - that's what I thought.
I was confused because the book I'm reading said you had to use grep 'zz*' filename to find lines that have "one or more z's", which is what I thought grep z filename would do. I couldn't figure out what the difference was, as they appeared to do the same thing in my mind. Thanks for confirming it for me. :) |
Actually, I have another question. There had to be a point the book was trying to make.
Can someone give me an example where it is preferred (or required) to use the syntax grep 'zz*' filename instead of grep z filename? Or is it just a case of semantics? |
And while I'm at it... what's the point of searching for "zero or more" of anything? Zero would mean the search string is not present. More than zero would mean it is present. Wouldn't that always be the case? Either it's there or it's not there!
I'm completely missing the entire purpose of 'zz*'. I can't think of a single reason for having it there so it has me confused and baffled. |
It would probably help if you thought of an example with substitution. I know you're just starting out. Example:
I have a word in a file called "zoolander". If I do a search for 'z' and substitute it with 'A', the result would be "Aoolander". If I do a search for 'zz*' and substitute it with 'A', the result would be "A". So when you search for 'z', it locates zoolander. And when you search for 'zz*', it locates zoolander. Clearer??? HTH |
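A substitution only tells the two patterns apart when the input has a run of consecutive z's. The sketch below assumes a POSIX sed and uses made-up sample words: on "zoolander" both patterns replace just the single leading z, while on a word with several z's in a row 'z' replaces one character and 'zz*' replaces the whole run.

```shell
# 'z' matches exactly one z; 'zz*' matches a z plus any z's that follow it.
# Both select the same lines in grep, but not the same text within a line.

one_a=$(echo zoolander | sed 's/z/A/')     # the single z is replaced
one_b=$(echo zoolander | sed 's/zz*/A/')   # a run of length 1 is replaced
run_a=$(echo jazzz | sed 's/z/A/')         # only the first z is replaced
run_b=$(echo jazzz | sed 's/zz*/A/')       # the whole run "zzz" is replaced

echo "$one_a $one_b"   # Aoolander Aoolander
echo "$run_a $run_b"   # jaAzz jaA
```

For plain grep, which only reports matching lines, the two patterns select the same lines; the difference matters once the matched text itself is used, as in sed.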
Hmm... I think I might be following you here. Let me see if I understood you.
This line, grep 'zz*' filename is saying: Find words that begin with the letter z and have anything after it. Which is the same as a ls z* (except we're doing a search for characters/words within the file). Is this right? |
Hello,
This interests me. So I tried something on the terminal: Code:
% cat test.txt
This makes me feel like taking a unix class test 2 years ago hehe... :p |
Nope, this is not working as I understood bakaDeshi to say. The results of 'v' and 'vz*' are exactly the same. Look at this:
Code:
% cat test.txt
We can use '^z' to find lines that begin with a specific character, but not words within the line. Grepping (is that a verb?) for ' v' will find words beginning with v if we assume that the word is prefixed with a space, but that's not always going to be the case (what if the word has a parenthesis before it?). Further, ' v' would not find words starting with 'v' that are at the beginning of the line (but we could combine the search with '^v' if needed). I see absolutely no difference between using 'v' and 'vz*'. Which means I continue to see absolutely no purpose for the syntax 'z*' (other than to confuse me. :p). |
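One possible way to find words that start with v, whether they sit at the start of the line, after a space, or after a parenthesis, is an extended regular expression with an alternation. This is only a sketch: the file name and its contents are invented for illustration, and it assumes a grep that accepts the POSIX -E option.

```shell
# Hypothetical sample file, not the test.txt from the thread.
tmp=$(mktemp -d)
printf '%s\n' 'vat of soup' 'a (vase) here' 'seven' 'over v' > "$tmp/words.txt"

# Match a v at the start of the line, or a v preceded by a character that
# cannot be part of a word (anything but a letter, digit or underscore).
starts=$(grep -E '(^|[^[:alnum:]_])v' "$tmp/words.txt")
echo "$starts"

rm -r "$tmp"
```

The line "seven" is skipped because its v sits in the middle of a word; the other three lines are printed.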
Also look at what happens when the line has just one single character (notice the last line just has a v).
Code:
% cat test.txt
Am I just being dense here? Why would Unix have a search command that looks for zero or more of anything? |
So I look for answers on the web, and think I found it.
http://advisor.uchicago.edu/docs/unix/reg-exp.html Quote:
Code:
% cat test.txt
And I thought this was going to be a very simple question to answer when I posted this thread. :( |
Welcome to the world of regex!
I'm in confusion too now... :D Try putting [] inside the ' '. So, using your example, I tried Code:
grep '[tech*support]' test.txt |
well, some documentation suffers from inaccuracies. always be skeptical of docs, and look for validation and corroboration in the exercise.
in regular expressions, * is used to match zero or more occurrences of the preceding regexp, which is typically a single char. so, grep "tech*support" will match "tec" and zero or more of the letter 'h' followed by "support". what you need is the dot metachar in there to wildcard 'any single char'. Code:
$ grep "tech.*support" foo |
Quote:
the notion of finding 'something-anything-somethingElse' could not be accurately accomplished without the notion of 'anything' containing zero or more occurrences of a regexp. right? because 'anything' could be nothing (zero occurrences) and that makes our pattern search work. |
Quote:
grep "tech*support" would look for any string "tec_support" where _ is zero or more instances of the character h (the asterisk follows the "h" and there is no space). Thus, while "techsupport" and "techhhhhhsupport" would be found, "tech support" and "technical advisors and support" would not. As merv pointed out, you want to search for the entire phrase "tech", followed by any characters any number of times, followed by the phrase "support" -- thus you would want to use grep "tech.*support" where . is a wildcard (so .* would look for any character zero or more times) |
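The difference between the two patterns can be checked by counting matching lines. The sample file below is made up for illustration (it is not the test.txt used earlier in the thread), so treat this as a sketch rather than a reproduction.

```shell
# Hypothetical sample lines covering the cases discussed above.
tmp=$(mktemp -d)
printf '%s\n' 'techsupport' 'tecsupport' 'techhhsupport' 'tech support' \
    'technical advisors and support' > "$tmp/test.txt"

# "tec", then zero or more h, then "support": no room for a space or other text.
star_only=$(grep -c 'tech*support' "$tmp/test.txt")

# "tech", then zero or more of any character, then "support".
dot_star=$(grep -c 'tech.*support' "$tmp/test.txt")

echo "tech*support:  $star_only lines"   # 3 (the first three lines)
echo "tech.*support: $dot_star lines"    # 4 (all but "tecsupport")

rm -r "$tmp"
```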
Quote:
Code:
% grep 'tech.*support' test.txt
The rest of what you guys wrote also made sense to me. I was able to reproduce everything the way it was presented here. It helps to see it work in action. Thanks to everyone for helping! You guys are great! :) |
Quote:
|
BTW, Merv, remember our previous discussion regarding pipes? By golly, I think I'm finally getting that one down. I figured this one out on my own.
Code:
% cat test.txt |
Hi Vicki,
I don't really want to rain on your parade, but "metacharacters" (the beasties such as *, +, . that have special meanings within a regular expression) are generally stripped of that special meaning when used inside a "character class", which is what the square brackets produce. Suppose we have:
% cat starry
one star *
no star
my star
your *
something plus a +
Then observe the following:
% grep '[h*]' starry
one star *
your *
% grep '[k+]' starry
something plus a +
(That is, if the h* was interpreted as "zero or more h's" you'd get everything from the first grep, and if k+ was "one or more k's" you'd get nothing out of the second grep. As it is, you get the lines that match the * and the + literally.)
Hope that makes sense.
Cheers, Paul |
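The same behaviour can be reproduced with a small file of invented lines. The letter q is used inside the brackets only because it occurs in none of the sample lines, which keeps the output down to the lines that contain the literal symbol; the file name mirrors the one above but its contents are an assumption.

```shell
# Hypothetical sample lines; none of them contains the letter q.
tmp=$(mktemp -d)
printf '%s\n' 'one star *' 'no star' 'a plus +' > "$tmp/starry"

in_class=$(grep '[q*]' "$tmp/starry")   # lines containing a q or a literal *
escaped=$(grep '\*' "$tmp/starry")      # a backslash also makes * literal
plus=$(grep '[q+]' "$tmp/starry")       # lines containing a q or a literal +

echo "$in_class"   # one star *
echo "$escaped"    # one star *
echo "$plus"       # a plus +

rm -r "$tmp"
```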
Quote:
I thought you had to use the \ to turn the metacharacters into literal characters. I didn't consider how the [] would change that. Thanks for pointing that out. |
Note to self (and maybe to others?). The basic "grep" command, which invokes "basic regular expressions" is **really** basic. No "+" metacharacter, no "?" metacharacter (for "0 or 1"), and so on. As mT mentioned in a parallel thread recently, setting your "GREP_OPTIONS" environment variable is one way to get around this, so that you have "extended regular expressions" available by default.
For tcsh users (if you don't know what I'm talking about here: **that's you!!**)
setenv GREP_OPTIONS "--extended-regexp"
Chuck this into your .login file[*] in your home directory (and make one if you haven't got one). Then simply "source .login" and you're armed with extended regular expressions in grep. Hey, this seems to work OK for bash as well. Strike one up for using .login instead of a shell specific file.
Merv also mentioned that he was using a few other options for grep. See the thread concerned if you're interested (and "man grep"): http://forums.macosxhints.com/showth...ht=GREPOPTIONS
Cheers, Paul
[*] Aaargh, I fear another "why not use /usr/share/init/tcsh/..." thread coming on. Rest assured, I'll shut up. Really. Well, maybe. Not even "maybe"? Must be the weather... Now don't be a baby. Apologies for any horrid muzak that's leapt into readers' minds. 'twas a *nasty* thing to do to you. |
I didn't know about $GREP_OPTIONS. Couldn't that confuse scripts that depend on grep not using extended regexps? The old school way would be to use egrep for that.
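A way to sidestep that worry is to ask for extended regular expressions per command with -E (the option egrep corresponds to), which leaves the environment, and any script relying on basic grep, untouched. The following is a small sketch with an invented sample file showing how + is treated in each mode.

```shell
# Hypothetical sample file.
tmp=$(mktemp -d)
printf '%s\n' 'zz' 'z+' 'abc' > "$tmp/test.txt"

basic=$(grep 'z+' "$tmp/test.txt")        # basic: + is an ordinary character
extended=$(grep -E 'z+' "$tmp/test.txt")  # extended: + means "one or more"

echo "$basic"      # z+
echo "$extended"   # zz and z+ (both lines contain at least one z)

rm -r "$tmp"
```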
|
Yeah that really had me confused for a while. I was trying to search for any file with the suffix .img.<number>
eg .img.3 or .img.8 but not .img.12, so I had
grep '\.img\.[0-9]{1}'
ie. search for any .img with a single digit number afterwards. It never produced any results - because basic grep keeps the { as literals - had to change them to \{ and \} to get it to work.
BTW, I was just trying to find a way of using the command line to concatenate split files in order. Turns out I can just go
cat *.img.[0-9] *.img.[0-9][0-9] > output.img
and I don't even need to use grep or anything. |
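The brace behaviour described here can be sketched as follows, using an invented list of file names. One assumption worth flagging: to exclude .img.12 the pattern also needs a $ anchor, since "exactly one digit" on its own still matches the first digit of a two-digit suffix.

```shell
# Hypothetical list of file names, one per line.
tmp=$(mktemp -d)
printf '%s\n' disk.img.3 disk.img.8 disk.img.12 > "$tmp/names.txt"

# Basic grep treats unescaped braces as literal text, so nothing matches.
literal=$(grep -c '\.img\.[0-9]{1}' "$tmp/names.txt" || true)

# Escaped braces form an interval in basic grep, but without an anchor
# disk.img.12 still matches on its first digit.
unanchored=$(grep -c '\.img\.[0-9]\{1\}' "$tmp/names.txt")

# Anchoring at end of line keeps only the single-digit suffixes.
bre=$(grep '\.img\.[0-9]\{1\}$' "$tmp/names.txt")
ere=$(grep -E '\.img\.[0-9]{1}$' "$tmp/names.txt")

echo "$literal $unanchored"   # 0 3
echo "$bre"                   # disk.img.3 and disk.img.8
echo "$ere"                   # same two lines

rm -r "$tmp"
```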
Quote:
alias greep 'grep --extended-regexp --ignore-case' so that (extended regexps + case independence) is the default. The problem that raises is remembering what the thing's called. 'greep' has a nice mnemonic character... Thanks (again) for the wake-up call. Cheers, Paul |
But, what's wrong with using "egrep"? Or "egrep -i" if you want to disregard case. (You could always alias egrepi for "egrep -i").
|
I just added setenv GREP_OPTIONS "--extended-regexp" to my ~/.login file. Now that I have some basics using grep, I'd like to see if I have any scripts calling grep that may be affected by this change.
Can someone tell me which dirs have scripts that run automatically? Or is there a good way to narrow down the search so I don't search my entire HD? |
I checked the following 4 dirs for 'grep':
/etc/
/usr/share/init/tcsh/
~/
~/Library/init/tcsh/
And I found these lines:
[share/init/tcsh]
aliases:alias word 'grep \!* /usr/share/dict/web2' # Grep thru dictionary
completions:alias list_all_hostnames 'grep -v "^#" /etc/hosts'
completions: 'n@-framework@`ls -1 ${framework_path} | grep .framework\$ | sed 's/\\.framework//' | uniq`@' \
[~/Library/init/tcsh]
aliases.mine:alias findit "ps ax | grep \!:1 | grep -v grep"
Some of the code is above my head so I'm not sure what they all do. Can someone tell me if the setenv will negatively affect anything? |
the only scripts of yours that run automatically would be ones in your crontab, but i don't think they'll be run with your interactive login environment.
what you have to worry about are your ~/bin/ scripts (and /usr/local/) that you run that do greps in your interactive shell and make sure the regexps are extended regexp savvy or that your environment is clean of the GREP variables. i have a bash function 'zung' to toggle GREP variables in and out of existence when i think i might be rogue. grok? btw, the grep --ignore-case switch is slightly more valuable than the others as it ignores case in both the source and the target. |
While looking through man grep, I found this option:
Code:
-E, --extended-regexp |
well, that's why the options are there. it's entirely up to you how to conduct your shell world. there are tradeoffs, upsides and downsides to every issue.
-- no doubt about it, there's two sides to every story. |
I feel totally invisible here! :)
Yes, I think "grep -E" is much better than tampering with the environment. But still "egrep" is there for these kinds of things. In the old days egrep used to be a different program from grep, one that handled extended regular expressions. But reading the grep man page now seems to indicate that egrep is actually a link to grep (or vice versa) and that grep checks what name it was called by and then switches extended regexps on. Maybe a small history lesson could shed some light on this: "grep" is short for "global regular expression print". That's what grep (without options) does: it globally applies the regular expression and then prints the rows that match. My guess is that "egrep" stands for "extended global regular expression print". What the hell "fgrep" stands for is beyond me! :) |
I totally fail to see the downside with using egrep.
|
Quote:
Quote:
fgrep - I have no idea, but I took it as "file" grep. :) |
The f in fgrep stands for "fixed strings". The irony with fgrep is that it doesn't involve regular expressions. It's "fixed strings global regular expression print". If you, like me, enjoy geek humour then this should make you at least smile. :)
|
One more thing. The grep man page on RH Linux says that egrep "is similar (but not identical) to grep -E". egrep is the "old school Unix" compatible one. The OS X man page didn't state this, I think. It would be interesting to know how "egrep" and "grep -E" differ. Someone please enlighten me.
|
fgrep is fast grep, which it isn't, or fixed grep, because it doesn't accept metachars. egrep is your fastest grep today, me thinks.
from o'reilly's unix power tools:
the old saw
unix beginners use grep because it's all they know about
intermediate users use fgrep because the manual says it's faster
advanced users use egrep because they've tried it
---
there are some timing tests here, and egrep beats even perl in both clock time and cpu usage.
fgrep has its uses; searching for literals, like *, it can save you some quoting. i would doubt very much that egrep and grep -E have any difference in the OSX or GNU incarnations. |
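The "saves you some quoting" point can be illustrated with grep -F, the option form of fgrep. The sample file is invented; the sketch only shows that with -F the dot and the star are searched for as plain characters.

```shell
# Hypothetical sample file.
tmp=$(mktemp -d)
printf '%s\n' 'a.c' 'abc' 'x*y' 'xy' > "$tmp/test.txt"

fixed_dot=$(grep -F 'a.c' "$tmp/test.txt")    # only the literal string a.c
regex_dot=$(grep 'a.c' "$tmp/test.txt")       # . matches any character
fixed_star=$(grep -F 'x*y' "$tmp/test.txt")   # only the literal string x*y
regex_star=$(grep 'x*y' "$tmp/test.txt")      # zero or more x, then y

echo "$fixed_dot"    # a.c
echo "$regex_dot"    # a.c and abc
echo "$fixed_star"   # x*y
echo "$regex_star"   # x*y and xy

rm -r "$tmp"
```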
Of course egrep is faster than Perl on regexp matching. Or most regexps anyway. deterministic regexp matching is most often faster than non-deterministic. That's why awk often is so much faster than Perl on raw regexp matching. But Perl needs its non-deterministic engine, because otherwise you couldn't use backreferencing. Gawk has special sub() functions that do non-deterministic. Clumsy, but at least the gawk programmer can trade speed for functionality.
|
The thing I like best about egrep is:
egrep foo\|bar\|baz textfile
which will find all occurrences of foo or bar or baz in the file (the backslashes are so the shell doesn't think you are trying to pipe). |
Quote:
I'm able to do egrep foo|bar|baz textfile without the backslashes. In fact, when I add the backslashes, it doesn't work anymore. I first thought maybe it was because I chose to not use the extended option with grep, but egrep is supposed to already have the extended options. Any idea why I don't need/can't use the backslashes? |
Wow, it has got busy 'round here! Quite the most polite hornet's nest that I've ever had the pleasure of stirring!
On osx egrep and 'grep -E' are the same thing: first page of "man grep" Quote:
% ls -l `which grep`
-rwxr-xr-x 1 root wheel 105548 Aug 4 23:21 /usr/bin/grep
% ls -l `which egrep`
-rwxr-xr-x 1 root wheel 105548 Aug 4 23:22 /usr/bin/egrep
And so on for fgrep. That is, it's the same binary, but responds differently depending on the name by which it's called. Why they don't just link to the same thing (ie have egrep and fgrep as links to grep) I have no idea. Anyone? This same structure occurs in various other places as well (compress/uncompress, batch/at/atrm/atq, merge/rcsdiff/rcsmerge, gunzip/gzip/zcat/gzcat, csh/tcsh, zsh/sh, tar/pax/cpio). Byte for byte the same (each of the slashed alternatives), yet separate copies.
Having made that list I've forgotten what vital and entertaining information I'd intended to convey. Harumph.
Just use egrep? Yeah, but that's no fun! I'm still on a campaign to convince the world that greep rocks. "The change would be very subtle....It might take ten years or so.... Gradually his grep would change its shape....A more hooked nose... Wider, thinner lips....Beady eyes....A larger forehead." (A longer name, case insensitivity, better regexp flavour...)
Cheers, Paul |
Quote:
does: egrep 'foo|bar|baz' textfile work? |
Please don't tell me it's strange, let alone very strange! "Strange" seems to be my NMOO (normal mode of operation) with Unix. :p
This is what I get when I try egrep with the | and \|: Code:
% cat test.txt |
But that's *fine*! I think I see what's going on here: Doug backslashed his "|" symbols because he didn't quote the whole regular expression. That backslashing prevented the shell from thinking they were pipes (and thus chucking a fit about not being able to find the command on the right hand side of the first pipe, or maybe offering a strange substitution). Quoting your expression has the same effect: *egrep* gets to see the "|" symbols instead of the greedy old shell grabbing them in transit.
When you wrote above that your egrep worked *without quotes and without backslashes* it certainly was "interesting". But if you were quoting all the way, just invisibly in the post that's about 3 above this one, then the world is at peace. When you write
% egrep 'ez\|x\|techhhh' test.txt
you're asking for occurrences of the literal text 'ez|x|techhhh'. In other words, you've "double-negativized" the "|" symbol, so that it's interpreted literally. I hope that's close, anyway. Seems to pass my tests! Cheers, Paul |
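The three quoting variants can be compared side by side. This sketch uses grep -E in place of egrep and an invented sample file; the last line of the file contains literal | characters so that the "double-negativized" case has something to match.

```shell
# Hypothetical sample file; the last line contains literal | characters.
tmp=$(mktemp -d)
printf '%s\n' 'foo' 'bar' 'qux' 'foo|bar|baz' > "$tmp/textfile"

# Unquoted with backslashes: the shell strips them, grep sees foo|bar|baz.
escaped=$(grep -E foo\|bar\|baz "$tmp/textfile")

# Quoted without backslashes: grep again sees foo|bar|baz.
quoted=$(grep -E 'foo|bar|baz' "$tmp/textfile")

# Quoted AND backslashed: grep sees foo\|bar\|baz, which in extended mode
# means the literal text "foo|bar|baz".
both=$(grep -E 'foo\|bar\|baz' "$tmp/textfile")

echo "$escaped"   # foo, bar, foo|bar|baz
echo "$quoted"    # same three lines
echo "$both"      # foo|bar|baz

rm -r "$tmp"
```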
Ah, but that's not strange at all! (Strange is far from NMOD with Unix BTW).
Notice that stetner said: Code:
egrep foo\|bar\|baz textfile
Code:
egrep 'foo\|bar\|baz' textfile
Code:
egrep 'foo|bar|baz' textfile |
Ahhhh! Okay, the light just switched on. :) I didn't even notice the absence of the quotes in stetner's message. I just did a copy/paste from his message when writing mine so the absent quotes copied/pasted right along with it. Using quotes is just second nature to me (I've always used quotes when allowed, even if not required). I will be more watchful for the variations of quotes/absent quotes in the future.
Okay, mystery solved. Thanks! :) |
Vicki: I just have to confuse you a little bit more. If you had done your double quoting using grep instead of egrep, like:
Code:
grep 'foo\|bar\|baz' textfile |
greep is creepy. i prefer grap or grop, grup even.
|
Very mysterious and ooky, if you ask me. ;)
My only possible contribution to this thread: grep stands for General Regular Expression Parser, IIRC. Back to read and learn something mode. :cool: |
Quote:
Actually, I understood what you said. But I had no idea you could do that with grep (multiple search)! I thought you had to use egrep for a multiple search! I learn so many things from you guys! :) |
Vicki: With GNU grep there's actually no difference between grep's and egrep's regular expression engines. It's just the parsing of those "extended" features that is different. With grep you have to put a backslash in front of those operator characters to get them to act special; with egrep you don't. With egrep you have to put a backslash in front of those characters to switch their special meaning off; with grep you don't.
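That mirror-image rule can be sketched with grouping and repetition counts, which behave this way in both basic and extended mode. The sample file is invented, and grep -E stands in for egrep.

```shell
# Hypothetical sample file.
tmp=$(mktemp -d)
printf '%s\n' 'abab' '(ab)' 'ab' > "$tmp/test.txt"

# Special meaning: backslashes needed in basic mode, not in extended mode.
bre_group=$(grep '\(ab\)\{2\}' "$tmp/test.txt")   # "ab" twice in a row
ere_group=$(grep -E '(ab){2}' "$tmp/test.txt")    # same pattern, extended

# Literal meaning: the other way round.
bre_literal=$(grep '(ab)' "$tmp/test.txt")        # literal parentheses
ere_literal=$(grep -E '\(ab\)' "$tmp/test.txt")   # literal parentheses

echo "$bre_group $ere_group"       # abab abab
echo "$bre_literal $ere_literal"   # (ab) (ab)

rm -r "$tmp"
```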
Craig: According to "Mastering Regular Expressions" by Jeffrey Friedl, grep got its name from a common operation in the ed editor:
:g/Regular Expression/p
Which can be read as Global Regular Expression Print, and it was so popular that a standalone utility, grep, was created for it. I don't know what sources Friedl has for this particular info, but I find it more plausible than "General Regular Expression Parser". Because grep does so much more than just parse the regexp and even so, what's "general" about grep?
And my suggestions for names to Paul's "egrep -i" alias are: griffin or grip. The last one is the Swedish word for griffin (which is a mythical creature combining the bodies of three animals). In Swedish grip is pronounced greep. :) |
kudos
not adding any tech info, but just wanted to extend a kudos to vicki for the investigative attitude that will turn any newbie into a guru in no time (well, actually, lots of time, but eventually people start thinking you can do magic [not that i would know personally])
if only we could replace all the "MS Word in 21 days" books/classes with a "How to approach computers" or "How to approach computer software" books/classes, geeks might actually start shutting up about "lusers" of the world. |
You just made my day! Thank you! :)
|
Quote:
Code:
% pwd |
Quote:
:g/reg exp/p to print out all lines with the reg exp in it. I will just go back to cleaning my dentures now..... :) |
My book says it's global...
Quote:
|
|
Code:
grep
Cheers... |
Quote:
Quote:
Cheers, Paul |
Quote:
|