2

Note: I am not sure that my regexes are correct, since my school textbook does not teach regexes of this form, only the mathematical form used for DFAs/NFAs.

I would appreciate any suggestions or hints.

Question:

(a) find all occurrences of three-letter words in text that begin with 'a' and end with 'e';

(b) find all occurrences of words in text that begin with 'm' and end with 'r';

My Approach:

a) ^[a][a-zA-Z][e]$ (how to distinguish between 3 letter words and all words?)

b) ^[m][a-zA-Z][r]$

Also I want to use these regex's in linux so would the following command work?:

grep '^[a][a-zA-Z][e]$' 'usr/dir/.../text.txt'

or should I use egrep in this way:

find . -text "*.txt" -print0 | xargs -0 egrep '^[a][a-zA-Z][e]$'

4 Answers

3

You can use grep -w with an alternation of regex for both the matches:

grep -w 'a[a-zA-Z]e\|m[a-zA-Z]*r' file.txt
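
To see what -w does with this pattern, here is a minimal sketch on made-up input. The word list and the variable names (sample, matches) are assumptions for illustration only, and \| inside a basic regex is a GNU grep extension, so this assumes GNU grep:

```shell
# Hypothetical sample text, one word per line so the result is easy to read.
sample='are
axe
area
maker
mr
hammer'

# -w only accepts matches that form a whole word, so "area" (the match "are"
# is followed by a letter) and "hammer" (the match is preceded by a letter)
# are rejected.
matches=$(printf '%s\n' "$sample" | grep -w 'a[a-zA-Z]e\|m[a-zA-Z]*r')
printf '%s\n' "$matches"
```

Note that "mr" is accepted by the second alternative, because * allows zero letters between the m and the r.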

1 Comment

Never knew about -w. Nice!
1

You can use the word boundary \b to match the start and the end of a word:

a) find all occurrences of three-letter words in text that begin with 'a' and end with 'e';

grep -o '\ba[a-zA-Z]e\b'

The pattern matches a word boundary, then an a, exactly one letter, an e, and another word boundary.

b) find all occurrences of words in text that begin with 'm' and end with 'r';

grep -o '\bm[a-zA-Z]*r\b'

The pattern matches a word boundary, an m, zero or more letters (through the * quantifier), an r, and a word boundary again.


I'm also using the -o option, which prints each match on its own line instead of printing the whole input line that contains a match.
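
A small sketch of the difference -o makes, assuming GNU grep (\b is a GNU extension). The sample sentence and the variable names (line, whole, only) are made up for illustration:

```shell
# Hypothetical one-line sample containing two matching words.
line='we are here and the axe is sharp'

# Without -o, grep prints the whole matching line once.
whole=$(printf '%s\n' "$line" | grep '\ba[a-zA-Z]e\b')

# With -o, each match is printed on a line of its own.
only=$(printf '%s\n' "$line" | grep -o '\ba[a-zA-Z]e\b')

printf 'without -o: %s\n' "$whole"
printf 'with -o:\n%s\n' "$only"
```

Here the first command should echo the sentence back, while the second should print only the two matching words.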


Btw, thanks to the option -w - matching only whole words - you can even simplify the above patterns to:

a)

grep -wo 'a[a-zA-Z]e'

and b)

grep -wo 'm[a-zA-Z]*r'

Thanks to @anubhava!


You asked about egrep. egrep doesn't help to simplify or optimize these patterns. Plain grep is absolutely fine.
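
As a rough illustration of that point, the sketch below runs the combined pattern once as a basic regex and once as an extended regex (grep -E, which is what egrep does). The only difference is how the alternation is written. The sample text and the variable names (text, basic, extended) are assumptions, and \| in a basic regex assumes GNU grep:

```shell
# Hypothetical sample text.
text='they are making a better motor for the major'

# Basic regex (plain grep): alternation is written \| (a GNU extension).
basic=$(printf '%s\n' "$text" | grep -wo 'a[a-zA-Z]e\|m[a-zA-Z]*r')

# Extended regex (grep -E, i.e. egrep): alternation is written |.
extended=$(printf '%s\n' "$text" | grep -Ewo 'a[a-zA-Z]e|m[a-zA-Z]*r')

printf '%s\n' "$basic"
```

Both variables should end up with the same matches, so the extended syntax buys nothing for these particular patterns.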

8 Comments

How would I be able to use this on the file I want? The file is text.txt so should it be grep -oi '\ba[a-z]e\b' /cs/dept/course/2014-15/W/201/text.txt ?
OK, that's odd: when I run the command "grep -oi '\ba[a-z]e\b' text" it doesn't throw any errors, but nothing is printed.
The first pattern is simply not found in the text. The second pattern finds 3 occurrences of microcomputer. Add the word are to the text and the first command will find it.
Just checked that now, looks like it's working thanks for the help. Appreciate other people posting answers too.
Well explained answer +1
0

In your examples, you're only going to match full lines that consist of exactly three characters: the first letter you expect, any one letter, and the last letter you expect.

The '^' indicates the beginning of the line.

The '$' indicates the end of the line.
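
A quick way to check this, sketched on made-up input (the two sample lines and the variable names input and anchored are assumptions):

```shell
# Hypothetical input: the first line is exactly one three-letter word,
# the second contains the same word inside a longer line.
input='are
they are here'

# With ^ and $ the pattern has to cover the entire line, so only the
# first line can match.
anchored=$(printf '%s\n' "$input" | grep '^a[a-zA-Z]e$')
printf '%s\n' "$anchored"
```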

In order to pull out only three-letter words you're going to have to match on some whitespace. For instance: grep ' a[a-zA-Z]e ' 'usr/dir/.../text.txt'

However, this will miss all three-letter words at the beginning or end of a line.
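
The sketch below shows that limitation on made-up input and compares it with grep's -w (whole-word) option. The three sample lines and the variable names (lines, spaced, whole_word) are assumptions for illustration:

```shell
# Hypothetical sample: "are" appears at the start, middle and end of a line.
lines='are we there
we are there
there we are'

# Surrounding spaces: only the middle occurrence has a space on both sides.
spaced=$(printf '%s\n' "$lines" | grep -c ' a[a-zA-Z]e ')

# -w (whole-word matching) also catches the words at the line edges.
whole_word=$(printf '%s\n' "$lines" | grep -cw 'a[a-zA-Z]e')

printf 'space-delimited: %s line(s), -w: %s line(s)\n' "$spaced" "$whole_word"
```

The -c option counts matching lines, so the space-delimited pattern should report fewer lines than the -w version.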

Here is an issue about using egrep and grep to match whitespace or the start of a line.


0

First of all, egrep is extended grep and is the same as calling grep with the -E option. Secondly, in many cases you don't need find and xargs, since grep's -r option searches recursively through the files under the specified path.

Your regular expression fits the basic (not extended) regular expression syntax that grep supports by default, so egrep is not needed.

I would simplify this to

grep -r '^a[a-zA-Z]e$' /usr/share/dict/

and this

grep -r '^m[a-zA-Z]*r$' /usr/share/dict/
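
To try this without touching a real dictionary directory, here is a minimal sketch that builds a throwaway word list first. The temporary directory, the file name words, the sample words and the variable names are all assumptions. The -h option is added only to keep the file name out of the output:

```shell
# Hypothetical word list in a temporary directory
# (a stand-in for /usr/share/dict/).
dir=$(mktemp -d)
printf '%s\n' are area axe mr maker make > "$dir/words"

# -r searches every file under the directory and prefixes each hit with
# the file name by default; -h suppresses that prefix.
three=$(grep -rh '^a[a-zA-Z]e$' "$dir")
m_r=$(grep -rh '^m[a-zA-Z]*r$' "$dir")

printf '%s\n' "$three" "$m_r"
rm -r "$dir"
```

The anchors work here because a word list has one word per line. On running text they would only match lines made up of a single word.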

