Using regex and Linux commands (grep or egrep?) to find specific strings

Question

Note: I am not sure that my regexes are correct, since my textbook at school does not teach regexes in this form, only in the mathematical form used for DFAs/NFAs.

I would appreciate any suggestions or hints

Question:

(a) find all occurrences of three-letter words in text that begin with 'a' and end with 'e';

(b) find all occurrences of words in text that begin with 'm' and end with 'r';

My Approach:

a) ^[a][a-zA-Z][e]$ (how to distinguish between 3 letter words and all words?)

b) ^[m][a-zA-Z][r]$

Also, I want to use these regexes on Linux, so would the following command work?

grep '^[a][a-zA-Z][e]$' 'usr/dir/.../text.txt'

or should I use egrep in this way:

find . -text "*.txt" -print0 | xargs -0 egrep '^[a][a-zA-Z][e]$'

anubhava · Accepted Answer · 2015-03-20 19:25:01Z

3

You can use grep -w with a single alternation that covers both patterns:

grep -w 'a[a-zA-Z]e\|m[a-zA-Z]*r' file.txt
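As a rough illustration of the command above, here is a minimal sketch that runs it against a small made-up file. The file name sample.txt and its contents are assumptions for the demo, and the \| alternation inside a basic regex assumes GNU grep.

```shell
# Hypothetical sample text; sample.txt is an assumed file name for this demo.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'we are at home' \
  'the microcomputer and the mother' \
  'awesome amaze' > "$tmpdir/sample.txt"

# -w keeps only matches that form whole words;
# \| is alternation in GNU basic regular expressions.
matched_lines=$(grep -w 'a[a-zA-Z]e\|m[a-zA-Z]*r' "$tmpdir/sample.txt")
printf '%s\n' "$matched_lines"

rm -r "$tmpdir"
```

The third line is not printed: "awe" only occurs inside "awesome", and "aze" only inside "amaze", so neither is a whole word.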

answered Mar 20, 2015 at 19:25

anubhava


1 Comment

hek2mgl Over a year ago

Never knew about -w. Nice!

hek2mgl · Accepted Answer · 2015-03-20 19:33:19Z

1

You can use the word boundary \b to match the start and the end of a word:

a) find all occurrences of three-letter words in text that begin with 'a' and end with 'e';

grep -o '\ba[a-zA-Z]e\b'

The pattern matches a word boundary, then an a, a single letter, an e, and another word boundary.

b) find all occurrences of words in text that begin with 'm' and end with 'r';

grep -o '\bm[a-zA-Z]*r\b'

The pattern matches a word boundary, an m, zero or more letters (through the * quantifier), an r, and a word boundary again.

I'm also using the -o option, which outputs every match on its own line rather than the whole input line that contains a match.
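To make the effect of \b and -o concrete, here is a small sketch on a made-up sentence. The sample text and variable names are assumptions, and \b is a GNU grep extension.

```shell
# Hypothetical sample sentence for the demo.
sample='An ape and a microcomputer are here; awesome makers matter.'

# (a) three-letter words starting with a and ending with e
three_letter=$(printf '%s\n' "$sample" | grep -o '\ba[a-zA-Z]e\b')

# (b) words of any length starting with m and ending with r
m_to_r=$(printf '%s\n' "$sample" | grep -o '\bm[a-zA-Z]*r\b')

printf '%s\n' "$three_letter"   # ape, are (one per line)
printf '%s\n' "$m_to_r"         # microcomputer, matter (one per line)
```

"awesome" and "makers" are skipped because the letter after "awe" and after "maker" means there is no word boundary at that point.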

Btw, thanks to the option -w - matching only whole words - you can even simplify the above patterns to:

a)

grep -wo 'a[a-zA-Z]e'

and b)

grep -wo 'm[a-zA-Z]*r'

Thanks to @anubhava!

You asked about egrep. egrep would not simplify or optimize these patterns; plain grep is absolutely fine.

edited Mar 20, 2015 at 19:33

answered Mar 20, 2015 at 19:19

hek2mgl


8 Comments

geforce Over a year ago

How would I be able to use this on the file I want? The file is text.txt so should it be grep -oi '\ba[a-z]e\b' /cs/dept/course/2014-15/W/201/text.txt ?

geforce Over a year ago

OK, that's odd: when I use the command "grep -oi '\ba[a-z]e\b' text" it doesn't throw any errors, but when I enter it nothing happens.

hek2mgl Over a year ago

The first pattern is simply not found in the text. The second pattern finds 3 occurrences of microcomputer. Add the word are to the text and the first command will find it.

geforce Over a year ago

Just checked that now, looks like it's working thanks for the help. Appreciate other people posting answers too.

anubhava Over a year ago

Well explained answer +1


Community · Accepted Answer · 2017-05-23 12:12:56Z

0

In your examples, you're only going to match full lines with three characters, matching the letters you expect.

The '^' indicates the beginning of the line

The '$' indicates the end of the line

To pull out only three-letter words, you have to match on the surrounding whitespace. For instance: grep ' a[a-zA-Z]e ' 'usr/dir/.../text.txt'

However, this will miss all three-letter words at the beginning or end of a line.

Here is an issue using egrep and grep to match whitespace/start of line
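A short sketch of that limitation, using a made-up line in which the target words sit at the start, in the middle, and at the end. The sample text is an assumption, and -o and -w are used as in GNU grep.

```shell
# Hypothetical one-line sample: "are" at the start, "are" in the middle, "ape" at the end.
sample='are you sure we are done with the ape'

# Space-delimited pattern: only the middle "are" has a space on both sides.
spaced=$(printf '%s\n' "$sample" | grep -o ' a[a-zA-Z]e ')

# -w treats the start and end of the line as word edges too.
whole=$(printf '%s\n' "$sample" | grep -ow 'a[a-zA-Z]e')

printf '[%s]\n' "$spaced"   # a single match, including its surrounding spaces
printf '%s\n' "$whole"      # are, are, ape (one per line)
```

The space-delimited pattern also includes the spaces in its match, which is another reason word-based matching with -w tends to be more convenient here.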

edited May 23, 2017 at 12:12

CommunityBot


answered Mar 20, 2015 at 19:20

Brian Hewson


Grzegorz Żur · Accepted Answer · 2015-03-20 19:27:37Z

0

First of all, egrep is extended grep and is the same as calling grep with the -E option. Secondly, in many cases you don't need find and xargs, because grep's -r option searches recursively through the files under the specified path.

Your regular expression fits the basic (not extended) regular expression syntax supported by grep, so egrep is not needed.

I would simplify this to

grep -r '^a[a-zA-Z]e$' /usr/share/dict/

and this

grep -r '^m[a-zA-Z]*r$' /usr/share/dict/
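These anchored patterns suit files with one word per line, such as dictionary word lists. Below is a minimal sketch on a throwaway directory; the temporary directory, the file name words, and its contents are assumptions for the demo. The -h option, which suppresses the file name prefix that -r would otherwise print, is added only to keep the output easy to read.

```shell
# Hypothetical word list, one word per line, in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' ace are awesome mr mother mothers > "$tmpdir/words"

# ^ and $ anchor the pattern to the whole line, so only exact words match.
three_letter=$(grep -r -h '^a[a-zA-Z]e$' "$tmpdir")
m_to_r=$(grep -r -h '^m[a-zA-Z]*r$' "$tmpdir")

# The same two searches as one extended regex (grep -E, equivalent to egrep).
combined=$(grep -r -h -E '^(a[a-zA-Z]e|m[a-zA-Z]*r)$' "$tmpdir")

printf '%s\n' "$three_letter"   # ace, are
printf '%s\n' "$m_to_r"         # mr, mother
printf '%s\n' "$combined"       # ace, are, mr, mother

rm -r "$tmpdir"
```

Note that "mr" matches because * allows zero letters between the m and the r.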

edited Mar 20, 2015 at 19:27

answered Mar 20, 2015 at 19:19

Grzegorz Żur

