Finding strings in a file using regex and grep in Linux

Question

I am trying to find the proper regexes to use with the grep command on the file text.txt.

Question

1. Find all occurrences of words in text that have a substring ad, bd, cd, dd, ed.
2. Find all occurrences of numbers > 100.
3. Find all occurrences of numbers > 100 that contain a digit 0 or 5.

My Approach

grep -io '[a-e]*d' text

Prints words with the proper substrings, but doesn’t print the whole string/word.
```
ad
d
d
ed
d
d
ed
d
d
d
d
ed
d
d
```
grep -io '[199][1-9]*' text

I believe I am way off on the regex, but it still prints the correct result.
```
1973
197
17775
```
grep -io '[05][1-9]*' text

This is a continuation of 2., and I don’t understand how to carry the condition from 2. over into 3., but I believe I have the part about containing a digit 0 or 5 correct.
```
0
0
0
5
```

-o only prints the bit that matched, so if you need more than the pattern you have, you need to extend it to match the words/etc. you need.
– Etan Reisner, Commented Mar 20, 2015 at 20:46
Yes, that's my question for a). I know I have to add [a-zA-Z] somewhere, but I'm not sure in what position.
– geforce, Commented Mar 20, 2015 at 20:49
Regexes are for finding text patterns, not for determining numeric values. You'll want to use a tool like awk to evaluate numerics.
– Andy Lester, Commented Mar 20, 2015 at 20:50
Thanks for mentioning that, Andy. I will review the awk command soon.
– geforce, Commented Mar 20, 2015 at 20:52
@AndyLester, you can match digit patterns with regular expressions, and thereby approach parts (b) and (c) with regular expressions. Indeed, part (c) can probably be written more clearly and succinctly with regex than with arithmetic.
– John Bollinger, Commented Mar 20, 2015 at 21:00

hek2mgl · Accepted Answer · 2015-03-20 21:20:17Z

1

A) Find all occurrences of words in text that have a substring ad, bd, cd, dd, ed.

grep -ow '.*\(a\|b\|c\|d\|e\)d.*' text

or

egrep -ow '.*(a|b|c|d|e)d.*' text

B) Find all occurrences of numbers > 100

grep -ow '[1-9][0-9][0-9]\+' text

C) Find all occurrences of numbers > 100 that contain a digit 0 or 5

grep -ow '[1-9][0-9][0-9]\+' text | grep '\(0\|5\)'

or

grep -ow '[1-9][0-9][0-9]\+' text | egrep '(0|5)'

I'm using the -o option to print every match on its own line instead of the whole line where the pattern was found, and the -w option to require a word boundary before and after the match.
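To illustrate what the two options do, here is a minimal sketch on a made-up one-line input. The sample text is an assumption; the asker's text file is not shown in the question.

```shell
# Hypothetical sample line; the real "text" file is not reproduced in the question.
sample='born 1973, room 12, code x1500, total 2050'

# -o alone: each match is printed on its own line, even when it sits inside
# a longer word such as x1500. Prints 1973, 1500 and 2050.
printf '%s\n' "$sample" | grep -o '[1-9][0-9][0-9]\+'

# -o with -w: a match only counts when it forms a whole word,
# so the 1500 inside x1500 is skipped. Prints 1973 and 2050.
printf '%s\n' "$sample" | grep -ow '[1-9][0-9][0-9]\+'
```

The comma after 1973 is not a word character, so it does not prevent the -w match.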

edited Mar 20, 2015 at 21:20

answered Mar 20, 2015 at 21:15

hek2mgl


3 Comments

geforce Over a year ago

Hi hek, b) and c) work and are correct based on my understanding. But when using a), I think it is wrong, because it prints a paragraph where some words don't match a)'s description.

geforce Over a year ago

I'll let you know, because I'm still going to review it in more detail. For now I am going to take a break and come back later.

hek2mgl Over a year ago

I would also like to take a break, probably until tomorrow. It's late here. But don't hesitate to ask, I'll answer tomorrow.

John Bollinger · Accepted Answer · 2015-03-20 21:09:28Z

1

For part (a), the -o option to grep causes it to print only the part of the line that matches the pattern, but your pattern does not match whole words. You simply need to adjust your pattern to match the parts of each word before and after the [a-e]d substring.
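One way to make that adjustment, sketched on a made-up sentence (the sample input is an assumption, not the asker's file), is to allow any number of letters on both sides of the [a-e]d substring. Using a letter class rather than `.*` matters here, because `.` also matches spaces and would let one match run across several words.

```shell
# Hypothetical sample input; the asker's "text" file is not reproduced here.
sample='He added a red sled and a bdellium bead, then nodded.'

# [a-z]* on both sides extends the match to the whole word.
# -i makes the letter classes case-insensitive, -w requires word boundaries,
# and -o prints each matching word on its own line.
# Prints: added, red, sled, bdellium, bead, nodded.
printf '%s\n' "$sample" | grep -oiw '[a-z]*[a-e]d[a-z]*'
```

Words such as "and" are skipped because the letter before the d is not in the range a to e.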

For part (b), your pattern is all wrong. It will not match the numbers 299 or 1000, for instance. The digit pattern you want is a digit between 1 and 9 followed by at least two digits between 0 and 9.
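As a sketch of that digit pattern on made-up numbers (the sample input is an assumption): note that a leading 1-9 followed by at least two digits also matches 100 itself, so if the task strictly means greater than 100, that single value has to be filtered out afterwards.

```shell
# Hypothetical sample input chosen to include the edge cases.
sample='values: 7 42 100 101 299 1000 007'

# First digit 1-9, then at least two more digits: every integer >= 100.
# Prints 100, 101, 299 and 1000 (007 is skipped because of its leading zero).
printf '%s\n' "$sample" | grep -ow '[1-9][0-9]\{2,\}'

# Strictly greater than 100: drop any output line that is exactly "100".
# Prints 101, 299 and 1000.
printf '%s\n' "$sample" | grep -ow '[1-9][0-9]\{2,\}' | grep -vx '100'
```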

Part (c) is the trickiest. You must match digit patterns containing at least three digits, the first being between 1 and 9, with either a 5 in the first position or a 0 or 5 in any other position. You probably need to separate that into alternatives with the | operator. It looks like you probably need three: the case where the lead digit is 5; the case where the second digit is either 0 or 5, and the case where some later digit is 0 or 5. In the third case you mustn't forget that there may be any number of additional digits, including zero, on either side of the 0 or 5 you match.
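The three alternatives described above could be written as follows. This is a sketch using extended regular expressions (grep -E) on made-up numbers, not a tested solution for the asker's file.

```shell
# Hypothetical sample input; the asker's "text" file is not reproduced here.
sample='ids: 50 105 150 199 500 611 1234 2050 7775'

# Three alternatives joined with |:
#   5[0-9]{2,}             lead digit is 5, then at least two more digits
#   [1-9][05][0-9]+        second digit is 0 or 5, then at least one more digit
#   [1-9][0-9]+[05][0-9]*  some later digit is 0 or 5, with any digits around it
# Prints 105, 150, 500, 2050 and 7775.
printf '%s\n' "$sample" |
  grep -owE '5[0-9]{2,}|[1-9][05][0-9]+|[1-9][0-9]+[05][0-9]*'
```

Piping the three-or-more-digit matches through a second grep for 0 or 5 gives the same result on this input and may be easier to read.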

edited Mar 20, 2015 at 21:09

answered Mar 20, 2015 at 20:56

John Bollinger


7 Comments

geforce Over a year ago

for a) would the regex be '[a-zA-Z][a-e]d[a-zA-Z]'?

geforce Over a year ago

for b) would the regex be '[1-9][0-9]{2,}' ?

John Bollinger Over a year ago

@darere, you're going in the right direction, but you're not there yet. You need to match any number of letters on either side of the [a-e]d. Also, although it's not wrong to include capital letters in your character classes, if you're using grep -i then you don't need them.

John Bollinger Over a year ago

@darere, with respect to part (b), yes, that's just what I would do. Note, though, that grep requires you to escape the { and } by preceding them with backslashes.
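As a small sketch of the escaping rule (the sample input is made up): in grep's default basic regular expressions both braces of the interval carry a backslash, while with grep -E the same interval is written without backslashes.

```shell
# Hypothetical sample input.
sample='12 345 6789'

# Basic regular expressions (plain grep): both braces are escaped.
# Prints 345 and 6789.
printf '%s\n' "$sample" | grep -ow '[1-9][0-9]\{2,\}'

# Extended regular expressions (grep -E): the same interval, unescaped.
# Prints 345 and 6789.
printf '%s\n' "$sample" | grep -owE '[1-9][0-9]{2,}'
```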

geforce Over a year ago

so b) is '[1-9][0-9]\{2,}\' ? and a) would be '[a-z][0-9][a-e]d[a-z][0-9]' ? Also for a) does it matter in which position [a-z] or [0-9] is placed such as [0-9] is first position instead?
