
I am trying to find the proper regexes to use with the grep command on the file text.txt.

Question

  1. Find all occurrences of words in text that contain one of the substrings ad, bd, cd, dd, or ed.

  2. Find all occurrences of numbers > 100

  3. Find all occurrences of numbers > 100 that contain a digit 0 or 5

My Approach

  1. grep -io '[a-e]*d' text

    Prints words with the proper substrings, but doesn’t print the whole string/word.

    ad
    d
    d
    ed
    d
    d
    ed
    d
    d
    d
    d
    ed
    d
    d
    
  2. grep -io '[199][1-9]*' text

    I believe I am way off on the regex, but it still prints the correct result.

    1973
    197
    17775
    
  3. grep -io '[05][1-9]*' text

    This builds on part 2, and I don’t understand how to combine the part 2 condition with part 3, but I believe I have the "contains a digit 0 or 5" part correct.

    0
    0
    0
    5
    
  • -o only prints the bit that matched so if you need more than the pattern you have you need to extend it to match the words/etc. you need. Commented Mar 20, 2015 at 20:46
  • Yes, that's my question for a). I know I have to add [a-zA-Z] somewhere, but I'm not sure in what position. Commented Mar 20, 2015 at 20:49
  • Regexes are for finding text patterns, not for determining numeric values. You'll want to use a tool like awk to evaluate numerics. Commented Mar 20, 2015 at 20:50
  • Thanks for mentioning that Andy, I will review awk command soon Commented Mar 20, 2015 at 20:52
  • @AndyLester, you can match digit patterns with regular expressions, and thereby approach parts (b) and (c) with regular expressions. Indeed, part (c) can probably be written more clearly and succinctly with regex than with arithmetic. Commented Mar 20, 2015 at 21:00

2 Answers

1

A) Find all occurrences of words in text that have a substring ad, bd, cd, dd, ed.

grep -ow '.*\(a\|b\|c\|d\|e\)d.*' text

or

egrep -ow '.*(a|b|c|d|e)d.*' text

B) Find all occurrences of numbers > 100

grep -ow '[1-9][0-9][0-9]\+' text

C) Find all occurrences of numbers > 100 that contain a digit 0 or 5

grep -ow '[1-9][0-9][0-9]\+' text | grep '\(0\|5\)'

or

grep -ow '[1-9][0-9][0-9]\+' text | egrep '(0|5)'

I'm using the option -o to output every match on its own line, rather than the whole line where the pattern was found, and the option -w, which requires a word boundary before and after the match.
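
To see what the two options contribute, here is a small sketch on a made-up one-line file. The sample text and the variable names are assumptions for illustration, not the asker's data, and the `\+` syntax assumes GNU grep, as in the commands above.

```shell
# Hypothetical sample input; the real text file is not shown in the question.
sample=$(mktemp)
printf 'room 1973 and item x250\n' > "$sample"

# No options: grep prints the whole line that contains a match.
whole_line=$(grep '[1-9][0-9][0-9]\+' "$sample")

# -o: print only the matched text, one match per line (1973 and 250).
only_match=$(grep -o '[1-9][0-9][0-9]\+' "$sample")

# -o -w: the match must also be bounded by non-word characters,
# so the 250 inside x250 is dropped and only 1973 remains.
word_match=$(grep -ow '[1-9][0-9][0-9]\+' "$sample")

printf '%s\n' "$whole_line" "$only_match" "$word_match"
rm -f "$sample"
```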


3 Comments

Hi hek, b) and c) work and are correct based on my understanding. But for a), I think it is wrong, because it prints a paragraph in which some words don't match a)'s description.
I'll let you know, because I'm still going to review it in more detail. For now I am going to take a break and come back later.
I would also like to take a break, probably until tomorrow. It's late here. But don't hesitate to ask; I'll answer tomorrow.
1

For part (a), the -o option to grep causes it to print only the part of the line that matches the pattern, but your pattern does not match whole words. You simply need to adjust your pattern to match the parts of each word before and after the [a-e]d substring.
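
One way this could look, as a sketch rather than a definitive answer: put `[a-z]*` on both sides of `[a-e]d` so the match grows to cover the rest of the word. Using a letter class instead of `.` matters here, because `.` also matches spaces and would let a single match run across several words. The sample file below is invented, and the sketch assumes that words consist only of ASCII letters.

```shell
# Hypothetical sample input; the original text file is not shown in the question.
sample=$(mktemp)
printf 'The bad seed added cedar to the bed.\nNo match here.\n' > "$sample"

# [a-z]* before and after [a-e]d extends the match to the whole word.
# -i makes the classes case-insensitive; -o prints each match on its own line.
words_a=$(grep -io '[a-z]*[a-e]d[a-z]*' "$sample")

printf '%s\n' "$words_a"   # bad, seed, added, cedar, bed (one per line)
rm -f "$sample"
```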

For part (b), your pattern is all wrong. It will not match the numbers 299 or 1000, for instance. The digit pattern you want is a digit between 1 and 9 followed by at least two digits between 0 and 9.
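
A minimal sketch of that digit pattern, run on invented sample data. Note that the pattern also matches 100 itself; the extra `grep -vx '100'` stage is an assumption on my part, and is only needed if "> 100" is read strictly.

```shell
# Hypothetical sample input; the original text file is not shown in the question.
sample=$(mktemp)
printf 'In 1973 about 197 of 17775 items, 100 boxes, 42 spares, code x250.\n' > "$sample"

# [1-9] followed by at least two more digits: three or more digits, no leading zero.
# In basic regular expressions the interval braces must be escaped: \{2,\}.
# -w rejects the 250 embedded in x250; the second grep drops a line that is exactly 100.
nums_b=$(grep -ow '[1-9][0-9]\{2,\}' "$sample" | grep -vx '100')

printf '%s\n' "$nums_b"   # 1973, 197, 17775 (one per line)
rm -f "$sample"
```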

Part (c) is the trickiest. You must match digit patterns containing at least three digits, the first being between 1 and 9, with either a 5 in the first position or a 0 or 5 in any other position. You probably need to separate that into alternatives with the | operator. It looks like you probably need three: the case where the lead digit is 5; the case where the second digit is either 0 or 5, and the case where some later digit is 0 or 5. In the third case you mustn't forget that there may be any number of additional digits, including zero, on either side of the 0 or 5 you match.
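
The three cases described above could be sketched as a single extended regular expression, one alternative per case. The sample data is invented, and like the part (b) pattern this treats any number of three or more digits as qualifying, so it would also accept 100.

```shell
# Hypothetical sample input; the original text file is not shown in the question.
sample=$(mktemp)
printf 'Values: 1973 500 105 1250 1005 17775 50 205 x150 333\n' > "$sample"

# Alternative 1: lead digit is 5, followed by at least two more digits.
# Alternative 2: second digit is 0 or 5, followed by at least one more digit.
# Alternative 3: some later digit is 0 or 5, with any digits (possibly none) after it.
nums_c=$(grep -owE '5[0-9]{2,}|[1-9][05][0-9]+|[1-9][0-9]+[05][0-9]*' "$sample")

printf '%s\n' "$nums_c"   # 500, 105, 1250, 1005, 17775, 205 (one per line)
rm -f "$sample"
```

Whether this reads better than piping the part (b) output through a second `grep '[05]'` is a matter of taste; the pipeline is shorter, while the single pattern keeps everything in one command.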

7 Comments

for a) would the regex be '[a-zA-Z][a-e]d[a-zA-Z]'?
for b) would the regex be '[1-9][0-9]{2,}' ?
@darere, you're going in the right direction, but you're not there yet. You need to match any number of letters on either side of the [a-e]d. Also, although it's not wrong to include capital letters in your character classes, if you're using grep -i then you don't need them.
@darere, with respect to part (b), yes, that's just what I would do. Note, though, that grep requires you to escape the { and } by preceding them with backslashes.
so b) is '[1-9][0-9]\{2,}\' ? and a) would be '[a-z][0-9][a-e]d[a-z][0-9]' ? Also for a) does it matter in which position [a-z] or [0-9] is placed such as [0-9] is first position instead?