This is kind of a bizarre problem.

I have a list of words between html tags, each separated by a line, with some whitespace on the left, like so:

    <td>word</td>
    <td>anotherWord</td>
    ...

I want to extract the words from the list and not the tags, so I use:

    temp=$(printf "%s" "$temp" | egrep '[....]')

Just to clarify, "temp" is the input to be searched. (I am doing this in a bash script, and I stored the input in variable temp). The "..." is a list of characters, since the words I'm trying to extract only use certain characters.

Whenever grep finds a match, it outputs the word along with the HTML tags on either side! This only happens on a match: I tested it with a gibberish pattern, '09680876', which matches nothing in temp, and grep output nothing.

I also tried to use a specific word that I knew was a match as the regex parameter, like so:

    .... | egrep 'hanai')

where I knew 'hanai' was a definite match in the sample text. This resulted in grep outputting

    <td>hanai</td>

I am completely stumped and haven't been able to find solutions online. Would appreciate someone pointing out the obvious mistake I'm making.

egrep is deprecated. Use grep -E instead. Commented Apr 11, 2015 at 8:32

2 Answers

As a related question explains, this can be done with Perl-compatible patterns in grep. You need a regular expression that returns only the text inside the tag, something like this (not tested):

    grep -oP '<[a-zA-Z]+>\K[^<]+' test.txt

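The regex first matches an opening tag, then \K drops everything matched so far from the reported match. Combined with -o, only the text inside the tag is printed and the opening and closing tags are left out. Note that -P needs GNU grep built with PCRE support.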
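To make the \K approach concrete, here is a small sketch. It assumes GNU grep with -P support, and the sample input is made up to mirror the layout in the question; the variable name words is only for illustration.

```shell
# Hypothetical sample input, shaped like the lines in the question.
temp='    <td>word</td>
    <td>anotherWord</td>'

# -o prints only the matched part; -P enables Perl-compatible syntax.
# <[a-zA-Z]+> consumes the opening tag, \K discards it from the match,
# and [^<]+ takes everything up to the next "<".
words=$(printf '%s\n' "$temp" | grep -oP '<[a-zA-Z]+>\K[^<]+')

printf '%s\n' "$words"
```

The closing tag is never reported because it starts with "</", which the opening-tag part of the pattern does not match.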


By default grep (and egrep) outputs the lines containing the matched pattern. If you only want the matched pattern use the -o flag.

From man egrep:

-o, --only-matching
       Print  only  the  matched  (non-empty) parts of a matching line,
       with each such part on a separate output line.
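
A quick way to see the difference, using a made-up two-line input in the same shape as the question's data (the variable names whole_line and only_match are just for illustration):

```shell
# Hypothetical sample input, shaped like the lines in the question.
temp='    <td>hanai</td>
    <td>anotherWord</td>'

# Default behaviour: the whole matching line is printed, tags included.
whole_line=$(printf '%s\n' "$temp" | grep -E 'hanai')

# With -o: only the text that matched the pattern is printed.
only_match=$(printf '%s\n' "$temp" | grep -oE 'hanai')

printf '%s\n' "$whole_line"
printf '%s\n' "$only_match"
```

One caveat: with a bracket expression that includes ordinary letters, -o will also print the tag name (td) as a separate match, because those letters match the pattern too. The pattern then has to be anchored to the text between the tags.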
