How to grep part of the content from a string in bash

Question

For example, when filtering an HTML file in which every line follows this pattern:

<a href="xxxxxx" style="xxxx"><i>some text</i></a>

How can I get the content of href, and how can I get the text between <i> and </i>?

@Ignacio Vazquez-Abrams: Does xmlstarlet work with HTML too?
– Gumbo, Commented Dec 21, 2010 at 5:32
@Gumbo: You'd have to shove it through HTML Tidy first, but that's not too big a deal. And it's more a matter of the option not existing, not the underlying libraries being unable to handle it.
– Ignacio Vazquez-Abrams, Commented Dec 21, 2010 at 5:33

Community · Accepted Answer · 2017-05-23 09:58:13Z

1

cut -d'"' -f2 file

FYI: Just about every other HTML/regexp post on Stack Overflow explains why getting values from HTML using anything other than an HTML parser is a bad idea. You may want to read some of those.
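
For the sample line in the question, the cut call returns the href value because it is the second field once the line is split on double quotes. The text between <i> and </i> could be pulled out in the same spirit with a sed substitution. This is only a minimal sketch, assuming every line matches the pattern exactly (href is the first quoted attribute and there is one <i> element per line); the variable names are made up for illustration:

```shell
# Sample line from the question; in practice the input would be the file.
line='<a href="xxxxxx" style="xxxx"><i>some text</i></a>'

# href value: second field when the line is split on double quotes.
href=$(printf '%s\n' "$line" | cut -d'"' -f2)

# Text between <i> and </i>: capture it and print only lines that matched.
text=$(printf '%s\n' "$line" | sed -n 's/.*<i>\(.*\)<\/i>.*/\1/p')

printf '%s\n' "$href" "$text"
```

To run the same extraction over a whole file, the printf calls can be dropped and the file name passed to cut and sed directly. Any line that deviates from the pattern will give wrong or empty results, which is the reason for the parser advice above.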

edited May 23, 2017 at 9:58


answered Dec 21, 2010 at 5:17

iteratingself


Raghuram · Accepted Answer · 2010-12-21 05:16:53Z

0

If href is always the second space-separated token in a line, then you can try:

grep "href" file | cut -d' ' -f2 | cut -d'=' -f2
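
Note that this pipeline keeps the double quotes around the value. A small sketch of stripping them with tr, run against the sample line from the question (the variable names are hypothetical, and it assumes the value itself contains no spaces or '=' characters):

```shell
# Sample line from the question; in practice the input would be the file.
line='<a href="xxxxxx" style="xxxx"><i>some text</i></a>'

# The second space-separated token is href="xxxxxx"; the part after '='
# is the value, still wrapped in double quotes.
quoted=$(printf '%s\n' "$line" | grep 'href' | cut -d' ' -f2 | cut -d'=' -f2)

# Delete the double quotes to get the bare value.
href_value=$(printf '%s\n' "$quoted" | tr -d '"')

printf '%s\n' "$href_value"
```

If the URL can contain '=' (for example in a query string), using -f2- in the last cut keeps everything after the first '=' instead of only the second field.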

answered Dec 21, 2010 at 5:16

Raghuram


tommy · Accepted Answer · 2011-03-12 19:52:31Z

0

Here's how to do it using xmlstarlet (optionally with tidy):

# extract content of href and <i>...</i>
echo '<a href="xxxxxx" style="xxxx"><i>some text</i></a>' |
xmlstarlet sel -T -t -m "//a" -v @href -n -v i -n

# using tidy & xmlstarlet
echo '<a href="xxxxxx" style="xxxx"><i>some text</i></a>' |
tidy -q -c -wrap 0 -numeric -asxml -utf8 --merge-divs yes --merge-spans yes 2>/dev/null | 
xmlstarlet sel -N x="http://www.w3.org/1999/xhtml" -T -t -m "//x:a" -v @href -n -v . -n

answered Mar 12, 2011 at 19:52

tommy
