I am parsing an XML file using xmllint. Each <item> contains a description element with CDATA text inside, from which I would like to extract the title (the text up to the first <br />) and the URL of a specific domain (desiredURL.com). I am not a pro with regular expressions or with awk and sed. Is there a way to parse the data in the description element using xmllint again, or what would be an appropriate approach? I want to iterate over all the <item> elements and print the title and the URL on the domain desiredURL.com.

#!/bin/bash
# Collect the text of every description element via the xmllint shell
ITEMS=$(echo "cat //item/description/text()" | xmllint --shell file.xml | grep -E '^\w')
# iterate over items and print title and desiredURL


file.xml:

<item>
    <description><![CDATA[A title for the URLs<br /><br />

    http://www.foobar.com/foo/bar
    <br />http://bar.com/foo
    <br />http://myurl.com/foo
    <br />http://desiredURL.com/files/ddd
    <br />http://asdasd.com/onefile/g.html
    <br />http://second.com/link
    <br />]]></description>
</item>
<item>
    <description> ...</description>
</item>

1 Answer

XMLlint

There is an --xpath option you can use to pass an XPath.

Extracting URL

Assuming nothing follows the URL on each line, you can use grep with:

  • -P flag: interpret the pattern as a Perl-compatible regular expression (PCRE);
  • -o flag: print only the matched (non-empty) parts of each line.

Command

xmllint --xpath '//item/description' /tmp/so.xml | grep -Po 'http:.*' 
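
The command above prints every URL. To get closer to what the question asks (the title plus only the desiredURL.com link), here is a minimal sketch. The function name extract_title_and_url is hypothetical, and the sketch assumes that the title sits on the first line containing <br /> and that URLs contain no whitespace or "<". It reads one description from stdin and prints the title and the first matching URL, separated by a tab.

```shell
#!/bin/bash
# Hypothetical helper (not part of xmllint): reads the text of ONE
# <description> element on stdin and prints "<title><TAB><url>".
# $1 is the wanted domain, e.g. desiredURL.com. The dots in the domain
# are not escaped, so they match any character; fine for a sketch.
extract_title_and_url() {
    local domain="$1" text title url
    text=$(cat)

    # Title: take the first line containing <br />, drop anything up to
    # and including "<![CDATA[" (if present), then cut at the first <br />.
    title=$(printf '%s\n' "$text" \
        | grep -m1 '<br />' \
        | sed -e 's/.*<!\[CDATA\[//' -e 's/<br \/>.*//' -e 's/^[[:space:]]*//')

    # URL: first http(s) link on the wanted domain (optional www. prefix),
    # matched case-insensitively, ending at whitespace or "<".
    url=$(printf '%s\n' "$text" \
        | grep -Eio "https?://(www\.)?${domain}[^[:space:]<]*" \
        | head -n 1)

    printf '%s\t%s\n' "$title" "$url"
}

# Demo on a shortened copy of the sample description from the question.
extract_title_and_url desiredURL.com <<'EOF'
<description><![CDATA[A title for the URLs<br /><br />
    http://www.foobar.com/foo/bar
    <br />http://desiredURL.com/files/ddd
    <br />]]></description>
EOF
```

With a well-formed file, the same function could be fed from xmllint one item at a time, for example: xmllint --xpath '(//item)[1]/description' file.xml | extract_title_and_url desiredURL.com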
3 Comments

Thanks! Is there a way to get all <item> elements into an array and iterate over the arrays? then, for each array, iterate over the lines of the <description> element?
@tzippy Yes, it is possible. However, you should accept this answer and ask another question, as it's a different problem.
@Eduard Lopez: I did, thanks: stackoverflow.com/questions/20495885/…
