2

I have already extracted the tag from the source document using grep, but now I can't figure out how to easily extract the attributes from the string. I also want to avoid any programs that would not usually be present on a standard installation.

tag='<img src="http://imgs.xkcd.com/comics/barrel_cropped_(1).jpg" title="Don'"'"'t we all." alt="Barrel - Part 1" />'

I need to end up with the following variables

src="http://imgs.xkcd.com/comics/barrel_cropped_(1).jpg"
title="Don't we all."
alt="Barrel - Part 1"

4 Answers

4

You can use xmlstarlet. Then, you don't even have to extract the element yourself:

$ echo "$tag" | xmlstarlet sel -t --value-of '//img/@src'
http://imgs.xkcd.com/comics/barrel_cropped_(1).jpg

You can even turn this into a function

$ get_attribute() {
  echo "$1" | xmlstarlet sel -t -o "&quot;" -v "$2" -o "&quot;"
  }
$ src=$(get_attribute "$tag" '//img/@src')

If you don't want to reparse the document several times, you can also do:

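$ get_values() {
   local file=${!#}            # the last argument is the file to parse
   set -- "${@:1:$#-1}"        # drop the last argument (bash-specific)
   local cmd=(xmlstarlet sel)  # build the command as an array, no eval needed
   local arg var expr
   for arg in "$@"
   do
      if [ -n "$arg" ]
      then
        var=${arg%%=*}         # name before the first '='
        expr=${arg#*=}         # XPath expression after the first '='
        cmd+=(-t -o "$var=&quot;" -v "$expr" -o "&quot;" -n)
      fi
   done
   "${cmd[@]}" "$file"
  }
$ eval "$(get_values src='//img/@src' title='//img/@title' your_file.xml)"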
$ echo $src
http://imgs.xkcd.com/comics/barrel_cropped_(1).jpg
$ echo $title
Don't we all.

I'm sure there's a better way to remove the last argument to a shell function, but I don't know it.


1 Comment

Oh, then xmlstarlet might not be available on a standard installation. Sorry, I think it was a little too late when I wrote the answer...
1

I went with dacracot's suggestion of using sed, although I would have preferred it if he had given me some sample code:

src=$(echo "$tag" | sed 's/.*src="\(.*\)" title="\(.*\)" alt="\(.*\)".*/\1/')
title=$(echo "$tag" | sed 's/.*src="\(.*\)" title="\(.*\)" alt="\(.*\)".*/\2/')
alt=$(echo "$tag" | sed 's/.*src="\(.*\)" title="\(.*\)" alt="\(.*\)".*/\3/')
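The sed expressions above depend on the attributes appearing in exactly the order src, title, alt. As a minimal sketch of an alternative that needs nothing beyond bash itself, each attribute can be matched on its own with the `[[ ... =~ ... ]]` operator. The helper name `get_attr` is hypothetical, and the sketch assumes double-quoted attribute values that contain no double quote:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the thread): print the value of one
# double-quoted attribute. Usage: get_attr NAME TAG
get_attr() {
  local name=$1 tag=$2
  # Match  name="value"  preceded by whitespace; the value stops at the next '"'.
  local re="[[:space:]]${name}=\"([^\"]*)\""
  if [[ $tag =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1   # attribute not found
  fi
}

tag='<img src="http://imgs.xkcd.com/comics/barrel_cropped_(1).jpg" title="Don'"'"'t we all." alt="Barrel - Part 1" />'
src=$(get_attr src "$tag")
title=$(get_attr title "$tag")
alt=$(get_attr alt "$tag")
printf '%s\n' "src: $src" "title: $title" "alt: $alt"
```

Like the sed version, this is plain pattern matching rather than XML parsing: entities such as `&amp;` are returned undecoded, and single-quoted attribute values are not handled.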

4 Comments

Using sed is a really, really bad approach -- it's brittle and doesn't know anything at all about the XML standard, and so will give you bad results when encountering things like &amp;. See Torsten Marek's suggestion.
Sorry I didn't work out the sed script for you; I didn't have time right then.
If you don't have time to write a good answer, then don't write one. Even if you do, be sure to come back and edit it later.
What is your definition of good? I find it very amusing that my answer is selected with -1 votes. No, I didn't code it for him, but I sent him in the right direction to find the answer. Give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime.
0

If xmlstarlet is available on a standard installation and the sequence of src-title-alt does not change, you can use the following code as well:

tag='<img src="http://imgs.xkcd.com/comics/barrel_cropped_(1).jpg" title="Don'"'"'t we all." alt="Barrel - Part 1" />'
xmlstarlet sel -T -t -m "/img" -m "@*" -v '.' -n <<< "$tag"
IFS=$'\n'
array=( $(xmlstarlet sel -T -t -m "/img" -m "@*" -v '.' -n <<< "$tag") )
src="${array[0]}"
title="${array[1]}"
alt="${array[2]}"

printf "%s\n" "src: $src" "title: $title" "alt: $alt"
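One caveat with the array assignment above: `IFS=$'\n'` stays in effect for the rest of the shell, and the unquoted command substitution is also subject to pathname expansion. A possible variation, sketched here with a hard-coded string standing in for the xmlstarlet output (an assumption, so that the snippet runs without xmlstarlet installed), reads the lines with `read -r` instead:

```shell
#!/usr/bin/env bash
# Stand-in for the xmlstarlet output: one attribute value per line.
# In real use this would be: values=$(xmlstarlet sel ... <<< "$tag")
values=$'http://imgs.xkcd.com/comics/barrel_cropped_(1).jpg\nDon\'t we all.\nBarrel - Part 1'

# Read the three lines in order. IFS is only set for each read command,
# so the global IFS is left untouched and no globbing takes place.
{ IFS= read -r src; IFS= read -r title; IFS= read -r alt; } <<< "$values"

printf "%s\n" "src: $src" "title: $title" "alt: $alt"
```

This still relies on the attributes arriving in the order src, title, alt, exactly as the array version does.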


0

Since this bubbled up again, there is now my Xidel that has 2 features which make this task trivial:

  • pattern matching on the xml

  • exporting all matched variables to the shell

So it becomes a single line:

eval $(xidel "$tag" -e '<img src="{$src}" title="{$title}" alt="{$alt}"/>' --output-format bash)
