Is there an easy way to extract this URL in bash or PHP?

http://shop.image-site.com/images/2/format2013/fullies/kju_product.png

From this HTML code?

<a href="javascript: open_window_zoom('http://shop.image-site.com/image.php?image=http://shop.image-site.com/images/2/format2013/fullies/kju_product.png&pID=31777&download=kju.png&name=13011 KELLYS Kju: 490mm (19.5&quot;)',550,366);">

4 Answers

With Perl you could do a match and a capture:

perl -n -e 'print "$1\n" if (m/image=(.*?)\&/);'

This captures everything between image= and the next & and prints the captured text, $1.
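The same match-and-capture idea can be sketched in plain bash with the [[ =~ ]] operator. This is only an illustration: the variable names string, re and url are made up here, and the input is an abbreviated copy of the href from the question. Bash uses POSIX extended regular expressions, which have no non-greedy quantifier, so [^&]* stands in for (.*?).

```bash
#!/usr/bin/env bash
# Abbreviated href from the question (assumed to be already in a variable).
string='image.php?image=http://shop.image-site.com/images/2/format2013/fullies/kju_product.png&pID=31777&download=kju.png'

# Capture everything between "image=" and the next "&".
re='image=([^&]*)&'
if [[ $string =~ $re ]]; then
    url=${BASH_REMATCH[1]}   # first capture group, like $1 in Perl
    printf '%s\n' "$url"
fi
```

Keeping the pattern in a variable avoids quoting surprises on the right-hand side of =~.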

For more on regular expressions, see perlre or http://www.regular-expressions.info/

3 Comments

You guys rock! Works like a charm. Is this regex? It seems easier. Sometimes I need regex, but it is really hard to learn. :)
@Adrian It's a skill well worth learning. Start with simple regular expressions and expand on that.
To second what Olaf said, it's one of the most powerful tools a programmer has.

In bash, you can try the following:

sed 's/.*image=\(http:\/\/[^&]*\).*/\1/g'

Update:
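The solution above performs a substitution rather than an extraction: each line containing the pattern (the required URL) is replaced by the URL itself, and lines that do not match are printed unchanged. The substitution is not in-place, though; sed writes the result to standard output and leaves the input file untouched.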
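If only the extracted URL should be printed, one possible variant combines sed's -n option (suppress automatic printing) with the p flag (print lines where the substitution succeeded). The sketch below is an assumption about what is wanted, not part of the original answer; the variable names and the abbreviated input are made up, and | is used as the delimiter so the slashes need no escaping.

```bash
#!/usr/bin/env bash
# Abbreviated href from the question, plus a line that should be ignored.
string='image.php?image=http://shop.image-site.com/images/2/format2013/fullies/kju_product.png&pID=31777&download=kju.png'

url=$(printf '%s\n' "$string" 'a line without a match' |
      sed -n 's|.*image=\(http://[^&]*\).*|\1|p')   # -n + p: print only lines that matched

printf '%s\n' "$url"
```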

5 Comments

Do you really need to match the beginning of the line and the end of the line?
@L0j1k I don't understand what you mean by matching the beginning and the end of the line. I didn't use ^ or $ in my solution.
Aloha. That's exactly right. And if you're going to use a substitution match (which will destroy the original data, something the asker may not know about), you should be using ^ and $. It all comes down to greedy matching, like sputnick said.
@L0j1k Ok. Now, I understand what you meant.
Since you know what's up now, I want to undownvote this, but I can't unless you edit your answer. If you edit your answer to include a disclaimer that your answer will perform a substitution (and therefore destroy the original data), then I'll upvote you.

Whichever way you decide to dress it up, you could simply split with the delimiter equal to ?image= and then split the second token you receive (i.e. result[1]) with a simple & delimiter. The first result from that split is your answer.
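In bash, the two splits described above can be approximated with parameter expansion, with no external tools. This is a minimal sketch: string, rest and url are illustrative names, and the input is an abbreviated copy of the href from the question.

```bash
#!/usr/bin/env bash
# Abbreviated href from the question (assumed to be already in a variable).
string='image.php?image=http://shop.image-site.com/images/2/format2013/fullies/kju_product.png&pID=31777&download=kju.png'

rest=${string#*\?image=}   # first split: drop everything up to and including "?image="
url=${rest%%&*}            # second split: drop everything from the first "&" onward

printf '%s\n' "$url"
```

Note that if "?image=" is absent, the first expansion leaves the string unchanged, so real input may need a check before the result is used.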

However, a pure regex match would look something like: m#image=([a-z0-9:/._-]+)&#i. The character class has to be written in square brackets, followed by +, and must include the underscore that appears in the file name. You can use that regex wherever you want and the result is stored in $1. Despite what a lot of people think, you do not have to match the beginning and the end of a line to capture a result.

Try this:

xmllint --html --xpath '//a/@href' file://file.html |
    grep -oP 'image=\Khttp://.*?\.png'

You can use a URL instead of a local file:

http://domain.tld/path

Or, if you have already extracted the line to parse into the $string variable:

grep -oP 'image=\Khttp://.*?\.png' <<< "$string"
