I'm trying to learn bash scripting. As an exercise, I'm getting the Alt text and URL of the Google doodle.

I am stuck on using perl to parse out the link URL. I have it finding and outputting the alt text and URL, but it is also outputting the whole web page. It does the same thing when I run the command directly in the shell.

curl -s google.com --location | perl -pe 's|.*<img.*alt="(.*?)".*src="(.*?)".*>.*|\1 http://google.com\2|'

How can I get this to stop outputting the whole web page?

Note that I tried running the two commands separately to make sure it was perl outputting the page and not something to do with curl; it is definitely the perl part. If there is a better way to do this, let me know. The goal is to output the alt text and URL of the doodle.

1 Answer

This is an ugly way to do things, but it may work if you print only those lines from the web page where a substitution has been made:

perl -ne 'print if s|<img.*alt="(.*?)".*src="(.*?)".*>|$1 http://google.com$2|'
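
For example, fed from the curl command in the question (using the -L short flag, which is the same as --location), one possible way to wire it up is

curl -sL google.com | perl -ne 'print if s|<img.*alt="(.*?)".*src="(.*?)".*>|$1 http://google.com$2|'

The -n switch reads the input line by line but, unlike -p, prints nothing by default, so only the lines where the substitution succeeded make it to the output.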

But it would be cleaner to do just a regex match, using negated character classes instead of non-greedy quantifiers:

perl -ne 'print "$1 http://google.com$2\n" if /<img[^<>]+alt="([^"]+)"[^<>]+src="([^"]+)"/'

But both of these rely on (amongst other things) all of the contents of the opening <img> tag appearing on a single line, which isn't necessarily true. They will also report the contents of every <img> element in the page that has both an alt and a src attribute.
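
If the tag does span several lines, one possible workaround that keeps the same regex is to slurp the whole page into a single string with -0777; the [^<>] classes already match newlines, so the pattern still works across line breaks, and a while loop reports every match:

curl -sL google.com | perl -0777 -ne 'print "$1 http://google.com$2\n" while /<img[^<>]+alt="([^"]+)"[^<>]+src="([^"]+)"/g'

For anything beyond an exercise like this, a real HTML parser (for example Perl's Mojo::DOM or HTML::TreeBuilder modules, if they are installed) is the safer choice.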
