
I'm trying to search a field in a database to extract URLs. Sometimes there will be more than one URL in a field, and I would like to extract those into separate variables (or an array).

I know my regex isn't going to cover every possible URL. As long as it matches anything that starts with http and ends at the next space, I'm OK.

The problem I'm having is that my attempts either get only one URL per record or get only the last letter of each URL. I've tried a couple of different techniques based on solutions others have posted, but I haven't found one that works for me.

Sample input line:

Testing http://marko.co http://tester.net Just about anything else you'd like.

Output goal:

$var[0] = http://marko.co
$var[1] = http://tester.net

First try:

if ( $status =~ m/http:(\S)+/g ) { print "$&\n"; }

Output: http://marko.co
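The first try prints only one URL because the if condition evaluates the match once, in scalar context, so /g only advances to the first match. A minimal sketch, assuming the field is already in $status, that loops with while instead and reads each URL from a capture group rather than $&:

```perl
use strict;
use warnings;

my $status = "Testing http://marko.co http://tester.net Just about anything else you'd like.";

# In scalar context (such as an if or while condition), m//g finds one match
# per evaluation and remembers where it stopped (pos). The while loop keeps
# re-running the match until there are no more URLs left.
my @found;
while ( $status =~ m/(http:\S+)/g ) {
    push @found, $1;    # $1 holds the text captured by the parentheses
}

print "$_\n" for @found;
```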

Second try:

@statusurls = ($status =~ m/http:(\S)+/g);
print "@statusurls\n";

Output: o t
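The "o t" output is the last character of each URL (marko.co ends in o, tester.net in t). A minimal sketch on a single URL showing why: in (\S)+ the group matches one character and the + repeats the group, so each repetition overwrites the capture.

```perl
use strict;
use warnings;

my $url = "http://marko.co";

# (\S)+ : the group captures ONE character and + repeats the group;
# every repetition overwrites the capture, so only the last character survives.
my ($last_char) = $url =~ m/http:(\S)+/;

# (\S+) : the + is inside the group, so the group captures the whole run.
my ($whole_run) = $url =~ m/http:(\S+)/;

print "$last_char\n";
print "$whole_run\n";
```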

I'm new to regex, but since I'm using the same regex for each attempt, I don't understand why it's returning such different results.

Thanks for any help you can offer.

I've looked at these posts and either didn't find what I was looking for or didn't understand how to implement it:

This one seemed the most promising (it's where I got the second attempt from), but it didn't return the whole URL, just the last letter: How can I store regex captures in an array in Perl?

This has some great stuff in it. I'm curious whether I need to treat the URL as a word, since it's bookended by spaces: Regex Group in Perl: how to capture elements into array from regex group that matches unknown number of/multiple/variable occurrences from a string?

This one offers suggestions similar to the first two: How can I store captures from a Perl regular expression into separate variables?
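On treating each URL as a word: because the URLs are bookended by spaces, a regex over the whole field isn't strictly required. A minimal sketch of that alternative (splitting on whitespace and filtering, a different technique from the capture-based answers above), assuming the field is in $status:

```perl
use strict;
use warnings;

my $status = "Testing http://marko.co http://tester.net Just about anything else you'd like.";

# split ' ' splits on runs of whitespace (ignoring leading whitespace),
# so each URL becomes its own "word"; grep keeps the words starting with http.
my @urls = grep { /^http/ } split ' ', $status;

print "$_\n" for @urls;
```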

Solution:

@statusurls = ($status =~ m/(http:\S+)/g);
print "@statusurls\n";
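To get the separately addressable variables from the output goal, a minimal sketch: assign the list-context match to an array, or to a list of scalars when the number of URLs is known in advance ($first and $second are illustrative names, not from the original code).

```perl
use strict;
use warnings;

my $status = "Testing http://marko.co http://tester.net Just about anything else you'd like.";

# With the parentheses around the whole URL, m//g in list context returns
# every captured URL, one list element per match.
my @var = $status =~ m/(http:\S+)/g;

print "\$var[0] = $var[0]\n";
print "\$var[1] = $var[1]\n";

# Or assign straight into scalars; any matches beyond the second are discarded.
my ( $first, $second ) = $status =~ m/(http:\S+)/g;
```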

Thanks!

  • Also, don't use $&; see the WARNING in the perlre docs. Commented Aug 4, 2011 at 13:49
  • @urls = $status =~ m{ *(https?://[^ ]+) *}g; Commented Aug 4, 2011 at 15:00
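Both comments point the same way: read the URLs from a capture group instead of $&, and accept https as well as http. A minimal sketch, using a hypothetical input line that mixes the two schemes; m{...} serves as the delimiter so the slashes in :// need no escaping.

```perl
use strict;
use warnings;

# Hypothetical input mixing an https and an http URL.
my $status = "Testing https://marko.co http://tester.net Just about anything else you'd like.";

# s? makes the "s" optional, so exactly "http" or "https" matches.
# The capture group delivers each URL directly, so $& is never needed.
my @urls = $status =~ m{(https?://\S+)}g;

print "$_\n" for @urls;
```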

1 Answer


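I think that you need to capture more than just one character. In (\S)+ the group matches a single character and the + repeats the group, so only the last character is kept. Move the + inside the parentheses so the group captures the whole run: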

m/http:(\S+)/g
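This pattern puts the parentheses after http:, so each captured element lacks the scheme; that is why the comment below moves the ( to the left. A minimal sketch comparing the two placements on the sample line:

```perl
use strict;
use warnings;

my $status = "Testing http://marko.co http://tester.net Just about anything else you'd like.";

# Parentheses after "http:" capture only the rest of each URL...
my @rest = $status =~ m/http:(\S+)/g;

# ...while parentheses around the whole pattern keep the scheme as well.
my @full = $status =~ m/(http:\S+)/g;

print "@rest\n";
print "@full\n";
```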

1 Comment

Your answer was the push I needed in the right direction. I used what you posted and moved the ( to the left of the http because I needed that in the result. Thanks!
