Ruby regex: extract a list of urls from a string

Question

I have a string containing image URLs and I need to convert it into an array of those URLs.

How do I do this?

Why does it have to use a regular expression?

– Andrew Grimm, Commented Aug 16, 2011 at 22:16

dogenpunk · Accepted Answer · 2011-08-16 17:51:43Z

5

URI.extract(your_string)

That's all you need if the URLs are already in a string. You need to put require 'uri' in there first, unless something else has already loaded it. Gotta love that standard library!

See the docs for URI.extract (it's a module-level method, so URI.extract rather than URI#extract).
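For illustration, here is a minimal sketch of the URI.extract approach. The sample string and the extract_urls helper are made up for this example, since the original input isn't shown; restricting the schemes to http and https is an optional extra. Newer Ruby documentation marks URI.extract as obsolete, so treat this as a sketch rather than a recommendation.

```ruby
require 'uri'

# Hypothetical input: image URLs separated by arbitrary text.
SAMPLE_TEXT = 'http://example.com/a.jpg?v=3 and https://example.com/img/b.png?v=3'

# Hypothetical helper: the second argument limits extraction to the
# listed schemes, so things like mailto: links are skipped.
def extract_urls(text)
  URI.extract(text, %w[http https])
end

puts extract_urls(SAMPLE_TEXT)
```

Note that the query string (?v=3) is part of each extracted URL, so strip it afterwards if you only want the image path.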

1 Comment

Marnen Laibow-Koser Over a year ago

Nice! Didn't know about that.

stema · Accepted Answer · 2011-08-16 13:56:42Z

4

String#scan returns an array of all matches:

myarray = mystring.scan(/regex/)

See here on regular-expressions.info
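As a rough sketch of what that looks like in practice (the scan_urls helper, the sample string and the deliberately loose pattern are assumptions for illustration, not the asker's actual data):

```ruby
# Hypothetical helper: grab anything that starts with http:// or https://
# and runs up to the next whitespace character.
def scan_urls(text)
  # With no capture groups in the pattern, scan returns an array of whole matches.
  text.scan(%r{https?://\S+})
end

p scan_urls('see http://example.com/a.jpg and https://example.com/b.png')
# => ["http://example.com/a.jpg", "https://example.com/b.png"]
```

A pattern this loose only works when the URLs are separated by whitespace; tighten it to fit your real input.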

helpwithhaskell · Accepted Answer · 2011-08-16 18:08:35Z

1

The best answer will depend very much on exactly what input string you expect.

If your test string is accurate then I would not use a regex. Do this instead (as suggested by Marnen Laibow-Koser):

mystring.split('?v=3')

If you really don't have constant fluff between your useful strings then regex might be better. Your regex is greedy. This will get you part way:

mystring.scan(/https?:\/\/[\w.-\/]*?\.(jpe?g|gif|png)/)

Note the '?' after the '*' in the part matching the server and path pieces of the URL; this makes that part of the regex non-greedy.

The problem with this is that if your server name or path contains any of .jpg, .jpeg, .gif or .png, the non-greedy match stops at that first occurrence and the result will be wrong in that instance.

Figuring out what is best needs more information about your input string. You might for example find it better to pattern match the fluff between your desired URLs.
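One reason the pattern above only gets you part way: when a pattern contains a capture group, scan returns the captured groups rather than the whole match, so you would get the extensions instead of the URLs. A possible variant, sketched under the assumption that the URLs are jammed together with a ?v=3 suffix (the sample string and the IMAGE_URL and image_urls names are made up here), uses a non-capturing group and moves the hyphen to the end of the character class so it is taken literally:

```ruby
# (?:...) is non-capturing, so scan returns whole URLs instead of just the extensions.
# The hyphen sits last in the character class, so it is a literal, not a range operator.
IMAGE_URL = %r{https?://[\w./-]*?\.(?:jpe?g|gif|png)}

# Hypothetical helper wrapping the scan call.
def image_urls(text)
  text.scan(IMAGE_URL)
end

p image_urls('http://my-site.example.com/a.jpg?v=3http://example.com/b.png?v=3')
# => ["http://my-site.example.com/a.jpg", "http://example.com/b.png"]
```

The caveat about extensions appearing inside the server name or path still applies to this version.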

2 Comments

stema Over a year ago

@km Be careful with the character class [\w.-\/]: you are creating a range from . to /. Here that doesn't pull in characters you don't want, because the slash follows directly after the dot in the ASCII table. BUT the - here is a range operator and not the literal character, so this class will not match a literal -. You have to escape the minus or put it at the start or the end, like this: [\w.\/-]

Marnen Laibow-Koser Over a year ago

You've taken my suggestion wrong. I was assuming split would be used with a regex.

Marnen Laibow-Koser · Accepted Answer · 2011-08-16 13:50:19Z

1

Use String#split (see the docs for details).
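A minimal sketch of split with a regex, assuming every URL is followed by a ?v=<digits> suffix that acts as the separator (that assumption, the sample string and the split_urls name are illustrative only; adjust the pattern to whatever actually sits between your URLs):

```ruby
# Hypothetical helper: treat the "?v=<digits>" suffix as the delimiter.
def split_urls(text)
  # split drops trailing empty strings by default, so the separator
  # after the last URL does not produce an extra empty element.
  text.split(/\?v=\d+/)
end

p split_urls('http://example.com/a.jpg?v=3http://example.com/b.png?v=12')
# => ["http://example.com/a.jpg", "http://example.com/b.png"]
```

Using a regex rather than a fixed string means the version number can vary between URLs.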

SamuraiJack · Accepted Answer · 2011-08-16 15:45:26Z

-1

Part of the problem is that in Rubular you are using https instead of http. This gets you closer to what you want if the other answers don't work for you:

http://rubular.com/r/cIjmjxIfz5

1 Comment

stema Over a year ago

Do you know what the expression https? is doing? The ? makes the s optional, so https? will match both http and https.
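A quick way to check this for yourself (the SCHEME constant and the web_url? helper are hypothetical names used only for this sketch):

```ruby
# \A anchors at the start of the string; the ? makes the preceding "s" optional.
SCHEME = %r{\Ahttps?://}

# Hypothetical helper: true when the string starts with http:// or https://.
def web_url?(text)
  text.match?(SCHEME)
end

p web_url?('http://example.com/a.jpg')   # => true
p web_url?('https://example.com/a.jpg')  # => true
p web_url?('ftp://example.com/a.jpg')    # => false
```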
