
I have the following string. How can I extract the "somesite.com/2009/10/monit-on-ubuntu/" part from it using a Ruby regular expression?

http://linkto.com/to/1pyTZl/somesite.com/2009/10/monit-on-ubuntu/t

The common pattern is that the path starts with "/to/some-alpha-num" and always ends with "/t".

6 Answers 6

6

That string looks like it's actually not a string but a URI. So, let's treat it as one:

require 'uri'
uri = URI.parse(str)

Now, extracting the path component of the URI is a piece of cake:

path = uri.path

We have now greatly limited the amount of stuff that can go wrong with our own parsing. The only part of the URI we still have to deal with is the path component.
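As a quick illustration of what the parsing step leaves us with, here is a minimal sketch using the sample URL from the question. The variable names `str` and `host` are only illustrative.

```ruby
require 'uri'

# Sample URL taken from the question.
str = 'http://linkto.com/to/1pyTZl/somesite.com/2009/10/monit-on-ubuntu/t'

uri  = URI.parse(str)
path = uri.path   # "/to/1pyTZl/somesite.com/2009/10/monit-on-ubuntu/t"
host = uri.host   # "linkto.com"

puts path
puts host
```

The scheme and host are already split off, so the regular expression only has to handle the path.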

A Regexp that matches the part you are interested in looks like this:

%r|/to/\w+/(.*/)t$|i
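To make the pieces of that pattern easier to see, here is the same expression spelled out with Ruby's extended (`x`) flag, which ignores whitespace and allows comments. This is only a sketch; the constant name `TARGET_PATTERN` is an assumption made for the example.

```ruby
# Same pattern as above, written in extended mode so each part can be annotated.
TARGET_PATTERN = %r{
  /to/     # literal prefix
  \w+      # the alphanumeric token, e.g. 1pyTZl
  /        # separator after the token
  (.*/)    # capture group 1: the target, including its trailing slash
  t$       # the final t, anchored at the end of the line
}xi

path = '/to/1pyTZl/somesite.com/2009/10/monit-on-ubuntu/t'
puts path[TARGET_PATTERN, 1]   # somesite.com/2009/10/monit-on-ubuntu/
```

Because the slash before the final t sits inside the capture group, the result keeps the trailing slash that the question asks for.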

If we put all of that together, we end up with something like this:

require 'uri'

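# Note: where the standard library still defines its own (obsolete) URI.extract,
# this definition replaces it, so a different name is safer in real code.
def URI.extract(uri)
  parse(uri).path[%r|/to/\w+/(.*/)t$|i, 1]
end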

require 'test/unit'
class TestUriExtract < Test::Unit::TestCase
  def test_that_the_path_gets_extracted_correctly
    uri  = 'http://linkto.com/to/1pyTZl/somesite.com/2009/10/monit-on-ubuntu/t'
    path = 'somesite.com/2009/10/monit-on-ubuntu/'
    assert_equal path, URI.extract(uri)
  end
end

2

/\/to\/\w+\/(.*)\/t/i

A great resource is Rubular. It allows you to test your expression against inputs and see the matches.

1 Comment

Suffers from leaning toothpick syndrome. Use %r to choose different delimiters.
2

The answers so far are right, but you should make sure the trailing /t is really at the end of the string by using the $ anchor:

regex = %r(/to/[^/]+/(.*)/t$)
'http://linkto.com/to/1pyTZl/somesite.com/2009/10/monit-on-ubuntu/t' =~ regex
puts $1
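One detail worth knowing here: in Ruby, `$` anchors at the end of a line, not necessarily the end of the whole string, while `\z` matches only at the very end. The sketch below contrasts the two on a made-up multi-line input; the variable names and the input are assumptions for illustration.

```ruby
line_anchored   = %r{/to/[^/]+/(.*)/t$}    # $ matches before any newline
string_anchored = %r{/to/[^/]+/(.*)/t\z}   # \z matches only at the end of the string

# Hypothetical input with extra text after a newline.
input = "http://linkto.com/to/abc/site.com/page/t\nextra"

puts line_anchored.match?(input)     # true
puts string_anchored.match?(input)   # false
```

For single-line input such as the URL in the question, both anchors behave the same.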


0

Maybe with /\/to\/[^\/]*\/(.*)\/t/ :

"http://linkto.com/to/1pyTZl/somesite.com/2009/10/monit-on-ubuntu/t" =~ /\/to\/[^\/]*\/(.*)\/t/
puts $1

-> somesite.com/2009/10/monit-on-ubuntu


0
/to/\w+/(.*?)/t
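A caveat with a lazy, unanchored pattern like this one: it stops at the first "/t" it finds, so a target whose path contains a segment starting with "t" gets cut short. The sketch below shows this on a made-up URL (the `tricky` value is hypothetical) and compares it with the same pattern anchored by `$`.

```ruby
unanchored = %r{/to/\w+/(.*?)/t}
anchored   = %r{/to/\w+/(.*?)/t$}

# Hypothetical URL whose target path contains a segment beginning with "t".
tricky = 'http://linkto.com/to/1pyTZl/somesite.com/tags/ruby/t'

puts tricky[unanchored, 1]   # somesite.com
puts tricky[anchored, 1]     # somesite.com/tags/ruby
```

On the URL from the question both variants return the same result, because the only "/t" after the token is the final one.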


0
s = "http://linkto.com/to/1pyTZl/somesite.com/2009/10/monit-on-ubuntu/t"
puts s[/to\/.+?\/(.*)\/t$/, 1]
=> somesite.com/2009/10/monit-on-ubuntu
