0

I have a URL string:

http://localhost:3000/user/event?profile_id=2&profile_type=UserProfile

I want to extract "2" and "UserProfile", where these can change.

I tried to use both match and scan but neither is returning results:

url = "http://localhost:3000/user/event?profile_id=2&profile_type=UserProfile"
m = /http(s)?:\/\/(.)+\/user\/event?profile_id=(\d)&profile_type=(\w)/.match(url)
=> nil 

url.scan /http(s)?:\/\/(.)+\/user\/event?profile_id=(\d)&profile_type=(\w)/
=> [] 

Any idea what I might be doing wrong?

2
  • In your specific regular expression, you need to escape the ? between event and profile_id, so it's event\?profile_id and you'll have an actual MatchData object. Now you need to remove the parentheses from (s) and (.) and add + to the (\w) so it's (\w+) and you'll get your desired results. Commented Oct 3, 2014 at 20:02
  • Don't use a regex for this. They're too fragile to handle URLs which can change order. Commented Oct 3, 2014 at 20:19

2 Answers

2

Don't use a pattern for this. Query parameters are not position-dependent and can appear in any order, and a change in order would instantly break the pattern.

Instead, use a tool designed for the purpose, like the built-in URI:

require 'uri'

uri = URI.parse('http://localhost:3000/user/event?profile_id=2&profile_type=UserProfile')

Hash[URI::decode_www_form(uri.query)].values_at('profile_id', 'profile_type') 
# => ["2", "UserProfile"]

Doing it this way returns the values in the order you asked for them, regardless of their order in the URL, which makes them easy to assign:

profile_id, profile_type = Hash[URI::decode_www_form(uri.query)].values_at('profile_id', 'profile_type')

Here are the intermediate steps so you can see what's happening:

uri.query # => "profile_id=2&profile_type=UserProfile"
URI::decode_www_form(uri.query) # => [["profile_id", "2"], ["profile_type", "UserProfile"]]
Hash[URI::decode_www_form(uri.query)] # => {"profile_id"=>"2", "profile_type"=>"UserProfile"}
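
To illustrate the point about parameter order, here is a minimal sketch that wraps the steps above in a helper and runs it against the same URL with the parameters swapped. The method name extract_profile is hypothetical (it is not part of URI), and the sketch assumes Ruby 2.1 or later for Array#to_h; on older versions Hash[...] as shown above does the same job.

```ruby
require 'uri'

# Hypothetical helper, not part of the URI library: returns
# [profile_id, profile_type] no matter how the query string is ordered.
def extract_profile(url)
  query = URI.parse(url).query
  return [nil, nil] if query.nil? # the URL has no query string at all

  # decode_www_form yields [[key, value], ...]; to_h turns that into a Hash.
  params = URI.decode_www_form(query).to_h
  params.values_at('profile_id', 'profile_type')
end

original = 'http://localhost:3000/user/event?profile_id=2&profile_type=UserProfile'
swapped  = 'http://localhost:3000/user/event?profile_type=UserProfile&profile_id=2'

p extract_profile(original) # => ["2", "UserProfile"]
p extract_profile(swapped)  # => ["2", "UserProfile"]
```

Because the lookup is by key, a missing parameter simply comes back as nil instead of causing the whole extraction to fail.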

1
match = url.match(/https?:\/\/.+?\/user\/event\?profile_id=(\d+)&profile_type=(\w+)/)
p match.captures[0] #=> '2'
p match.captures[1] #=> 'UserProfile'

In your expression:

/http(s)?:\/\/(.)+\/user\/event?profile_id=(\d)&profile_type=(\w)/

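EVERYTHING you put inside () is captured in a regular expression. There's no need to put the s in parentheses because ? acts only on the preceding character. Likewise, there's no need for the (.) because + also acts only on the preceding character. The ? between event and profile_id must be escaped as \?; unescaped, it makes the t optional instead of matching a literal question mark, which is why your match returns nil. Finally, (\w) should be (\w+), which says "one or more word characters" ('UserProfile' is more than one character), and (\d) should be (\d+) if profile_id can have more than one digit.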
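
As a rough sketch of the points above, the snippet below compares a pattern that keeps the unescaped ? with one that escapes it. The variable names broken and fixed, and the named groups profile_id and profile_type, are my own choices for illustration; %r{...} is used only so the slashes don't need escaping.

```ruby
url = 'http://localhost:3000/user/event?profile_id=2&profile_type=UserProfile'

# Unescaped `?`: it makes the preceding `t` optional, so nothing in the
# pattern can match the literal "?" in the URL.
broken = %r{https?://.+/user/event?profile_id=(\d+)&profile_type=(\w+)}
p broken.match(url) # => nil

# Escaped `\?` matches the literal question mark. Named groups make the
# captures readable by name instead of by position.
fixed = %r{https?://.+?/user/event\?profile_id=(?<profile_id>\d+)&profile_type=(?<profile_type>\w+)}
m = fixed.match(url)
p m[:profile_id]   # => "2"
p m[:profile_type] # => "UserProfile"
```

This still assumes profile_id comes before profile_type in the URL, so it shares the ordering weakness of any pattern-based approach.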

3 Comments

So what did I miss? Just the second non-capturing group?
"EVERYTHING you put inside () is captured in Ruby". That's not Ruby specific, it's true of regex in general. As soon as you see (?:...) the parentheses become non-capturing.
I liked the other answer better but I still give this one +1.
