2

I have a link that I declare like this:

link = "<a href=\"https://www.congress.gov/bill/93rd-congress/house-bill/11461\">H.R.11461</a>"

How can I use a regex to extract only the href value?

Thanks!

3 Answers

8

If you want to parse HTML, you can use the Nokogiri gem instead of using regular expressions. It's much easier.

Example:

require "nokogiri"

link = "<a href=\"https://www.congress.gov/bill/93rd-congress/house-bill/11461\">H.R.11461</a>"

link_data = Nokogiri::HTML(link)

href_value = link_data.at_css("a")[:href]

puts href_value # => https://www.congress.gov/bill/93rd-congress/house-bill/11461

7

You should be able to use a regular expression like this:

href\s*=\s*"([^"]*)"

See this Rubular example of that expression.

The capture group will give you the URL, e.g.:

link = "<a href=\"https://www.congress.gov/bill/93rd-congress/house-bill/11461\">H.R.11461</a>"
match = /href\s*=\s*"([^"]*)"/.match(link)
if match
  url = match[1]
end

Explanation of the expression:

  • href matches the href attribute
  • \s* matches 0 or more whitespace characters (this is optional -- you only need it if the HTML might not be in canonical form).
  • = matches the equal sign
  • \s* again allows for optional whitespace
  • " matches the opening quote of the href URL
  • ( begins a capture group for extraction of whatever is matched within
  • [^"]* matches 0 or more non-quote characters. Since a double quote inside a double-quoted HTML attribute value must be escaped (as &quot;), this matches every character up to the end of the URL.
  • ) ends the capture group
  • " matches the closing quote of the href attribute's value
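The pieces above can be put together in a small helper. This is only a sketch: extract_href and HREF_PATTERN are hypothetical names introduced here, and the example.com link is made-up test input. It uses String#[] with a capture index, which returns nil when nothing matches, and String#scan to collect every href in a longer string.

```ruby
# Hypothetical names for illustration; the pattern itself is the one from the answer.
HREF_PATTERN = /href\s*=\s*"([^"]*)"/

# Returns the first href value in the string, or nil if there is none.
def extract_href(html)
  # String#[] with a regexp and a capture index returns that capture group.
  html[HREF_PATTERN, 1]
end

link   = "<a href=\"https://www.congress.gov/bill/93rd-congress/house-bill/11461\">H.R.11461</a>"
spaced = "<a href = \"https://example.com/\">spaced</a>" # made-up input with extra whitespace

puts extract_href(link)           # => https://www.congress.gov/bill/93rd-congress/house-bill/11461
puts extract_href(spaced)         # => https://example.com/
p extract_href("<a>no href</a>")  # => nil

# String#scan returns one array of captures per match; flatten gives a flat list of URLs.
all_hrefs = (link + spaced).scan(HREF_PATTERN).flatten
p all_hrefs.length                # => 2
```

Note that this only handles double-quoted values; single-quoted or unquoted href attributes would need a different pattern or an HTML parser.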

1

To capture just the URL, you can do this:

/(href\s*\=\s*\\\")(.*)(?=\\)/

And use the second capture group.

http://rubular.com/r/qcqyPv3Ww3
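One caveat: the pattern above expects a literal backslash before each quote, which is how the link looks in Ruby source (or when pasted into Rubular) but not in the string Ruby holds in memory. A minimal sketch of a variant that works on the in-memory string, assuming double-quoted attribute values (the variable names are only illustrative):

```ruby
link = "<a href=\"https://www.congress.gov/bill/93rd-congress/house-bill/11461\">H.R.11461</a>"

# The pattern as posted: it needs literal backslashes, which the string does not contain.
original = /(href\s*\=\s*\\\")(.*)(?=\\)/
p original.match(link) # => nil

# Variant without the backslashes. The lazy .*? stops at the first closing quote
# instead of running on to the last quote in the string.
pattern = /(href\s*=\s*")(.*?)(?=")/

match = pattern.match(link)
url = match && match[2] # second capture group holds the URL
puts url # => https://www.congress.gov/bill/93rd-congress/house-bill/11461
```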
