-2

I'm trying to write a small scraper script for Google search. The program is written, but I have one small problem: I need a regex to extract the data-href value from the Google search results. Please help.

Example HTML from a Google search:

data-href="www.buxmob.net/index.php?id=577">
data-href="www.webopedia.com/TERM/K/keyword.html">
data-href="moz.com/beginners-guide-to-seo/keyword-research">

I need only the URL contained in this value, like this:

hxxp://www.webopedia.com/TERM/K/keyword.html
hxxp://moz.com/beginners-guide-to-seo/keyword-research
hxxp://www.buxmob.net/index.php?id=577

Thank you.

7
  • 2
    Don't parse HTML with regexes. Use a proper parser (which makes XPath very yummy). Commented Feb 9, 2014 at 2:34
  • 1
    Scraping Google search results is against their TOS. You need to sign up for an API key and go about this the legitimate way. Commented Feb 9, 2014 at 2:35
  • I'm using UBot Studio; it's not possible to use an external script! Commented Feb 9, 2014 at 2:35
  • @Marc B: the Google API gives me only a few results. Commented Feb 9, 2014 at 2:37
  • Obligatory reading: stackoverflow.com/questions/1732348/… Commented Feb 9, 2014 at 2:38

1 Answer

0

All the examples you gave can be matched with

(?:data-href=")(.*?)(?:">)

See demo at http://regex101.com/r/rB4nS1

That does NOT mean it's a good idea to parse (general) HTML with regex, but sometimes, when the response is well formed and its structure is known in advance, you can get away with it.
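For comparison, here is a rough sketch of the parser-based route the comments recommend, using Python's standard-library html.parser. It assumes the data-href attribute sits on a complete element (the `<a>` tags in the usage line are made up for illustration; the question only shows fragments), and the names DataHrefCollector and collect_data_hrefs are hypothetical.

```python
from html.parser import HTMLParser


class DataHrefCollector(HTMLParser):
    """Collects the value of every data-href attribute seen in start tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name == "data-href" and value:
                self.urls.append(value)


def collect_data_hrefs(html):
    """Return all data-href values found in the given HTML string."""
    parser = DataHrefCollector()
    parser.feed(html)
    parser.close()
    return parser.urls
```

Usage would look like collect_data_hrefs('<a data-href="moz.com/beginners-guide-to-seo/keyword-research">x</a>'). Unlike the regex, this does not depend on the attribute being immediately followed by ">.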

Note that you mentioned you wanted hxxp:// in front of the string. That is not the job of the regular expression; it belongs in the language you use to apply the expression. The expression above is a non-greedy match that starts after the string data-href=" and ends at the next ">.
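As an illustration of that split between the regex and the surrounding language, here is a minimal Python sketch. Python is only an assumption for the example (any language with a regex engine would do the same), the function name extract_data_hrefs is hypothetical, and the default prefix "http://" is a guess at what the hxxp:// in the question stands for.

```python
import re

# The same pattern as above: capture everything, non-greedily,
# between data-href=" and the next ">.
DATA_HREF_RE = re.compile(r'(?:data-href=")(.*?)(?:">)')


def extract_data_hrefs(html, prefix="http://"):
    """Return each captured data-href value with the prefix prepended."""
    # findall returns only the single capturing group for each match.
    return [prefix + value for value in DATA_HREF_RE.findall(html)]


sample = '''data-href="www.buxmob.net/index.php?id=577">
data-href="www.webopedia.com/TERM/K/keyword.html">
data-href="moz.com/beginners-guide-to-seo/keyword-research">'''

for url in extract_data_hrefs(sample):
    print(url)
```

The regex only captures the raw attribute value; adding the scheme is plain string concatenation done afterwards. Pass a different prefix (for example "hxxp://") if that is literally what you need in the output.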
