I want to get the URLs from a Bing search. I fetch the HTML, and when I apply the regex /<h2><a href="(.*?)"/g it gives me:

["<h2><a href="https://www.test.com/"", "<h2><a href="http://fr.wikipedia.org/wiki/Test_(informatique)"", "<h2><a href="http://www.speedtest.net/"", "<h2><a href="http://test.psychologies.com/"", "<h2><a href="http://www.thefreedictionary.com/test"", "<h2><a href="http://fr.wikipedia.org/wiki/Test"", "<h2><a href="http://www.wordreference.com/enfr/test"", "<h2><a href="http://www.sedecouvrir.fr/"", "<h2><a href="http://www.jeuxvideo.com/tests.htm"", "<h2><a href="http://en.wikipedia.org/wiki/Test""]

In the JS code, I used match:

html.match(/<h2><a href="(.*?)"/g);

I only want the URLs. The HTML is here: http://www.bing.com/search?q=test. I've been searching all day, and I think maybe I have to use a capture group?

  • /<h2><a href="([^"]+)"/g should do it Commented Dec 20, 2014 at 15:23
  • Thanks for your reply, Ismael, but it gives the same result. Commented Dec 20, 2014 at 15:26
  • This might help you: stackoverflow.com/questions/3809401/… Commented Dec 20, 2014 at 15:37

3 Answers

Use Array.prototype.map to iterate over the list of matched strings, run the regular expression on each one with exec, and read the link from capture group 1.

"use strict";

var links = ['<h2><a href="https://www.test.com/"',
 '<h2><a href="http://fr.wikipedia.org/wiki/Test_(informatique)"', 
 '<h2><a href="http://www.speedtest.net/"', 
 '<h2><a href="http://test.psychologies.com/"',
 '<h2><a href="http://www.thefreedictionary.com/test"',
 '<h2><a href="http://fr.wikipedia.org/wiki/Test"',
 '<h2><a href="http://www.wordreference.com/enfr/test"',
 '<h2><a href="http://www.sedecouvrir.fr/"',
 '<h2><a href="http://www.jeuxvideo.com/tests.htm"',
 '<h2><a href="http://en.wikipedia.org/wiki/Test"'];

var result = links.map(function (link) {
  return /<h2><a href="(.*?)"/.exec(link)[1];
});

console.log(result);

1 Comment

The g flag in /g is not needed there. /g is for multiple matches. You're iterating over an array list of items guaranteed to provide only a single match.
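
To illustrate what the g flag changes, here is a minimal sketch. The `sample` string is made up for the example and is not the real Bing markup. With g, match returns only the full matches; without it, match returns the first match together with its capture groups.

```javascript
// Hypothetical sample input, not taken from the real Bing page.
var sample = '<h2><a href="https://www.test.com/">Test</a></h2>' +
             '<h2><a href="http://www.speedtest.net/">Speedtest</a></h2>';

// With the g flag, match() returns every full match and drops the capture groups.
var withG = sample.match(/<h2><a href="(.*?)"/g);
console.log(withG);

// Without the g flag, match() returns only the first match,
// and index 1 of the result holds the first capture group.
var withoutG = sample.match(/<h2><a href="(.*?)"/);
console.log(withoutG[1]);
```

This is why the question's output contains the `<h2><a href="` prefix: the groups are simply not part of the array that a global match returns.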

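What match returns is an array of full matches, so you need to process each element, and you need a capture group to pull out the URL. Something like this:

// match() returns null when nothing matches, so fall back to an empty array.
var matches = html.match(/<h2><a href="[^"]+"/g) || [];
var urls = matches.map(function (str) {
   // Replace the whole string with the contents of capture group 1.
   return str.replace(/.*href="([^"]+).*/, "$1");
});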
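
As a possible alternative, assuming an environment that supports String.prototype.matchAll (ES2020), the groups can be read directly from the page string in one pass. The `html` value below is a made-up stand-in for the downloaded page.

```javascript
// Hypothetical sample input standing in for the downloaded page.
var html = '<h2><a href="https://www.test.com/">Test</a></h2>' +
           '<h2><a href="http://www.speedtest.net/">Speedtest</a></h2>';

// matchAll() requires the g flag and yields one match object per hit,
// each of which still carries its capture groups.
var urls = Array.from(html.matchAll(/<h2><a href="([^"]+)"/g), function (m) {
  return m[1]; // contents of the first capture group
});

console.log(urls);
```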

If this is being done within a browser, there's really no need to try to use a regex.

// Collect every anchor element in the current document.
var myNodeList = document.getElementsByTagName('a');
for (var i = 0; i < myNodeList.length; ++i) {
    var anchor = myNodeList[i];
    console.debug(anchor.href); // fully resolved URL of the link
}

But as hinted in the comments, if you really want to use regexes, all you need to do is iterate over the results, as shown in "How can I match multiple occurrences with a regex in JavaScript similar to PHP's preg_match_all()?". In particular, note the lines:

while (match = re.exec(url)) {
     params[decode(match[1])] = decode(match[2]);
}
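
Adapted to this question, that loop might look like the sketch below. The `html` string is a made-up stand-in for the downloaded page, and the names `re` and `urls` are chosen for illustration.

```javascript
// Hypothetical sample input standing in for the downloaded page.
var html = '<h2><a href="https://www.test.com/">Test</a></h2>' +
           '<h2><a href="http://www.speedtest.net/">Speedtest</a></h2>';

// The g flag makes exec() resume from where the previous match ended.
var re = /<h2><a href="([^"]+)"/g;
var urls = [];
var match;

// exec() returns null once there are no more matches.
while ((match = re.exec(html)) !== null) {
  urls.push(match[1]); // capture group 1 is the href value
}

console.log(urls);
```

The g flag is essential here: without it, exec would keep returning the first match and the loop would never end.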
