I have a string variable with links inside (among other text), and I want to extract all links containing a certain pattern (for example, containing the word 'case'). Is this possible?

Variable string is something like:

var string = 'here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more';

As a workaround, I used the approach described here: extract links from document. I create a document with the string as its content and then extract the links from it, but I would like to do it directly on the string.

Regards,

EDIT (To Ruben):

If I use:

var string = 'http://mangafox.me/manga/tales_of_demons_and_gods/c105/1.html here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more ';

I get only the first link, printed twice (see screenshot here).

And if I use:

var string = 'here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more http://mangafox.me/manga/tales_of_demons_and_gods/c105/1.html ';

The same happens again (see screenshot here).

  • What do you mean by "an string variable with links inside"? Are they URLs? Including a sample string could clarify what you mean. What have you tried? Commented Nov 21, 2016 at 17:47
  • ok. variable string is something like: var string = 'here is some text line among the ones there will be links like stackoverflow.com/questions/40725199/… and more'; Commented Nov 21, 2016 at 18:36

2 Answers

Google Apps Script

function test2(){
  // The 'g' flag makes each exec() call resume after the previous match.
  var re = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'"".,<>?«»“”‘’]))/gi;
  var string = 'here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more';
  var match;
  // exec() returns null once there are no more matches.
  while ((match = re.exec(string)) !== null) {
    Logger.log(match[0]); // match[0] is the full matched URL
  }
}

JavaScript

// The 'g' flag makes each exec() call resume after the previous match.
var re = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'"".,<>?«»“”‘’]))/gi;
var string = 'here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more here is some text line among the ones there will be links like https://stackoverflow.com/questions/40725199/extract-all-links-from-a-string-with-google-app-script?noredirect=1#comment68679843_40725199 and more';
var match;
// exec() returns null once there are no more matches.
while ((match = re.exec(string)) !== null) {
  console.log(match[0]); // match[0] is the full matched URL
}
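
A note on the "first link twice" result reported in the question's edit. Without the 'g' flag, exec() always searches from the start of the string, and the array it returns holds the full match followed by its capture groups. Indexing into that array therefore walks over groups of the same match, not over successive links. The sketch below uses a deliberately simplified pattern and made-up example URLs (both are assumptions for illustration, not the regex from this answer):

```javascript
// Simplified pattern for illustration only: no 'g' flag, one capture group.
var simple = /(https?:\/\/\S+)/i;
var text = 'first http://a.example/1 then http://b.example/2';

var once = simple.exec(text);
// once[0] is the whole match and once[1] is capture group 1,
// so both entries hold the same (first) URL.
var sameTwice = [once[0], once[1]];

// With 'g', the regex tracks lastIndex and each exec() call advances.
var globalRe = /(https?:\/\/\S+)/gi;
var all = [];
var m;
while ((m = globalRe.exec(text)) !== null) {
  all.push(m[0]);
}
```

Here sameTwice contains the first URL two times, while all contains each URL once.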

Reference

RegularExpression to Extract Url For Javascript


3 Comments

Ok, I used your updated version on this string: 'mangafox.me/manga/tales_of_demons_and_gods/c105/1.html stackoverflow.com/questions/40725199/… here is some text line among the ones there will be links like stackoverflow.com/questions/40725199/… and more mangafox.me/manga/tales_of_demons_and_gods/c105/1.html';
I have the same problem with only getting the first link. Any progress during the last 4 years, @Rubén? :-)
@BjörnLarsson The code in this answer is working correctly. Please post a new question including a minimal reproducible example.

If you're only getting the first match then I think you need the 'g' flag on the Regular Expression to capture all matches, then each call to exec() will return the next match. I'm using:

const re = /(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#\/%=~_|$?!:,.])*(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&@#\/%=~_|$])/igm;

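let reResults; // declared explicitly to avoid creating an implicit global
while ((reResults = re.exec(s)) !== null) { // s is the input string; each call finds the next match
  Logger.log(reResults[0]); // full text of the current match
}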
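
To cover the filtering part of the question (keeping only links that contain a word such as 'case'), here is a minimal sketch that should run in both Apps Script and plain JavaScript. The helper name extractLinksContaining, the simplified URL pattern, and the sample URLs are all assumptions for illustration; swap in a stricter regex if your links need it.

```javascript
// Hypothetical helper: returns every URL in `text` that contains `keyword`.
function extractLinksContaining(text, keyword) {
  // Simplified URL pattern (an assumption, not the regex above).
  var urlPattern = /https?:\/\/[^\s<>"']+/gi;
  // With the 'g' flag, match() returns all matches, or null when there are none.
  var links = text.match(urlPattern) || [];
  var needle = keyword.toLowerCase();
  // Keep only the links that contain the keyword, ignoring case.
  return links.filter(function (link) {
    return link.toLowerCase().indexOf(needle) !== -1;
  });
}

var sample = 'see http://example.com/case/42 and https://example.org/other ' +
             'plus https://example.net/showcase?id=7 end';
var found = extractLinksContaining(sample, 'case');
```

Note that this is a plain substring test, so 'showcase' also counts as containing 'case'. If you only want whole-word hits, test each link against a regex with word boundaries instead.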
