Is there a common algorithm to cut URLs out of a string?

For example:

 string1 = "bla bla bla http://bla.domain.com more blah blah nohttp.domain.with.no.protocol more text bla bla"
 (string2, urls) = wild_magic_appears(string1)
 string2 = "bla bla bla  more blah blah  more text bla bla"
 urls = ["http://bla.domain.com", "nohttp.domain.with.no.protocol"]

I know that regex is the best solution for this, but I'm interested in a non-regex solution.

    You could split the string into words (split at ` `) and consider each word separately. How wild the magic needs to be depends on what you want to match; e.g. the simplest requirement would be "any word starting with http://, https:// or containing more than one dot". Commented Dec 17, 2013 at 8:22
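
The rule from this comment can be sketched in a few lines of Python (the question's example already reads like Python). This is only a minimal sketch: the name `cut_urls` is made up for illustration, and the "more than one dot" test is as naive as the comment suggests, so a word like `wait...` would also be treated as a URL.

```python
def cut_urls(text):
    """Split text on single spaces and pull out words that look like URLs.

    A word counts as a URL if it starts with http:// or https://,
    or if it contains more than one dot (a deliberately crude rule).
    """
    kept, urls = [], []
    for word in text.split(" "):
        if word.startswith(("http://", "https://")) or word.count(".") > 1:
            urls.append(word)
            kept.append("")  # leave an empty slot so the original spacing survives
        else:
            kept.append(word)
    return " ".join(kept), urls


string1 = ("bla bla bla http://bla.domain.com more blah blah "
           "nohttp.domain.with.no.protocol more text bla bla")
string2, urls = cut_urls(string1)
```

Because each removed word leaves an empty slot behind, `string2` keeps the double spaces shown in the question's expected output.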

3 Answers


In C#, you can do this for URLs that start with "http://":

using System.Collections.Generic;

string str1 = "bla bla bla http://bla.domain.com more blah blah nohttp.domain.with.no.protocol";
string[] array = str1.Split(' ');
List<string> urls = new List<string>();

foreach (var s in array)
{
    if (s.StartsWith("http://")) // add other conditions that match a URL here
        urls.Add(s);
}

1 Comment

Pretty simple. For those who come here searching for a solution to this question, I suggest detecting URLs by protocol names, dots, and a list of top-level domains (as I did).
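
The commenter did not post their code, so the following Python is only a guess at what such a check might look like. `looks_like_url`, `SCHEMES` and `SAMPLE_TLDS` are hypothetical names, and the TLD set is a tiny sample; a real one would be loaded from a maintained list.

```python
# Assumed inputs for illustration only; neither list is complete.
SCHEMES = ("http://", "https://", "ftp://")
SAMPLE_TLDS = {"com", "org", "net", "io"}


def looks_like_url(word):
    """Return True if word starts with a known scheme or ends in a known TLD."""
    if word.lower().startswith(SCHEMES):
        return True
    # No scheme: take the part before any path and drop trailing punctuation.
    host = word.split("/", 1)[0].rstrip(".,;:!?")
    labels = host.lower().split(".")
    # Need at least "name.tld", no empty labels, and a TLD from the list.
    return len(labels) >= 2 and all(labels) and labels[-1] in SAMPLE_TLDS
```

With this sample list, `nohttp.domain.with.no.protocol` from the question is rejected, since `protocol` is not in the set. That is the trade-off against the plain dot-counting rule: fewer false positives, but the TLD list has to be kept up to date.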

Ruby: split on colons and spaces.

This only works for URLs that start with http://, and only if the surrounding text doesn't contain a colon.

>a = "bla bla bla http://bla.domain.com more blah blah nohttp.domain.with.no.protocol more text bla bla"
>a.split(":")[0].to_s[-4..-1] + ":" + a.split(":")[1].split()[0].to_s
=> "http://bla.domain.com"

For URLs that have only dots and no protocol, I can't think of a good solution.

1 Comment

This is quite a narrow solution. It doesn't work well for user-written text containing ':'.

I thought of a new solution: just split on "http://" or "https://". This one copes better with colons in the user's text.

>a = "bla bla bla http://bla.domain.com more blah blah nohttp.domain.with.no.protocol more text bla bla"
>("http://"+a.split("http://")[1].to_s).split()[0]
=>"http://bla.domain.com"
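
The Ruby snippet above returns only the first http:// URL. A hedged Python sketch of the same idea, extended to both schemes and to every occurrence, could look like this. `find_scheme_urls` is a made-up name, and "a URL ends at the next whitespace" is an assumption carried over from the `split()` call above.

```python
SCHEMES = ("http://", "https://")


def find_scheme_urls(text):
    """Collect every substring that starts with a scheme and runs to whitespace."""
    urls = []
    pos = 0
    while True:
        # Earliest occurrence of any scheme at or after pos.
        hits = [i for i in (text.find(s, pos) for s in SCHEMES) if i != -1]
        if not hits:
            break
        start = min(hits)
        # The URL runs until the next whitespace character or the end of the text.
        end = start
        while end < len(text) and not text[end].isspace():
            end += 1
        urls.append(text[start:end])
        pos = end
    return urls
```

Colons elsewhere in the text are harmless here, because only the full scheme prefix is searched for. URLs without a protocol are still not found.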
