4

I would like to remove urls from a string and replace them with their titles of the original contents.

For example:

mystring = "Ah I like this site: http://www.stackoverflow.com. Also I must say I like http://www.digg.com"

sanitize(mystring) # it becomes "Ah I like this site: Stack Overflow. Also I must say I like Digg - The Latest News Headlines, Videos and Images"

For replacing url with the title, I have written this snipplet:

#get_title: string -> string
def get_title(url):
    """Returns the title of the input URL"""

    output = BeautifulSoup.BeautifulSoup(urllib.urlopen(url))
    return output.title.string

I somehow need to apply this function to strings where it catches the urls and converts to titles via get_title.

1
  • I have updated the question, sorry :) Commented May 8, 2010 at 17:30

2 Answers 2

3

Here is a question with information for validating a url in Python: How do you validate a URL with a regular expression in Python?

urlparse module is probably your best bet. You will still have to decide what constitutes a valid url in the context of your application.

To check the string for a url you will want to iterate over each word in the string check it and then replace the valid url with the title.

example code (you will need to write valid_url):

def sanitize(mystring):
  for word in mystring.split(" "):
    if valid_url(word):
      mystring = mystring.replace(word, get_title(word))
  return mystring
Sign up to request clarification or add additional context in comments.

Comments

2

You can probably solve this using regular expressions and substitution (re.sub accepts a function, which will be passed the Match object for each occurence and returns the string to replace it with):

url = re.compile("http:\/\/(.*?)/")
text = url.sub(get_title, text)

The difficult thing is creating a regexp that matches an URL, not more, not less.

1 Comment

1. get_title() should accept MatchObject (not just string). 2. Django uses somethink like r'https?://[^ \t\n\r]+' to linkify text

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.