While consuming the Twitter API, I get several strings (tweets) containing links, that is, substrings beginning with 'http://'.

How can I get rid of such links? That is, I want to remove the whole word.

Let's say I have:

'Mi grupo favorito de CRIMINALISTICA. Ultima clase de cuatrimestre http://t.co/Ad2oWDNd4u'

And I want to obtain:

'Mi grupo favorito de CRIMINALISTICA. Ultima clase de cuatrimestre'

Such substrings may appear anywhere in the string.

4
  • Will they appear only at the end? Commented Apr 8, 2014 at 3:24
  • @thefourtheye I'm already used to seeing you around :P They might not appear only at the end. Commented Apr 8, 2014 at 3:25
  • Also, will there be a space after the URLs? Commented Apr 8, 2014 at 3:32
  • There might be, @thefourtheye Commented Apr 8, 2014 at 3:35

2 Answers

4

You can use re.sub() to replace all links with an empty string:

>>> import re
>>> pattern = re.compile(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')
>>> s = 'Mi grupo favorito de CRIMINALISTICA. Ultima clase de cuatrimestre http://t.co/Ad2oWDNd4u'
>>> pattern.sub('', s)
'Mi grupo favorito de CRIMINALISTICA. Ultima clase de cuatrimestre '

It replaces every link, wherever it appears in the string:

>>> s = "I've used google https://google.com and found a regular expression pattern to find links here https://stackoverflow.com/questions/6883049/regex-to-find-urls-in-string-in-python"
>>> pattern.sub('', s)
"I've used google  and found a regular expression pattern to find links here "
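
Note that both outputs above keep the whitespace that surrounded each link (a trailing space in the first, a double space in the second). Here is a minimal sketch of one way to tidy that up. It assumes a simplified pattern, `https?://\S+`, rather than the pattern above, and the names `URL_RE` and `strip_links` are made up for illustration:

```python
import re

# Simplifying assumption: a link is "http://" or "https://" followed by
# any run of non-whitespace characters.
URL_RE = re.compile(r'https?://\S+')

def strip_links(text):
    """Remove links, then collapse the whitespace they leave behind."""
    without_links = URL_RE.sub('', text)
    # split() with no arguments discards leading/trailing whitespace and
    # treats any run of whitespace as a single separator.
    return ' '.join(without_links.split())
```

Because `\S+` matches everything up to the next whitespace, punctuation attached directly to a link (such as a trailing period) is removed along with it. That may or may not be what you want for tweets.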

The regular expression was taken from this thread:

0

If the link always appears at the end of the string, preceded by a single space, you can just slice it off:

s[:s.index('http://')-1]
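
One caveat with the slice above: `str.index` raises `ValueError` when the string contains no 'http://' at all. A possible sketch that tolerates link-free strings, using `str.partition` instead (the function name `cut_at_first_link` is hypothetical):

```python
def cut_at_first_link(s, marker='http://'):
    # partition() returns (before, separator, after). When the marker is
    # absent, "before" is the whole string, so nothing is raised.
    before, _, _ = s.partition(marker)
    # Drop the whitespace that preceded the link, if any.
    return before.rstrip()
```

Like the slice, this discards everything from the first link onward, so it only suits strings where the link comes last.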

If it doesn't always appear at the end, you can do:

# Keep only the whitespace-separated words that are not links
your_list = [word for word in s.split() if not word.startswith('http://')]
s = ' '.join(your_list)
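
The snippet above only checks for the 'http://' prefix. If 'https://' links can also show up (an assumption, since the question only mentions 'http://'), the same split-and-filter idea could be sketched as a small function. The name `remove_link_words` and its `prefixes` parameter are made up for illustration:

```python
def remove_link_words(text, prefixes=('http://', 'https://')):
    # str.startswith() accepts a tuple and returns True when the word
    # starts with any of the given prefixes.
    kept = [word for word in text.split() if not word.startswith(prefixes)]
    # Joining with single spaces also collapses any runs of whitespace.
    return ' '.join(kept)
```

For example, `remove_link_words('a https://x.y b http://z c')` returns `'a b c'`.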
