Removing URLs from a string using Python's re

Question

I'm using this to try to remove URLs from a string:

text = re.sub(r'https?:\/\/[A-Za-z0-9\.\/]+', '', text)

Unfortunately, it works for simple URLs but not for complex ones. Something like http://www.example.com/somestuff.html is removed, but something like http://www.example.com/somestuff.html?query=python leaves the trailing part (?query=python) behind.

I think I'm at the limits of my re knowledge so any help will be much appreciated. Thx.

gbruenjes · Accepted Answer · 2021-01-22 10:25:04Z


Try:

r"https?:[^\s]+"
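
For comparison, here is a small sketch that runs the pattern from the question and the pattern above against the same input. The sample sentence and the variable names are made up for illustration. The original pattern stops at the first character outside its class (here the ?), while [^\s]+ keeps going until the next whitespace.

```python
import re

# Pattern from the question: the character class has no '?', '=', '-', etc.,
# so the match ends as soon as one of those characters appears.
narrow = re.compile(r'https?:\/\/[A-Za-z0-9\.\/]+')

# Pattern from the answer: consume everything up to the next whitespace.
broad = re.compile(r"https?:[^\s]+")

# Hypothetical sample text, not from the original post.
text = "see http://www.example.com/somestuff.html?query=python for details"

narrow_result = narrow.sub('', text)
broad_result = broad.sub('', text)

print(narrow_result)  # see ?query=python for details
print(broad_result)   # see  for details
```

Note that the broader pattern also swallows any punctuation glued to the end of a URL (a trailing full stop or closing bracket, for example), and it leaves the surrounding spaces in place, so a double space remains where the URL was.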


1 Comment

Oli Smith Over a year ago

Worked! Accepted. Thx. Just for my benefit, is this what's happening here: https? matches http or https, then : matches the colon, then [^\s] matches from the start of the string to Unicode whitespace, and finally + matches one or more repetitions?
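
The reading in the comment is close, with one correction: inside square brackets a leading ^ negates the character class, so it does not mean "start of string" there. [^\s] matches any single character that is not whitespace, which is equivalent to \S. As a sketch, the same pattern can be spelled out piece by piece with re.VERBOSE (the sample string and variable names are illustrative only):

```python
import re

# The answer's pattern, annotated. In VERBOSE mode, whitespace and
# '#' comments inside the pattern are ignored by the regex engine.
url_pattern = re.compile(r"""
    http      # the literal characters "http"
    s?        # an optional "s", so both http and https match
    :         # a literal colon
    [^\s]+    # one or more characters that are NOT whitespace (same as \S+)
""", re.VERBOSE)

# Hypothetical sample text, not from the original post.
sample = "docs at https://example.com/a?b=c and more"
cleaned = url_pattern.sub("", sample)

print(cleaned)  # docs at  and more
```

For str patterns, \s covers Unicode whitespace by default, so the match ends at the first whitespace character of any kind or at the end of the string.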
