
I have URLs in a text that look like this:

<https://buy.itunes.apple.com/WebObjects/MZFinance.woa/wa/reportAProblem?p
=22000073760328&o=i>

I've used the following pattern to try and remove them:

re.sub(r'\<http.+?\>', '', plain, re.S)

But it doesn't remove all of them. For example, this one is left in place:

<http://ax.phobos.apple.com.edgesuite.net/email/images_shared/spacer_99999\r\n9.gif>
  • If you put r (raw string) before the second string when you assign it (r'<http://ax.phobos.apple.com.edgesuite.net/email/images_shared/spacer_99999\r\n9.gif>') or use double backslashes (\\) (<http://ax.phobos.apple.com.edgesuite.net/email/images_shared/spacer_99999\\r\\n9.gif>), it will work. Commented Mar 29, 2013 at 20:35
  • This is pretty odd. I played around with it for a bit, and it does match: re.match(r'.', '\n', re.S) works, but re.sub(r'.', '', '\n', re.S) does not. So it seems to match, but the replacing part somehow fails; I'm really not sure where or how, though. It's as if re.S doesn't work for re.sub. Commented Mar 29, 2013 at 20:39
  • Yeah, that's what happens to me. Some URLs are removed but others remain. Commented Mar 29, 2013 at 20:41
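The odd behaviour in the second and third comments follows from the two signatures: in `re.match(pattern, string, flags=0)` the third argument is `flags`, but in `re.sub(pattern, repl, string, count=0, flags=0)` the fourth argument is `count`. Since `re.S` is the integer 16, the question's call caps replacements at 16 and never turns on DOTALL, so any URL containing a line break is skipped. A minimal sketch, assuming Python 3; the sample string and URLs are made up for illustration. The `count=re.S` calls reproduce what the question's positional call does, written as a keyword because Python 3.13 deprecates passing `count` positionally:

```python
import re

# re.S is an alias for re.DOTALL, an int-valued flag equal to 16.
assert re.S == re.DOTALL and int(re.S) == 16

# re.match(pattern, string, flags=0): the third argument is flags,
# so DOTALL applies and '.' matches the newline.
assert re.match(r'.', '\n', re.S) is not None

# re.sub(pattern, repl, string, count=0, flags=0): the fourth argument is
# count. re.S there means "at most 16 replacements, no flags", so '.' still
# does not match '\n'. (Spelled count= to mirror the positional call.)
assert re.sub(r'.', '', '\n', count=re.S) == '\n'

# Passed by keyword, the flag does what was intended.
assert re.sub(r'.', '', '\n', flags=re.S) == ''

# Hypothetical sample: the second URL contains a real line break.
text = "a <http://example.com/x> b <http://example.com/\ny> c"
as_count = re.sub(r'\<http.+?\>', '', text, count=re.S)
as_flags = re.sub(r'\<http.+?\>', '', text, flags=re.S)
print(repr(as_count))  # 'a  b <http://example.com/\ny> c'
print(repr(as_flags))  # 'a  b  c'
```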

1 Answer


Try it like this:

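import re
# re.DOTALL (same as re.S) lets '.' match the line breaks inside a URL
p = re.compile(r'\<http.+?\>', re.DOTALL)
plain = p.sub('', plain)  # sub returns a new string; keep the result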
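A quick check of the compiled pattern against input shaped like the two URLs in the question. The `plain` string below is an assumed reconstruction, not the asker's actual text: the first URL is wrapped with a newline and the second contains a real CRLF.

```python
import re

# re.DOTALL lets '.+?' run across the line breaks inside each URL.
p = re.compile(r'\<http.+?\>', re.DOTALL)

# Assumed sample input modelled on the two URLs in the question.
plain = (
    "Report: <https://buy.itunes.apple.com/WebObjects/MZFinance.woa/wa/reportAProblem?p\n"
    "=22000073760328&o=i>\n"
    "Image: <http://ax.phobos.apple.com.edgesuite.net/email/images_shared/spacer_99999\r\n"
    "9.gif>\n"
)

# Pattern.sub returns a new string; rebind the name to keep the result.
plain = p.sub('', plain)
print(repr(plain))  # 'Report: \nImage: \n'
```

Because the flag lives on the compiled pattern object, there is no positional flags argument to misplace when calling `p.sub` or `re.sub(p, ...)`.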

4 Comments

This did it, thank you. Care to add an explanation as to why the precompiled pattern works?
Actually, after taking a look at the re.sub function, I think you missed that there is an additional argument, count, before the flags argument, so something like re.sub(r'\<http.+?\>', '', plain, flags=re.S) should also work.
@8vius The flag is being passed incorrectly for some reason, although I really don't know why. Compiling the pattern encodes the flag in the pattern object itself. According to the docs, re.sub takes five arguments (pattern, repl, string, count, flags), the last two being optional. However, when I try to call it with five arguments, it tells me it expects four. In Python 3 it works when I do re.sub(r'.', '', '\n', 0, re.S), as well as re.sub(r'.', '', '\n', flags=re.S), but neither works for me in Python 2, despite what the docs say.
Yes, setting the flags explicitly works as well, thanks. That explains why the precompiled version works, then. Thank you both.
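The keyword form from the comments can be compared with two alternatives that the thread does not mention: an inline `(?s)` flag at the start of the pattern, and a negated character class `[^>]`, which matches newlines without any flag at all. A sketch assuming Python 3; the sample text is made up for illustration.

```python
import re

# Made-up sample: the first URL contains a real line break.
text = "x <http://example.com/a\nb> y <http://example.com/c> z"

# 1. Pass the flag by keyword so it is not taken as `count`.
by_keyword = re.sub(r'\<http.+?\>', '', text, flags=re.S)

# 2. Alternative: embed DOTALL in the pattern with the inline (?s) flag.
inline = re.sub(r'(?s)\<http.+?\>', '', text)

# 3. Alternative: a negated class; [^>] matches '\n' without any flag.
negated = re.sub(r'<http[^>]+>', '', text)

print(by_keyword == inline == negated)  # True
print(repr(by_keyword))  # 'x  y  z'
```

The inline flag and the negated class both keep the fix inside the pattern string, so they behave the same whichever positional arguments a given Python version's `re.sub` accepts.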
