s = re.sub(r"<style.*?</style>", "", s)

Isn't this code supposed to remove styles in the s string? Why does it not work? I am trying to remove the following code:

<style type="text/css">
body { ... }
</style>

Any suggestion?

1 Answer

No, it needs the re.DOTALL flag: by default, '.' does not match a newline, and your <style> block spans several lines.

re.DOTALL
Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.

http://docs.python.org/library/re.html#re.DOTALL
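
Applied to the snippet from the question, a minimal sketch could look like this. The <style> block is the one from the question; the surrounding "before"/"after" text is made up for illustration.

```python
import re

# Sample input: the <style> block from the question, wrapped in
# placeholder text ("before"/"after" are illustrative only).
s = 'before\n<style type="text/css">\nbody { ... }\n</style>\nafter'

# Without re.DOTALL, '.' stops at the first newline, so nothing matches
# and the string comes back unchanged.
without_flag = re.sub(r"<style.*?</style>", "", s)

# With re.DOTALL, '.' also matches newlines and the whole block is removed.
with_flag = re.sub(r"<style.*?</style>", "", s, flags=re.DOTALL)

print(without_flag == s)
print(repr(with_flag))
```

Note that flags is passed by keyword here: the fourth positional argument of re.sub is count, not flags.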

Edit

In some cases, you may need the dot to match every character (newlines included) in one part of the pattern, and only non-newline characters in another part. The re.DOTALL flag does not allow this, because it applies to the whole pattern.

In this case, it's useful to know the following trick: use [\s\S] to match any character, newlines included.

import re

s = '''alhambra
<style type="text/css">
body { ... }
</style>
toromizuXXXXXXXX
YYYYYYYYYYYYYY'''
print(s, '\n')

# [\s\S]*? crosses newlines; the '.+' in the second alternative does not
regx = re.compile(r"<style[\s\S]*?</style>|(?<=ro)mizu.+")

s = regx.sub('AAA', s)
print(s)

result

alhambra
<style type="text/css">
body { ... }
</style>
toromizuXXXXXXXX
YYYYYYYYYYYYYY 

alhambra
AAA
toroAAA
YYYYYYYYYYYYYY
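
As a possible alternative to [\s\S], and assuming Python 3.6 or later, a scoped inline flag can switch DOTALL on for just one group. The sketch below reuses the sample string from above and is only meant to illustrate the idea:

```python
import re

s = '''alhambra
<style type="text/css">
body { ... }
</style>
toromizuXXXXXXXX
YYYYYYYYYYYYYY'''

# (?s:...) turns on DOTALL only inside that group (Python 3.6+);
# the '.+' in the second alternative still stops at a newline.
regx = re.compile(r"<style(?s:.*?)</style>|(?<=ro)mizu.+")

result = regx.sub('AAA', s)
print(result)
```

The substitution should give the same result as the [\s\S] version, since only the part inside the group is allowed to cross line boundaries.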

1 Comment

Yes correct, I just came back to say that I've found the solution but here you are! Good answer!
