
I'm using regex to parse HTML, so let me confess that sin right off the bat. If you have a better way, please answer here, because I feel dirty and wrong.

Nonetheless, I can't find an answer to this regex question, which applies to non-HTML text too.

I have a string like:

    tag = 'style="width: 2010px; background-color: red; height: 200px; font-size: 12px"'

and I want to remove only the width and height declarations, so I tried:

    import re

    r = r'style="(width:\s?\d+px;?)|(height:\s?\d+px;?)'
    tag = re.sub(r, "", tag)

The pattern seems to match on regex101, but I'm getting TypeError: expected string or buffer.
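For reference, re.sub only searches str (or bytes) values and does not call str() on anything else, so passing any other object raises TypeError. Python 2 words the message as "expected string or buffer"; Python 3 says "expected string or bytes-like object" (the exact wording varies by version). A minimal reproduction, where FakeElement is a hypothetical stand-in for a parsed element that can render itself as text but is not a str:

```python
import re


class FakeElement:
    """Hypothetical stand-in for a parsed element: renders as text, but is not a str."""

    def __init__(self, html: str) -> None:
        self.html = html

    def __str__(self) -> str:
        return self.html


r = r'style="(width:\s?\d+px;?)|(height:\s?\d+px;?)'
tag = FakeElement('style="width: 2010px; background-color: red; height: 200px; font-size: 12px"')

try:
    re.sub(r, "", tag)  # re.sub does not convert its argument to str
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Converting to text first gives re.sub a str it can search.
fixed = re.sub(r, "", str(tag))
print(repr(fixed))
```

The printed result also shows this pattern deleting the style=" prefix, because the | splits the whole expression and style=" belongs only to the width branch.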

  • Works for me without modifications: ' background-color: red; font-size: 12px"'. Commented Feb 27, 2017 at 21:55
  • Are you sure tag is a string, and not a BeautifulSoup element or some other object? Commented Feb 27, 2017 at 21:58
  • Ah. Yup. It's a Beautiful Soup Element. Commented Feb 27, 2017 at 21:58
  • Then there's your problem :) Commented Feb 27, 2017 at 22:04
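Since tag is a Beautiful Soup element, the text you actually want to edit is its style attribute. In Beautiful Soup 4, tag['style'] should return that value as a plain str (unlike class, which comes back as a list), and assigning tag['style'] = new_value writes it back; verify this against your version. Once you have that string, one regex-free option is to split it into declarations and drop the ones you don't want. A minimal sketch; drop_declarations is a hypothetical helper, and the parsing is deliberately naive (it splits on every ';', so it would break on semicolons inside quoted values or url(...)):

```python
def drop_declarations(style: str, names: set[str]) -> str:
    """Return `style` without the CSS declarations whose property is in `names`.

    Naive sketch: splits on every ';', so semicolons inside quoted
    values or url(...) are not handled.
    """
    kept = []
    for declaration in style.split(";"):
        prop, sep, value = declaration.partition(":")
        if not sep:
            # Skip empty fragments, e.g. the one after a trailing ';'.
            continue
        if prop.strip().lower() in names:
            continue
        kept.append(f"{prop.strip()}: {value.strip()}")
    return "; ".join(kept)


style = "width: 2010px; background-color: red; height: 200px; font-size: 12px"
print(drop_declarations(style, {"width", "height"}))

# Property names are compared whole, so line-height survives a "height" filter.
print(drop_declarations("line-height: 20px; height: 200px", {"height"}))
```

Comparing whole property names avoids a pitfall of substring patterns, which can also hit properties such as line-height or max-width.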

1 Answer

Try the following regex, which matches only the width and height declarations and leaves the style=" prefix in place:

(?:width|height):\s?\d+px;?\s?

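    import re

    # Match a px width/height declaration, an optional ';' and one optional trailing space
    regex = r"(?:width|height):\s?\d+px;?\s?"
    test_str = '<div id="attachment_9565" class="wp-caption aligncenter" style="width: 2010px;background-color:red;height:200px">'
    subst = ""
    # count defaults to 0 (replace all); passing it positionally is deprecated since Python 3.13
    result = re.sub(regex, subst, test_str)
    print(result)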
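To see the difference between the two patterns, here is a sketch applying both to the question's sample string. It also shows a caveat: the answer's pattern matches inside longer property names such as max-width or line-height. The guarded_pattern variant is an assumed refinement, not part of the answer; it uses a negative lookbehind to reject a match when the property name is preceded by a letter, digit, underscore or hyphen.

```python
import re

tag = 'style="width: 2010px; background-color: red; height: 200px; font-size: 12px"'

# Question's pattern: '|' splits the whole expression, so 'style="' belongs
# only to the width branch and is deleted along with it.
question_pattern = r'style="(width:\s?\d+px;?)|(height:\s?\d+px;?)'
# Answer's pattern: matches only the declarations themselves.
answer_pattern = r"(?:width|height):\s?\d+px;?\s?"

question_result = re.sub(question_pattern, "", tag)
answer_result = re.sub(answer_pattern, "", tag)
print(repr(question_result))
print(repr(answer_result))

# Caveat: the answer's pattern also matches inside longer property names.
tricky = "max-width: 100px; line-height: 20px"
tricky_answer = re.sub(answer_pattern, "", tricky)
print(tricky_answer)

# Assumed refinement: skip names preceded by a word character or '-'.
guarded_pattern = r"(?<![\w-])(?:width|height):\s?\d+px;?\s?"
tricky_guarded = re.sub(guarded_pattern, "", tricky)
guarded_result = re.sub(guarded_pattern, "", tag)
print(tricky_guarded)
print(repr(guarded_result))
```

On the sample string the guarded variant gives the same result as the answer's pattern, while leaving max-width and line-height untouched.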