"outline-style: none; margin: 0px; padding: 2px; background-color: #eff0f8; color: #3b3a39; font-family: Georgia,'Times New Roman',Times,serif; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 18px; orphans: 2; text-align: center; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; border: 1px solid #ebebeb; float: left;"

I have this as inline CSS. Using a regular expression, I would like to replace every property starting with "background" or "font" with a blank. In inline CSS, the last property might not end with a semicolon.

I am using this code as a Django filter to remove those properties on the server side with Beautiful Soup:

import re
# Assumes BeautifulSoup 3; with bs4, use "from bs4 import BeautifulSoup".
from BeautifulSoup import BeautifulSoup

def html_remove_attrs(value):
    soup = BeautifulSoup(value)
    # Visit every tag that carries a style attribute.
    for tag in soup.findAll(True, {'style': re.compile(r'')}):
        style = tag['style']
        # TODO: remove the background* and font* properties from `style`
        tag['style'] = style

    return soup
  • Are you doing this before it goes live, or when it hits the client side (JavaScript)? Commented Dec 7, 2011 at 14:26
  • I have to parse it on the server side. Commented Dec 7, 2011 at 14:29
  • You should probably reconsider using inline CSS in favor of reusable classes. Commented Dec 7, 2011 at 14:31
  • The content I am getting is HTML pasted into TinyMCE, posted by users from other websites. I have to replace the font* and background* properties of elements to make the content compatible with my web theme. Commented Dec 7, 2011 at 14:36

1 Answer


I don't know the details of your programming environment, but you asked for a regular expression. This one captures each property key (plus the colon and any following whitespace) as group 1 ($1) and the property value as group 2 ($2):

 ((?:background|font)(?:[^:]+):(?:\s*))([^;]+)

The expression only finds the property values; it does not remove them. How you remove them depends on your programming environment (language and libraries).

Basically, you would do a global find/replace, replacing each whole match with $1, which keeps the property name and drops its value.
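Since the question is about Python, the global find/replace described above could be sketched with `re.sub` roughly as follows. This is a minimal sketch and not part of the original answer: the helper name `blank_values` is made up for illustration, and it assumes the pattern is used unchanged.

```python
import re

# Same pattern as above: group 1 is the property name plus the colon and
# any following whitespace, group 2 is the property value.
PATTERN = re.compile(r"((?:background|font)(?:[^:]+):(?:\s*))([^;]+)")


def blank_values(style):
    """Hypothetical helper: keep group 1 and drop the value (group 2)."""
    return PATTERN.sub(r"\1", style)


print(blank_values("margin: 0px; background-color: #eff0f8; font-size: 14px"))
```

As written, the pattern needs at least one character between `background` or `font` and the colon, so the shorthand properties `background:` and `font:` are left untouched; changing `[^:]+` to `[^:]*` would be one way to include them.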

For example, in Java you could do this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StyleRegexDemo {

    public static void main(String[] args) throws Exception {

        String[] lines = {
            "outline-style: none; margin: 0px; padding: 2px; background-color: #eff0f8; color: #3b3a39; font-family: Georgia,'Times New Roman',Times,serif; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 18px; orphans: 2; text-align: center; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; border: 1px solid #ebebeb; float: left;",
            "outline-style: none; margin: 0px; padding: 2px; background-color: #eff0f8; color: #3b3a39; font-family: Georgia,'Times New Roman',Times,serif; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 18px; orphans: 2; text-align: center; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; border: 1px solid #ebebeb; float: left",
            "background-color: #eff0f8;",
            "background-color: #eff0f8",
        };

        // Group 1: property name, colon, and following whitespace. Group 2: the value.
        String regex = "((?:background|font)(?:[^:]+):(?:\\s*))([^;]+)";

        Pattern p = Pattern.compile(regex);

        for (String s : lines) {
            StringBuffer sb = new StringBuffer();
            Matcher m = p.matcher(s);
            while (m.find()) {

                // Capturing group(2) for debugging purposes only:
                // its length lets us fill the value with '-'
                // to make the before/after comparison easier.
                String text = m.group(2);
                text = text.replaceAll(".", "-");
                m.appendReplacement(sb, "$1" + text);

                // For non-debug mode, use this instead:
                // m.appendReplacement(sb, "$1");
            }
            m.appendTail(sb);

            System.err.println("> " + s); // before
            System.err.println("< " + sb.toString()); // after
            System.err.println();
        }
    }
}

2 Comments

Great expression indeed. Thanks for your help. But when I split with this regex and join all the split data together, I get this: pastebin.com/n43wUw8x. The values of "background*" and "font*" are not removed :(
I have revised the expression and updated the answer, including an example.
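
One possible explanation for the result in the first comment, assuming the split was done with Python's `re.split`: when a pattern contains capturing groups, `re.split` returns the captured text along with the surrounding pieces, so joining the list rebuilds the original string. The sketch below shows that behavior, plus a variant that deletes whole declarations instead. The name `strip_declarations` and its pattern are illustrative assumptions, not taken from the answer.

```python
import re

style = "margin: 0px; background-color: #eff0f8; font-size: 14px"

# The answer's pattern, with its two capturing groups.
pattern = re.compile(r"((?:background|font)(?:[^:]+):(?:\s*))([^;]+)")

# re.split includes the text of capturing groups in its result list,
# so joining the pieces puts the names and values back.
pieces = pattern.split(style)
rejoined = "".join(pieces)

# Hypothetical alternative: match a whole declaration (name, value, and an
# optional trailing semicolon) and replace it with nothing.
DECLARATION = re.compile(r"(?<![\w-])(?:background|font)[^:;]*:[^;]*;?\s*")


def strip_declarations(value):
    """Remove every background* and font* declaration from a style string."""
    return DECLARATION.sub("", value).strip()


print(rejoined == style)
print(strip_declarations(style))
```

This variant treats every semicolon as the end of a declaration, so values that themselves contain semicolons (for example `data:` URLs) would need a real CSS parser rather than a regular expression.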
