I want a regex that removes a list of properties from within the style attribute of a given HTML tag.

For example, I want to remove height and cursor from the span tag.

Input:

String htmlFragment = "<span id=\"nav-askquestion\" style=\"width:200px;cursor:default;height:100px;\" name=\"questions\"> <b>hh</b></span>";

Output:

<span id="nav-askquestion" style="width:200px;" name="questions"> <b>hh</b></span>

I have the following regex, but it removes every occurrence of height and cursor anywhere in the string, not just the ones inside the tag's style attribute:

String cleanString=htmlFragment.replaceAll("(height|cursor)[ ]*:[ ]*[^;]+;",""); 

I'm not looking to use an HTML parser for this, due to a specific requirement.

  • I strongly suggest not using regex for this. You should look at HTML/XML parsers for parsing the tags and data, and then do the operations. Commented Jan 13, 2017 at 20:53
  • See also Commented Jan 13, 2017 at 20:53
  • To replace it only inside a certain <div>, you will have to run a regex search to find all the <div>s, then select which of them to modify, and then modify them. You cannot use only one RegEx for this. Commented Jan 13, 2017 at 21:10
  • Even when you think a parsing case is too “simple” to worry about the consequences of using regular expressions, it often isn’t. See stackoverflow.com/questions/701166/… . Commented Jan 13, 2017 at 21:43
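The two-step idea from the comments can be sketched in Java. One regex finds the style value of each <span> opening tag, and a second regex strips the unwanted declarations only inside that captured value. This is a minimal sketch under stated assumptions: attribute values are double-quoted, no > appears inside the tag's attributes, and the class and method names (ScopedStyleCleaner, removeProperties) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScopedStyleCleaner {

    // Step 1: find style="..." inside each <span ...> opening tag.
    // Assumes double-quoted values and no '>' inside the tag's attributes.
    private static final Pattern SPAN_STYLE =
            Pattern.compile("(<span\\b[^>]*?\\bstyle=\")([^\"]*)(\")");

    // Removes the listed CSS properties, but only from style attributes of <span> tags.
    public static String removeProperties(String html, String... properties) {
        if (properties.length == 0) return html;

        // Step 2: "name : value;" declarations, built from the property list.
        // (?<![\w-]) keeps "height" from matching the tail of "line-height".
        StringBuilder alternatives = new StringBuilder();
        for (String p : properties) {
            if (alternatives.length() > 0) alternatives.append('|');
            alternatives.append(Pattern.quote(p));
        }
        Pattern declaration = Pattern.compile(
                "(?<![\\w-])(?:" + alternatives + ")\\s*:[^;]*;?\\s*");

        Matcher m = SPAN_STYLE.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Only the captured style value is rewritten; the rest of the tag is kept.
            String cleaned = declaration.matcher(m.group(2)).replaceAll("");
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + cleaned + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String htmlFragment = "<span id=\"nav-askquestion\" style=\"width:200px;cursor:default;height:100px;\" name=\"questions\"> <b>hh</b></span>";
        System.out.println(removeProperties(htmlFragment, "height", "cursor"));
    }
}
```

Text outside the span's style attribute, such as "cursor: pointer;" in the element body or a style on a <div>, is left alone, which is what the plain replaceAll in the question could not do.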

3 Answers

I agree with others that it would be better to use HTML/XML parsers, which allow you to drill down to specific elements without worrying about any "accidental" regex matches.

However, having read Xlsx's comment, "You cannot use only one RegEx for this.", I was compelled to post this solution using capturing groups. It is purely for demonstration purposes.

String reg = "(<span.+)((height|cursor) *:[^;]+;)(.*)((height|cursor) *:[^;]+;)(.*)";

String cleanString=htmlFragment.replaceAll(reg, "$1$4$7"); 

Obviously, it is not pretty and it may still match on some HTML content (as opposed to tags), but it is possible. Unless this is intended as a quick fix, I urge you to use a more appropriate solution as suggested by others. One possible solution would be jsoup.
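To see how the group references line up, the regex above can be run as-is. Groups are numbered by their opening parenthesis, so $1, $4 and $7 are the text around the two declarations (groups 2 and 5), and dropping 2 and 5 removes them. A small demo, assuming the input sits on one line (. does not match line breaks by default); the class name is hypothetical. It also shows a limitation: the pattern only matches when two such declarations are present.

```java
public class CapturedGroupsDemo {
    // Regex from the answer above. Groups, counted by opening parenthesis:
    // 1 = (<span.+)   text before the first declaration
    // 2 = first "height:...;" or "cursor:...;"   (3 = its property name)
    // 4 = (.*)        text between the two declarations
    // 5 = second declaration                     (6 = its property name)
    // 7 = (.*)        everything after it
    static final String REG =
            "(<span.+)((height|cursor) *:[^;]+;)(.*)((height|cursor) *:[^;]+;)(.*)";

    static String clean(String html) {
        // Keeping only $1, $4 and $7 drops groups 2 and 5, i.e. the two declarations.
        return html.replaceAll(REG, "$1$4$7");
    }

    public static void main(String[] args) {
        String both = "<span id=\"nav-askquestion\" style=\"width:200px;cursor:default;height:100px;\" name=\"questions\"> <b>hh</b></span>";
        String onlyOne = "<span style=\"width:200px;height:100px;\">x</span>";
        System.out.println(clean(both));    // both declarations removed
        System.out.println(clean(onlyOne)); // unchanged: the pattern needs TWO declarations
    }
}
```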

Try this regular expression:

\s*(height|cursor)\s*:\s*.+?\s*;\s*

You can test it out here.

If there are other properties besides height and cursor that you want to match, just keep adding them as alternatives separated by bars, e.g. (background-color|height|font-size).
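In Java source, the backslashes in that expression have to be doubled, and the alternation can be built from a list of property names. A minimal sketch (the helper name forProperties is hypothetical); Pattern.quote escapes each name in case it contains regex metacharacters. Like the regex in the question, this strips matching declarations anywhere in the input, not just inside style attributes, as the second call shows.

```java
import java.util.regex.Pattern;

public class StripDeclarations {
    // Builds \s*(name1|name2|...)\s*:\s*.+?\s*;\s* for any list of property names.
    static Pattern forProperties(String... names) {
        StringBuilder alt = new StringBuilder();
        for (String n : names) {
            if (alt.length() > 0) alt.append('|');
            alt.append(Pattern.quote(n)); // escape names such as "background-color"
        }
        return Pattern.compile("\\s*(" + alt + ")\\s*:\\s*.+?\\s*;\\s*");
    }

    public static void main(String[] args) {
        Pattern p = forProperties("background-color", "height", "font-size");
        // prints "width:200px;"
        System.out.println(p.matcher("width:200px; background-color: red; height:100px;").replaceAll(""));
        // prints "<p></p>": plain text that looks like a declaration is removed too
        System.out.println(p.matcher("<p>height: tall;</p>").replaceAll(""));
    }
}
```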

4 Comments

I wrote something similar; the issue is that I want to remove some properties from within the style attribute only. So if the input is <div style= 'height:50px;color:red;width:100px' >hello </div>, the output should be <div style= 'color:red;' >hello </div>.
Does Java regex support positive lookbehinds? regular-expressions.info/lookaround.html
Well, then maybe you can add a positive lookbehind that makes sure the match is preceded by style="[^"]+ (in other words, an unclosed quote of a style attribute).
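Java regex does support lookbehind, but (at least in older Java versions) only when the lookbehind has an obvious maximum length, so the unbounded [^"]+ suggested above is replaced here by the bounded [^"]{0,500}; the 500-character cap is an arbitrary assumption. A sketch of the idea (class name hypothetical). It applies to the style attribute of any tag, not only <span>, and assumes double-quoted style values.

```java
public class LookbehindStrip {
    // (?<=style="[^"]{0,500})        match must sit inside an open style="..." value
    //                                (bounded length instead of [^"]+, see above)
    // (?<![\w-])                     do not match the tail of e.g. "line-height"
    // (?:height|cursor)\s*:[^;"]*;?  the declaration, stopping at ';' or the closing quote
    static final String REGEX =
            "(?<=style=\"[^\"]{0,500})(?<![\\w-])(?:height|cursor)\\s*:[^;\"]*;?";

    public static void main(String[] args) {
        String htmlFragment = "<span id=\"nav-askquestion\" style=\"width:200px;cursor:default;height:100px;\" name=\"questions\"> <b>hh</b></span>";
        // prints <span id="nav-askquestion" style="width:200px;" name="questions"> <b>hh</b></span>
        System.out.println(htmlFragment.replaceAll(REGEX, ""));
    }
}
```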

As I said before, I strongly suggest not using regex for this; use HTML/XML parsers to parse the tags and data, and then do all your operations.

But if you don't want to do that for some reason, then I would suggest falling back to basic substring-based methods rather than using regex.

Here is a sample code snippet for the above situation:

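public class RemoveStyleProperties {
    public static void main(String[] args) {
        String htmlFragment = "<span id=\"nav-askquestion\" style=\"width:200px;cursor:default;height:100px;\" name=\"questions\"> <b>hh</b></span>";
        int startIndex = htmlFragment.indexOf("<span");
        int stopIndex = htmlFragment.indexOf("</span>") + "</span>".length();
        if (startIndex < 0 || stopIndex < "</span>".length()) {
            System.out.println(htmlFragment); // no <span>...</span>: nothing to do
            return;
        }
        // Note: each cut keeps only the text from "<span" to the end of "</span>".

        /* Cursor: cut out "cursor:...;" if it lies inside the span */
        int cursorStart = htmlFragment.indexOf("cursor:", startIndex);
        int cursorEnd = htmlFragment.indexOf(";", cursorStart);
        if (cursorStart >= 0 && cursorEnd >= 0 && cursorEnd < stopIndex) {
            htmlFragment = new StringBuilder()
                    .append(htmlFragment.substring(startIndex, cursorStart))
                    .append(htmlFragment.substring(cursorEnd + 1, stopIndex))
                    .toString();
        }

        /* Update Indices: the string may now be shorter and start at "<span" */
        startIndex = htmlFragment.indexOf("<span");
        stopIndex = htmlFragment.indexOf("</span>") + "</span>".length();

        /* Height */
        int heightStart = htmlFragment.indexOf("height:", startIndex);
        int heightEnd = htmlFragment.indexOf(";", heightStart);
        if (heightStart >= 0 && heightEnd >= 0 && heightEnd < stopIndex) {
            htmlFragment = new StringBuilder()
                    .append(htmlFragment.substring(startIndex, heightStart))
                    .append(htmlFragment.substring(heightEnd + 1, stopIndex))
                    .toString();
        }

        /* Output */
        System.out.println(htmlFragment);
    }
}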

I know it looks a bit messy, but that's the only way I could think of.

1 Comment

StringBuilder is overkill for the one-time concatenation of two strings. It provides no benefit while reducing readability.
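Following that comment, the same cut can be written with plain concatenation. Below is a sketch of the substring approach as a reusable helper (removeProperty is a hypothetical name). It also checks that the declaration lies inside the span's style value, and it keeps any text outside the span. Like the original, it removes only the first occurrence of each property and does a plain text search, so searching for height: would also hit line-height:.

```java
public class SubstringCut {
    // Hypothetical helper: removes the first "name:...;" declaration found inside
    // the style="..." attribute of the first <span>, using indexOf/substring only.
    // Returns the input unchanged if any of the markers is missing.
    static String removeProperty(String html, String name) {
        int tag = html.indexOf("<span");
        if (tag < 0) return html;
        int styleStart = html.indexOf("style=\"", tag);
        int tagEnd = html.indexOf('>', tag);
        if (styleStart < 0 || styleStart > tagEnd) return html;   // no style on this span
        int valueStart = styleStart + "style=\"".length();
        int valueEnd = html.indexOf('"', valueStart);
        if (valueEnd < 0) return html;
        int declStart = html.indexOf(name + ":", valueStart);
        if (declStart < 0 || declStart >= valueEnd) return html;   // not inside the style value
        int semi = html.indexOf(';', declStart);
        // Cut up to and including ';', or up to the closing quote if there is no ';'.
        int declEnd = (semi >= 0 && semi < valueEnd) ? semi + 1 : valueEnd;
        // Plain concatenation, as the comment suggests.
        return html.substring(0, declStart) + html.substring(declEnd);
    }

    public static void main(String[] args) {
        String htmlFragment = "<span id=\"nav-askquestion\" style=\"width:200px;cursor:default;height:100px;\" name=\"questions\"> <b>hh</b></span>";
        for (String name : new String[] {"cursor", "height"}) {
            htmlFragment = removeProperty(htmlFragment, name);
        }
        System.out.println(htmlFragment);
    }
}
```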
