2

Hi, I need to write a regular expression in Java that will find all instances of:

wsp:rsidP="005816D6" wsp:rsidR="005816D6" wsp:rsidRDefault="005816D6" 

attributes in an XML string and strip them out.

So I need to rip out all attributes that start with wsp:rsid and end with a double quote (").

Thoughts on this:

  1. String str = xmlstring.replaceAll("wsp:rsid/w", "");
  2. String str = xmlstring.replaceAll("wsp:rsid[]\\"", "");

4 Answers

2
xmlstring.replaceAll( "wsp:rsid\\w*?=\".*?\"", "" );

This works in my tests...

public void testReplaceAll() throws Exception {
    String regex = "wsp:rsid\\w*?=\".*?\"";

    assertEquals( "", "wsp:rsidP=\"005816D6\"".replaceAll( regex, "" ) );
    assertEquals( "", "wsp:rsidR=\"005816D6\"".replaceAll( regex, "" ) );
    assertEquals( "", "wsp:rsidRDefault=\"005816D6\"".replaceAll( regex, "" ) );
    assertEquals( "a=\"1\" >", "a=\"1\" wsp:rsidP=\"005816D6\">".replaceAll( regex, "" ) );
    assertEquals(
            "bob   kuhar",
            "bob wsp:rsidP=\"005816D6\" wsp:rsidRDefault=\"005816D6\" kuhar".replaceAll( regex, "" ) );
    assertEquals(
            " keepme=\"yes\" ",
            "wsp:rsidP=\"005816D6\" keepme=\"yes\" wsp:rsidR=\"005816D6\"".replaceAll( regex, "" ) );
    assertEquals(
            "<node a=\"l\"  b=\"m\"  c=\"r\">",
            "<node a=\"l\" wsp:rsidP=\"0\" b=\"m\" wsp:rsidR=\"0\" c=\"r\">".replaceAll( regex, "" ) );
    // Sadly doesn't handle the embedded \" case...
    // assertEquals( "", "wsp:rsidR=\"hello\\\"world\"".replaceAll( regex, "" ) );
}
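The tests above leave doubled spaces behind (for example "bob   kuhar"). If that matters, one option is to let the pattern also swallow the whitespace in front of each attribute. This is only a sketch: the class name RsidWhitespaceDemo and the method strip are made up for illustration, and the value is matched with [^"]* instead of .*?, which assumes the value never contains a double quote.

```java
public class RsidWhitespaceDemo {
    // \s* consumes any whitespace before the attribute, so no gap is left behind.
    // [^"]* stops at the first closing quote (assumes no quote inside the value).
    public static final String REGEX = "\\s*\\bwsp:rsid\\w*=\"[^\"]*\"";

    public static String strip(String xml) {
        return xml.replaceAll(REGEX, "");
    }

    public static void main(String[] args) {
        String in = "<node a=\"l\" wsp:rsidP=\"0\" b=\"m\" wsp:rsidR=\"0\" c=\"r\">";
        System.out.println(strip(in)); // <node a="l" b="m" c="r">
    }
}
```

Note that this also removes the space before the attribute in plain text such as "bob wsp:rsidP=... kuhar", which may or may not be what you want.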

3 Comments

it works now. regex made non-greedy as per Bohemian derision er suggestion.
I'll remove the -1 when a) you remove yours, and b) you fix your answer (the first regex is still the old broken one)
My answer works, or at least meets the requirements of the original question. I don't see a need for it being edited. I really don't care much about the -1.
1

Try:

xmlstring.replaceAll("\\bwsp:rsid\\w*=\"[^\"]+(\\\\\"[^\"]*)*\"", "");

Also, your regexes are wrong. I suggest you go and plough through http://regular-expressions.info ;)

7 Comments

don't you mean "\\bwsp:rsid=\"[^\"]+\""?
No, since rsid can be followed by R for instance.
Let me put it this way... your regex doesn't work. Test it yourself to see.
And your regex won't work with wsp:rsidR="hello\"world" either. Meh. My edited one will, however.
FYI, to work with "hello\"world" you'd need an XML parser, which is beyond the scope of this question. See "You shouldn't try to parse HTML with regex" for why.
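For what the parser route could look like, here is a rough sketch using the JDK's built-in DOM and Transformer classes. Two caveats: in well-formed XML a quote inside an attribute value is written as &quot; rather than \", and the sketch relies on the default DocumentBuilderFactory not being namespace-aware, so the wsp: prefix does not need to be declared. The class name RsidDomStripper and the method strip are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RsidDomStripper {
    public static String strip(String xml) {
        try {
            // Default factory is not namespace-aware: "wsp:rsidR" is just a name.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList elements = doc.getElementsByTagName("*"); // every element
            for (int i = 0; i < elements.getLength(); i++) {
                Element el = (Element) elements.item(i);
                NamedNodeMap attrs = el.getAttributes();
                // Walk backwards because the attribute map is live.
                for (int j = attrs.getLength() - 1; j >= 0; j--) {
                    String name = attrs.item(j).getNodeName();
                    if (name.startsWith("wsp:rsid")) {
                        el.removeAttribute(name);
                    }
                }
            }

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not process XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(strip(
            "<tag wsp:rsidR=\"hello&quot;world\" foo=\"bar\">hi</tag>"));
    }
}
```

This only works on a complete, well-formed document, not on fragments like the bare attribute strings used in the tests elsewhere on this page.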
0

Here are 2 functions. clean will do the replacement, extract will extract the data (if you want it, not sure)

Please excuse the style, I wanted you to be able to cut and paste the functions.

import java.util.HashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class Answer {

    public static HashMap<String, String> extract(String s){
        Pattern pattern  = Pattern.compile("wsp:rsid(.+?)=\"(.+?)\"");
        Matcher matcher = pattern.matcher(s);
        HashMap<String, String> hm = new HashMap<String, String>();

        //The first group is the string between the wsp:rsid and the =
        //The second is the value
        while (matcher.find()){
            hm.put(matcher.group(1), matcher.group(2));
        }

        return hm;
    }

    public static String clean(String s){
        Pattern pattern  = Pattern.compile("wsp:rsid(.+?)=\"(.+?)\"");
        Matcher matcher = pattern.matcher(s);
        return matcher.replaceAll("");
    }

    public static void main(String[] args) {

        System.out.print(clean("sadfasdfchri wsp:rsidP=\"005816D6\" foo=\"bar\" wsp:rsidR=\"005816D6\" wsp:rsidRDefault=\"005816D6\""));
        HashMap<String, String> m = extract("sadfasdfchri wsp:rsidP=\"005816D6\" foo=\"bar\" wsp:rsidR=\"005816D6\" wsp:rsidRDefault=\"005816D6\"");
        System.out.println("");

        //ripped off of http://stackoverflow.com/questions/1066589/java-iterate-through-hashmap
        for (String key : m.keySet()) {
            System.out.println("Key: " + key + ", Value: " + m.get(key));
        }

    }   

}

returns:

sadfasdfchri  foo="bar"
Key: RDefault, Value: 005816D6
Key: P, Value: 005816D6
Key: R, Value: 005816D6
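The keys above do not come out in the order they appear in the input because HashMap makes no ordering guarantee. If the order matters, a variation along these lines might help: it keeps the same regex as the answer, but compiles it once and collects the matches in a LinkedHashMap, which preserves insertion order. The class name RsidAttributes is made up for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RsidAttributes {
    // Same pattern as in the answer, compiled once and shared by both methods.
    private static final Pattern RSID = Pattern.compile("wsp:rsid(.+?)=\"(.+?)\"");

    public static Map<String, String> extract(String s) {
        // LinkedHashMap iterates in insertion order, i.e. order of appearance.
        Map<String, String> found = new LinkedHashMap<>();
        Matcher matcher = RSID.matcher(s);
        while (matcher.find()) {
            found.put(matcher.group(1), matcher.group(2)); // suffix -> value
        }
        return found;
    }

    public static String clean(String s) {
        return RSID.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String in = "sadfasdfchri wsp:rsidP=\"005816D6\" foo=\"bar\" "
                  + "wsp:rsidR=\"005816D6\" wsp:rsidRDefault=\"005816D6\"";
        System.out.println(clean(in));
        for (Map.Entry<String, String> e : extract(in).entrySet()) {
            System.out.println("Key: " + e.getKey() + ", Value: " + e.getValue());
        }
    }
}
```

With the sample input the keys should now be listed as P, R, RDefault.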

2 Comments

This is an appalling solution... "more code" does not mean "better code". The "correct" answer is a one liner.
Most of it is boilerplate. Of course the answer is one line, actually just one regex. The correct implementation of the answer is not one line of code. We don't know what that is. I provided mine.
0

Unlike all other answers, this answer actually works!

xmlstring.replaceAll("\\bwsp:rsid\\w*?=\"[^\"]*\"", "");

Here's a test that fails with all other answers:

public static void main(String[] args) {
    String xmlstring = "<tag wsp:rsidR=\"005816D6\" foo=\"bar\" wsp:rsidRDefault=\"005816D6\">hello</tag>";
    System.out.println(xmlstring);
    System.out.println(xmlstring.replaceAll("\\bwsp:rsid\\w*?=\"[^\"]*\"", ""));
}

Output:

<tag wsp:rsidR="005816D6" foo="bar" wsp:rsidRDefault="005816D6">hello</tag>
<tag  foo="bar" >hello</tag>
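For anyone trying to follow the greedy versus non-greedy argument in the comments on this page, the small comparison below may help. It runs three variants of the value part (.*, .*? and [^"]*) against the same test string. The class name QuantifierDemo and its constants are hypothetical, and the greedy variant is included only to show the failure mode, not because any answer currently uses it.

```java
public class QuantifierDemo {
    public static final String GREEDY  = "\\bwsp:rsid\\w*=\".*\"";      // .* runs to the LAST quote
    public static final String LAZY    = "\\bwsp:rsid\\w*=\".*?\"";     // .*? stops at the first quote
    public static final String NEGATED = "\\bwsp:rsid\\w*=\"[^\"]*\"";  // [^"]* cannot cross a quote

    public static String strip(String xml, String regex) {
        return xml.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String xml = "<tag wsp:rsidR=\"005816D6\" foo=\"bar\" "
                   + "wsp:rsidRDefault=\"005816D6\">hello</tag>";
        System.out.println(strip(xml, GREEDY));  // <tag >hello</tag>  (foo is lost)
        System.out.println(strip(xml, LAZY));    // <tag  foo="bar" >hello</tag>
        System.out.println(strip(xml, NEGATED)); // <tag  foo="bar" >hello</tag>
    }
}
```

On this input the lazy and negated-class versions give the same result; the negated class simply makes the "stop at the next quote" rule explicit instead of relying on backtracking.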

1 Comment

SO won't let me remove the -1 unless the answer takes an edit. Make an edit and I'll put it back. I still think your style lacks the objectivity that makes SO work so well.
