I need a regex to extract part of a String. I get the String by parsing an XML document with DOM. I then look for the "§regex=" marker in the String and try to extract the value that follows it, e.g. "([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3})", from the rest.

The problem is that I don't know how to make sure the extracted part ends with a ")". The regex needs to work for every value given. The goal is to write only the value in brackets after "§regex=", including the brackets, into a String.

<UML:TaggedValue tag="description" value=" random Text §regex=([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3}) random text"/>

private List<String> findRegex() {
    List<String> forReturn = new ArrayList<String>();
    for (String str : attDescription) {
        if (str.contains("§regex=")) {
            String s = str.replaceAll(regex);
            forReturn.add(s);
        }
    }
    return forReturn;
}

attDescription is a list that contains all attributes found in the parsed XML document.

So far I have tried this regex: ".*(§regex=)(.*)[)$].*", "$2", but it cuts off the ")" and does not delete the text in front of the part I am looking for. Even with the help of http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html I really don't understand how to get what I need.

  • You should provide some examples of the strings to match and the expected result, without this strange §regex decoration. The code snippet is confusing - what is regex? Commented Jul 4, 2014 at 14:52
  • It's pretty limiting to assume that the regex will have no capturing groups, non-capturing groups, literal parens, or spaces. So it seems like, unless you can know the structure of the text that follows it, I don't see how you can do it. Perhaps the regex can also end with =regex[squiggle]. Then you would have a clear delimiter to search for. Do you have control over the input in this way? (I'd also consider using a more standard character other than the squiggle thing.) Commented Jul 4, 2014 at 14:53
  • replaceAll needs a second parameter. Commented Jul 4, 2014 at 14:54
  • Try this: ".*§regex=(\\(.*\\)).*", "$1" Commented Jul 4, 2014 at 14:55
  • Also, the dollar sign in your regex, .*(§regex=)(.*)[)$].*, can't work, as it's expecting text to exist after the end of the line. Commented Jul 4, 2014 at 14:57

2 Answers

It seems to work for me (with this example anyway) if I use this in place of String s = str.replaceAll(regex);

String s = str.replaceAll( ".*§regex=(\\(.*\\)).*", "$1" );

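It looks for a substring enclosed in parentheses directly after §regex= and keeps it, brackets included. Because .* is greedy, the match runs to the last ")" in the string, so this assumes the trailing text contains no closing bracket.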
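
As a rough illustration, here is how that replacement might sit inside the method from the question. This is only a sketch: the class name RegexExtractor is made up, and the list is passed in as a parameter instead of being read from the attDescription field, so that the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RegexExtractor {

    // Hypothetical stand-alone variant of findRegex(): the list is a parameter
    // here, whereas the question reads it from a field named attDescription.
    static List<String> findRegex(List<String> attDescription) {
        List<String> forReturn = new ArrayList<String>();
        for (String str : attDescription) {
            if (str.contains("§regex=")) {
                // Group 1 is the bracketed value, outer brackets included.
                // The greedy .* inside the group runs to the last ')' in str.
                forReturn.add(str.replaceAll(".*§regex=(\\(.*\\)).*", "$1"));
            }
        }
        return forReturn;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            " random Text §regex=([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3}) random text",
            "no marker here");
        // Entries without the marker are skipped entirely.
        System.out.println(findRegex(input));
    }
}
```

The contains() check matters here: replaceAll returns the input unchanged when the pattern does not match, so without the check an unrelated description would end up in the result list.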

This seems to work:

String s = str.replaceAll(".*§regex=\\((.*)[)].*", "$1");

Note:

  • Escape the leading bracket
  • Inside a character class, $ is a literal dollar sign, not an end-of-input anchor. Drop it, because your regex value should always end with a bracket
  • No need to capture the fixed text

Test code. Note that the value here is wrapped in an extra pair of brackets, and the pattern strips only the outermost pair:

String str = "random Text §regex=(([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3})) random text";
String s = str.replaceAll(".*§regex=\\((.*)[)].*", "$1");
System.out.println(s);

Output:

([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3})
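
For comparison, the same extraction could be written with Pattern and Matcher instead of replaceAll. This is a hedged sketch, not part of either answer: the class name RegexValueDemo and the method extract are invented for illustration. With find() the leading and trailing .* are unnecessary, and the capture group keeps the outer brackets, which is what the question asks for.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexValueDemo {

    // Group 1 is the bracketed value, outer brackets included.
    // The .* is greedy, so the group extends to the last ')' in the input.
    private static final Pattern MARKER = Pattern.compile("§regex=(\\(.*\\))");

    // Returns the bracketed value after "§regex=", or null if there is none.
    static String extract(String description) {
        Matcher m = MARKER.matcher(description);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = " random Text §regex=([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3}) random text";
        System.out.println(extract(str));
        System.out.println(extract("no marker here"));
    }
}
```

Returning null for a missing marker is just one possible choice. As the comments on the question point out, any ")" in the text after the value would still be swallowed by the greedy match, so a distinct end delimiter in the input would be more robust.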
