Regex: extract String from String

Question

I need a regex that extracts part of a String. I get this String by parsing an XML document with DOM. I then look for the "§regex" part in the String and try to extract its value, e.g. "([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3})", from the rest.

The problem is that I don't know how to make sure the extracted part ends with a ")". The regex needs to work for every value given. The goal is to write only the value in brackets after "§regex=", including the brackets, into a String.

<UML:TaggedValue tag="description" value=" random Text §regex=([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3}) random text"/>

private List<String> findRegex() {
    List<String> forReturn = new ArrayList<String>();
    for (String str : attDescription) {
        if (str.contains("§regex=")) {
            String s = str.replaceAll(regex);
            forReturn.add(s);
        }
    }
    return forReturn;
}

attDescription is a list containing all attributes found in the parsed XML document.

So far I have tried this regex: ".*(§regex=)(.*)[)$].*", "$2", but it cuts off the ")" and does not delete the text in front of the part I am looking for. Even with the help of http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html I really don't understand how to get what I need.

You should provide some examples of the strings to match and the expected result, without this strange §regex decoration. The code snippet is confusing - what is regex?
– laune, Commented Jul 4, 2014 at 14:52
It's pretty limiting to assume that the regex will have no capturing groups, non-capturing groups, literal parens, or spaces. So it seems like, unless you can know the structure of the text that follows it, I don't see how you can do it. Perhaps the regex can also end with =regex[squiggle]. Then you would have a clear delimiter to search for. Do you have control over the input in this way? (I'd also consider using a more standard character other than the squiggle thing.)
– aliteralmind, Commented Jul 4, 2014 at 14:53
Also, the dollar sign in your regex, .*(§regex=)(.*)[)$].*, can't work, as it's expecting text to exist after the end of the line.
– aliteralmind, Commented Jul 4, 2014 at 14:57

gla3dr · Accepted Answer · 2014-07-04 15:03:33Z

2

It seems to work for me (with this example, anyway) if I replace the line String s = str.replaceAll(regex); with:

String s = str.replaceAll( ".*§regex=(\\(.*\\)).*", "$1" );

It's just looking for a substring enclosed by parentheses following §regex=.
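The same pattern can also be used with Pattern and Matcher, which returns only the captured group and makes the no-match case explicit instead of handing back the unchanged input. This is only a minimal sketch: the class name RegexTagExtractor and the method extractRegex are made up for illustration, and the § sign and umlauts are written as Unicode escapes so that the source compiles regardless of file encoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTagExtractor {
    // "\u00A7" is the section sign used in the marker.
    // The capture group holds "(" ... ")" directly after the marker; the greedy .*
    // runs to the LAST ")" in the input, just like the replaceAll version.
    private static final Pattern TAG = Pattern.compile("\u00A7regex=(\\(.*\\))");

    /** Returns the bracketed value including its brackets, or null if there is none. */
    static String extractRegex(String description) {
        Matcher m = TAG.matcher(description);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Sample value from the question, with the umlauts escaped.
        String str = " random Text \u00A7regex=([A-Z\u00C4\u00D6\u00DC]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3}) random text";
        System.out.println(extractRegex(str));
    }
}
```

Inside findRegex() this would replace both the contains check and the replaceAll call: add the result to the list only when it is not null.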


Bohemian · Accepted Answer · 2014-07-04 15:18:54Z

0

This seems to work:

String s = str.replaceAll(".*§regex=\\((.*)[)].*", "$1");

Note:

Escape the leading bracket.
The $ inside a character class is a literal $. Drop it, because your regex should always end with a bracket.
There is no need to capture the fixed text.
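The point about $ can be checked directly: inside [...] the dollar sign loses its end-of-input meaning, so [)$] matches either a literal ) or a literal $. A small sketch, with the class name CharClassDollarDemo made up for illustration:

```java
class CharClassDollarDemo {
    /** True if the single-character class [)$] matches the whole input. */
    static boolean matchesClass(String input) {
        return input.matches("[)$]");
    }

    public static void main(String[] args) {
        System.out.println(matchesClass(")")); // true
        System.out.println(matchesClass("$")); // true: a literal dollar sign
        System.out.println(matchesClass(""));  // false: the class needs exactly one character
        // Outside a character class, $ is the end-of-input anchor:
        System.out.println("abc)".matches(".*\\)$")); // true
    }
}
```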

Test code, noting that this works with brackets in/around the regex:

String str = "random Text §regex=(([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3})) random text";
String s = str.replaceAll(".*§regex=\\((.*)[)].*", "$1");
System.out.println(s);

Output:

([A-ZÄÖÜ]{1,3}[- ][A-Z]{1,2}[1-9][0-9]{0,3})
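Both replaceAll patterns rely on a greedy .*, so they run to the last ) in the string, and a ) in the trailing text would end up in the result. If that can happen, one alternative (not taken from the answers here) is to count brackets by hand. This is a rough sketch under stated assumptions: the value starts with ( directly after the marker, and brackets inside it are balanced, not escaped, and not part of a character class. All names are illustrative.

```java
class BalancedValueScanner {
    // "\u00A7" is the section sign; written as an escape to keep the source ASCII-only.
    private static final String MARKER = "\u00A7regex=";

    /** Returns the balanced "(...)" directly after the marker, or null if there is none. */
    static String extractBalanced(String description) {
        int start = description.indexOf(MARKER);
        if (start < 0) return null;
        start += MARKER.length();
        if (start >= description.length() || description.charAt(start) != '(') return null;

        int depth = 0;
        for (int i = start; i < description.length(); i++) {
            char c = description.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')' && --depth == 0) {
                return description.substring(start, i + 1); // include both brackets
            }
        }
        return null; // the opening bracket was never closed
    }

    public static void main(String[] args) {
        String str = "text \u00A7regex=((a|b)c) more (text)";
        System.out.println(extractBalanced(str)); // ((a|b)c)
    }
}
```

With the same input, the greedy patterns would run on to the ) of "(text)".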

edited Jul 4, 2014 at 15:18

answered Jul 4, 2014 at 15:02
