4

Firstly, my apologies as I don't know regular expressions that well.

I am using a regular expression to match a string. I tested it in the Python interactive interpreter, but when I ran it in Java, it produced a different result.

Python execution:

re.search("[0-9]*[\\.[0-9]+]?[^0-9]*D\\([M|W]\\)\\s*US", "9.5 D(M) US");

gives the result as:

<_sre.SRE_Match object; span=(0, 11), match='9.5 D(M) US'>

But the Java code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTest {
    // Same pattern string as in the Python call above
    private static final Pattern FALLBACK_MEN_SIZE_PATTERN = Pattern.compile("[0-9]*[\\.[0-9]+]?[^0-9]*D\\([M|W]\\)\\s*US");

    public static void main(String[] args) {
        String strTest = "9.5 D(M) US";
        Matcher matcher = FALLBACK_MEN_SIZE_PATTERN.matcher(strTest);
        if (matcher.find()) {
            System.out.println(matcher.group(0));
        }
    }
}

gives the output as:

5 D(M) US

I don't understand why it behaves differently.

  • Note that you can ditch the extra backslashes in Python with a "raw string", r'[0-9]*[\.[0-9]+]?...', and that you can use \d for [0-9]. Commented May 29, 2015 at 10:27
  • Well, the regex definitely needs some adjustment. You put alternatives into a character class rather than in a group. Commented May 29, 2015 at 10:28
  • @jonrsharpe thanks for the comment, will do that Commented May 29, 2015 at 10:30
  • @stribizhev can you please elaborate. Commented May 29, 2015 at 10:30
  • Also the pipe in [M|W] is a literal character to match... have a look at e.g. regex101.com/r/kT9fD4/1 Commented May 29, 2015 at 10:30

2 Answers

5

Here is the pattern that will work the same in Java and Python:

"[0-9]*(?:\\.[0-9]+)?[^0-9]*D\\([MW]\\)\\s*US"

See Python and Java demos.
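
As a quick check of the corrected pattern on the Python side, here is a minimal sketch. The helper name find_size and the second sample string are made up for illustration; only the pattern itself comes from the answer, rewritten as a raw string so the backslashes need no doubling.

```python
import re

# Corrected pattern from the answer, as a raw string.
PATTERN = re.compile(r"[0-9]*(?:\.[0-9]+)?[^0-9]*D\([MW]\)\s*US")


def find_size(text):
    """Hypothetical helper: return the matched substring, or None."""
    m = PATTERN.search(text)
    return m.group(0) if m else None


print(find_size("9.5 D(M) US"))  # 9.5 D(M) US
print(find_size("10 D(W) US"))   # 10 D(W) US
```

The non-capturing group (?:...)? makes the whole ".5" part optional as a unit, which is what the original brackets were presumably meant to do.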

In Python, [\\.[0-9]+]? is read as two subpatterns: [\.[0-9]+ (one or more ., [, or digit characters) and ]? (zero or one ]). See how your regex works in Python here. Or, in more detail with capturing groups, here.
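
To see that split directly, one can wrap each piece of the original pattern in a capturing group. This is only a sketch: the grouping and the name split_parts are mine, not part of the question, and the warnings filter is a precaution in case a given Python version warns about set-like syntax inside a character class.

```python
import re
import warnings

# Original pattern with each leading piece wrapped in a capturing group:
#   1: [0-9]*   2: [\.[0-9]+   3: ]?   4: [^0-9]*
GROUPED = r"([0-9]*)([\.[0-9]+)(]?)([^0-9]*)D\([M|W]\)\s*US"

with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    pattern = re.compile(GROUPED)


def split_parts(text):
    """Hypothetical helper: return the four captured pieces, or None."""
    m = pattern.search(text)
    return m.groups() if m else None


print(split_parts("9.5 D(M) US"))  # ('9', '.5', '', ' ')
```

Group 2 swallows ".5" because the + applies to the class [\.[0-9], and group 3 (the optional ]) matches nothing.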

In Java, it is read as one single character class: Java supports nested character classes (unions), so the inner [0-9] is merged into the outer class, and the whole subpattern stands for 0 or 1 ., digit, or +. Since it can consume at most one character, it cannot match .5, so no match can start at 9; the first successful match starts at 5, where the optional class matches nothing (you can get a visual hint at Visual Regex Tester, type 123.+[] as input and [\.[0-9]+]? as regex).
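
The Java result can be reproduced in Python by writing that single character class in flat form. This is an emulation under the assumption that Java's reading is equivalent to [.0-9+]; it does not run the Java engine, and the name java_like_find is invented for the sketch.

```python
import re

# Assumption: Java's reading of [\.[0-9]+] is equivalent to the flat class [.0-9+].
JAVA_LIKE = re.compile(r"[0-9]*[.0-9+]?[^0-9]*D\([M|W]\)\s*US")


def java_like_find(text):
    """Hypothetical helper: first match under the Java-style reading, or None."""
    m = JAVA_LIKE.search(text)
    return m.group(0) if m else None


print(java_like_find("9.5 D(M) US"))  # 5 D(M) US
print(java_like_find("9 D(M) US"))    # 9 D(M) US
```

With a whole-number size the two readings agree, which is why the difference only shows up for inputs with a decimal part.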

And a final touch: [M|W] stands for M, |, or W, while I think you meant [MW] = M or W.
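
A small sketch of the difference, using made-up inputs and variable names: inside a character class the pipe is an ordinary character, so [M|W] also accepts a literal |.

```python
import re

with_pipe = re.compile(r"D\([M|W]\)")    # class contains M, | and W
without_pipe = re.compile(r"D\([MW]\)")  # class contains only M and W

print(bool(with_pipe.fullmatch("D(|)")))     # True
print(bool(without_pipe.fullmatch("D(|)")))  # False
print(bool(without_pipe.fullmatch("D(W)")))  # True
```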

1

I'm not a Python expert, so I can't tell you why it worked in Python, but in Java, your problem is the [\\.[0-9]+]? part. You probably meant it to be (\\.[0-9]+)?.

As it is, it's a list of characters inside a [] followed by a ?. That is, this part of the expression matches zero or one character, so it cannot match the whole of .5.
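
The same point can be checked on the subpattern in isolation. The sketch below is in Python and assumes the flat class [.0-9+] as a stand-in for how Java reads the bracketed version; the variable names are illustrative only.

```python
import re

one_char_class = re.compile(r"[.0-9+]?")  # stand-in for the Java reading
grouped = re.compile(r"(\.[0-9]+)?")      # the version with () instead of []

print(one_char_class.fullmatch(".5"))        # None: two characters are too many
print(one_char_class.fullmatch("+") is not None)  # True: a single listed character
print(grouped.fullmatch(".5").group(1))      # .5
```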

Here is an illustration of the matching attempts:

[Image: graphical demonstration of matching in Java]

Now, if your pattern used () instead of [], this would be the result:

[Image: matching result with () instead of []]
