0

I am currently learning regular expression syntax in Java, but I don't really understand the patterns.

What I know is that a pattern has a group count, and the pattern below has 3 groups.

import java.util.regex.*;

public class RE {
    public static void main(String[] args){
        String line = "Foo123";
        String pattern = "(.*)(\\d+)(.*)"; //RE Syntax I get stuck on.

        Pattern r = Pattern.compile(pattern);
        Matcher m = r.matcher(line);

        if (m.find()) {
            System.out.println(m.group(0));
            System.out.println(m.group(1));
            System.out.println(m.group(2));
            System.out.println(m.group(3));
        }
    }
}

I would like it if someone could explain what this expression does, what having more than one group does, and so on.

6
  • 1
    Read about capturing groups. Commented Dec 10, 2014 at 12:51
  • And here: docs.oracle.com/javase/tutorial/essential/regex/groups.html Commented Dec 10, 2014 at 12:53
  • So what does + do? It says that "Matches 1 or more of the previous thing" but when I take it out, it makes no difference? Commented Dec 12, 2014 at 11:46
  • 1
    Example: \\d+@ matches 123@, \\d@ matches 5@ but not more than one digit followed by @. Commented Dec 12, 2014 at 12:30
  • 1
    Because that's what \\b means. Commented Dec 12, 2014 at 14:37

4 Answers

3

Group 0 contains the entire match, and groups 1, 2, and 3 contain the text captured by the corresponding capturing groups.

Input string: Foo123

Regex : (.*)(\d+)(.*)

The first .* in the first capturing group is greedy, so it initially matches every character up to the end of the string. It then backtracks one character at a time until \d+ can match a digit, because the overall match would otherwise fail. As a result, group 1 holds Foo12 and group 2 captures only the last digit, 3. Nothing is left after the digits, so group 3 is an empty string.
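The backtracking described above can be observed directly. The following is a minimal sketch, not part of the original answer: the class name GreedyGroupsDemo and the helper groups are made up for illustration, and the reluctant variant (.*?) is added only for comparison.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyGroupsDemo {
    // Hypothetical helper: returns group(0)..group(groupCount()) of the
    // first match, or null when the regex does not match at all.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Greedy first group: .* takes everything, then gives back one
        // character so that \d+ has a digit to match.
        String[] greedy = groups("(.*)(\\d+)(.*)", "Foo123");
        System.out.println(String.join("|", greedy));    // Foo123|Foo12|3|

        // Reluctant first group: .*? takes as little as possible,
        // so \d+ is free to consume all three digits.
        String[] reluctant = groups("(.*?)(\\d+)(.*)", "Foo123");
        System.out.println(String.join("|", reluctant)); // Foo123|Foo|123|
    }
}
```

If the goal is to capture the whole number in group 2, the reluctant form (or a first group such as (\\D*)) is probably closer to what was intended.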



1 Comment

Good explanation about the internals.
1

Here is an explanation:

(        : start capture group 1
    .*   : 0 or more of any character
)        : end group
(        : start capture group 2
    \\d+ : 1 or more digits
)        : end group
(        : start capture group 3
    .*   : 0 or more of any character
)        : end group

This regex matches, for example:

  • 123
  • abc456kljh
  • :.?222
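To check the examples above, here is a small sketch that runs the same pattern against each string and prints the three groups. The class name MatchExamplesDemo, the helper describe, and the extra input abc (which has no digit) are assumptions added for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchExamplesDemo {
    static final Pattern PATTERN = Pattern.compile("(.*)(\\d+)(.*)");

    // Hypothetical helper: formats the three captured groups,
    // or reports that the whole input does not match.
    static String describe(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.matches()) {
            return input + " -> no match";
        }
        return input + " -> [" + m.group(1) + "] [" + m.group(2) + "] [" + m.group(3) + "]";
    }

    public static void main(String[] args) {
        System.out.println(PATTERN.matcher("").groupCount()); // 3
        System.out.println(describe("123"));        // 123 -> [12] [3] []
        System.out.println(describe("abc456kljh")); // abc456kljh -> [abc45] [6] [kljh]
        System.out.println(describe(":.?222"));     // :.?222 -> [:.?22] [2] []
        System.out.println(describe("abc"));        // abc -> no match
    }
}
```

Note that groupCount() returns 3 because group 0 (the whole match) is not included in the count.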

Comments

1
String line = "Foo123";
String pattern = "(.*)(\\d+)(.*)"; 
// (take any character - zero or more) // (digits one or more) // (take any character - zero or more)

So in the above case we have 3 capturing groups. The first and third match any character zero or more times (greedily). The second matches digits: \d matches a single digit, and + means one or more of them.
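This also explains why removing the + seems to make no difference for Foo123: the greedy first group leaves only one digit for group 2 either way. The sketch below illustrates that, and then shows a case where + does matter. The class name PlusQuantifierDemo and the helper secondGroup are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlusQuantifierDemo {
    // Hypothetical helper: returns group 2 of the first match, or null.
    static String secondGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // With or without +, the greedy (.*) leaves a single digit for group 2.
        System.out.println(secondGroup("(.*)(\\d+)(.*)", "Foo123")); // 3
        System.out.println(secondGroup("(.*)(\\d)(.*)", "Foo123"));  // 3

        // String.matches requires the whole input to match, so here + matters.
        System.out.println("123@".matches("\\d+@")); // true
        System.out.println("123@".matches("\\d@"));  // false
        System.out.println("5@".matches("\\d@"));    // true
    }
}
```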

Comments

0

(.*)(\\d+)(.*)

If you hover over a part of the regular expression, you will get an explanation of that part.

1st Capturing group (.*)
  .* matches any character (except newline)
  Quantifier: * Between zero and unlimited times, as many times as possible
2nd Capturing group (\d+)
  \\ matches the character \ literally
  d+ matches the character d literally (case sensitive)
  Quantifier: + Between one and unlimited times, as many times as possible
3rd Capturing group (.*)
  .* matches any character (except newline)
  Quantifier: * Between zero and unlimited times, as many times as possible

3 Comments

\\ is a \ escaped in Java, so the second group is actually \d+
@James, on Stack Overflow I had written it in bold as **[(.*)(\\d+)(.*)]**, so it displayed as (.*)(\d+)(.*). I modified it to **[(.*)(\\\d+)(.*)]**, and it now shows (.*)(\\d+)(.*). Thanks for your observation. Please review it.
Sorry, you changed the wrong bit. In your explanation, you've put that the 2nd capturing group is \\d+ but it should be \d+ (one or more digits). This is because a \\ in a Java string is an escaped \
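The escaping point in these comments can be checked with a short sketch. The class name EscapeDemo and the helper fullMatch are assumptions for illustration; the calls themselves are standard java.util.regex and String methods.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Hypothetical helper: true when the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // The Java literal "\\d+" is the three-character string \d+ at runtime.
        String regex = "\\d+";
        System.out.println(regex);          // \d+
        System.out.println(regex.length()); // 3

        // So the regex engine sees \d+ : one or more digits.
        System.out.println(fullMatch("\\d+", "123"));      // true

        // To match a literal backslash followed by one or more 'd' characters,
        // the regex must be \\d+, which is written "\\\\d+" in Java source.
        System.out.println(fullMatch("\\\\d+", "\\ddd"));  // true
        System.out.println(fullMatch("\\\\d+", "123"));    // false
    }
}
```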
