Java - Regular Expressions

Question

I am currently learning regular expression syntax in Java, but I don't really understand the RE patterns...

What I know is that patterns contain groups, and the pattern string below has 3 of them.

import java.util.regex.*;

public class RE {
    public static void main(String[] args){
        String line = "Foo123";
        String pattern = "(.*)(\\d+)(.*)"; //RE Syntax I get stuck on.

        Pattern r = Pattern.compile(pattern);
        Matcher m = r.matcher(line);

        if (m.find()) {
            System.out.println(m.group(0));
            System.out.println(m.group(1));
            System.out.println(m.group(2));
            System.out.println(m.group(3));
        }
    }
}

I would like someone to explain what this expression does, what having more than one group does, and so on.

And here: docs.oracle.com/javase/tutorial/essential/regex/groups.html
– Maroun, Commented Dec 10, 2014 at 12:53
So what does + do? It says "Matches 1 or more of the previous thing", but when I take it out, it makes no difference?
– user3818650, Commented Dec 12, 2014 at 11:46
Example: \\d+@ matches 123@; \\d@ matches 5@, but it does not match more than one digit followed by @.
– Maroun, Commented Dec 12, 2014 at 12:30
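
To make the comparison in the comments concrete, here is a minimal sketch. The class name PlusDemo and the helper groupTwo are made up for illustration. It also suggests why removing + made no visible difference in the question's pattern: the greedy first .* leaves only the last digit for group 2 either way.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlusDemo {
    // Hypothetical helper: returns group 2 of the first match, or null if there is no match.
    static String groupTwo(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // String.matches requires the whole input to match the pattern.
        System.out.println("123@".matches("\\d+@")); // true
        System.out.println("123@".matches("\\d@"));  // false: only one digit is allowed before @
        System.out.println("5@".matches("\\d@"));    // true

        // With or without +, group 2 ends up holding only the last digit.
        System.out.println(groupTwo("(.*)(\\d+)(.*)", "Foo123")); // 3
        System.out.println(groupTwo("(.*)(\\d)(.*)", "Foo123"));  // 3
    }
}
```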

Avinash Raj · Accepted Answer · 2014-12-10 12:54:56Z

3

Group 0 contains the entire match, and groups 1, 2, and 3 contain the text captured by the corresponding capturing groups.

Input string: Foo123

Regex : (.*)(\d+)(.*)

The first .* (capturing group 1) is greedy, so it initially matches every character up to the end of the string. The engine then backtracks one character at a time until \d+ can match a digit; it has to backtrack, because otherwise there would be no overall match. Group 2 therefore captures only the last digit (3), and group 1 keeps Foo12. Nothing is left after the digits, so group 3 contains an empty string.
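
The backtracking described above can be observed with a short sketch. The class name BacktrackDemo and the helper groups are illustrative assumptions, and the reluctant variant (.*?) is added here only for contrast; it is not part of the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BacktrackDemo {
    // Hypothetical helper: returns groups 0..groupCount() of the first match, or null.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i < out.length; i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Greedy .* gives back as little as possible: group 2 gets only "3".
        System.out.println(String.join("|", groups("(.*)(\\d+)(.*)", "Foo123")));  // Foo123|Foo12|3|
        // Reluctant .*? takes as little as possible: group 2 gets "123".
        System.out.println(String.join("|", groups("(.*?)(\\d+)(.*)", "Foo123"))); // Foo123|Foo|123|
    }
}
```

If the intent is to capture all the digits in group 2, the reluctant quantifier is one option among several; which one fits depends on the input.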

answered Dec 10, 2014 at 12:54

Avinash Raj

1 Comment

Maroun Over a year ago

Good explanation about the internals.

Toto · Accepted Answer · 2014-12-10 12:53:06Z

1

Here is an explanation:

(        : start capture group 1
    .*   : 0 or more of any character
)        : end group
(        : start capture group 2
    \\d+ : 1 or more digits
)        : end group
(        : start capture group 3
    .*   : 0 or more of any character
)        : end group

This regex matches, for example:

123
abc456kljh
:.?222
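
A quick way to check these examples is sketched below. The class name MatchExamples and the helper matchesWhole are assumptions for illustration; the string "abc" is an extra input added to show a non-match, since the pattern needs at least one digit.

```java
import java.util.regex.Pattern;

public class MatchExamples {
    static final Pattern P = Pattern.compile("(.*)(\\d+)(.*)");

    // Hypothetical helper: true if the whole input matches the pattern.
    static boolean matchesWhole(String input) {
        return P.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("123"));        // true
        System.out.println(matchesWhole("abc456kljh")); // true
        System.out.println(matchesWhole(":.?222"));     // true
        System.out.println(matchesWhole("abc"));        // false: no digit for \d+
    }
}
```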

answered Dec 10, 2014 at 12:53

Toto

Community · Accepted Answer · 2017-05-23 12:21:27Z

1

String line = "Foo123";
String pattern = "(.*)(\\d+)(.*)"; 
// (take any character - zero or more) // (digits one or more) // (take any character - zero or more)

So in the above case we have 3 capturing groups. The first is any character, zero or more times (greedy); the second is digits, where \d is the digit pattern and + means one or more; the third is again any character, zero or more times.
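
The number of groups can be read from the matcher itself, as in this small sketch. The class name GroupCountDemo and the helper countGroups are illustrative assumptions. Note that Matcher.groupCount() does not count group 0, the entire match.

```java
import java.util.regex.Pattern;

public class GroupCountDemo {
    // Hypothetical helper: number of capturing groups in the pattern (group 0 is not counted).
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(countGroups("(.*)(\\d+)(.*)")); // 3
        System.out.println(countGroups("\\d+"));           // 0: no parentheses, no capturing groups
    }
}
```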

edited May 23, 2017 at 12:21

CommunityBot

answered Dec 10, 2014 at 12:54

nitishagar

Naveen Kumar Alone · Accepted Answer · 2014-12-17 06:42:28Z

0

(.*)(\\d+)(.*)

If you hover over a part of the regular expression, you get an explanation of that part.

1st Capturing group (.*)
  .* matches any character (except newline)
  Quantifier: * Between zero and unlimited times, as many times as possible
2nd Capturing group (\d+)
  \\ matches the character \ literally
  d+ matches the character d literally (case sensitive)
  Quantifier: + Between one and unlimited times, as many times as possible
3rd Capturing group (.*)
  .* matches any character (except newline)
  Quantifier: * Between zero and unlimited times, as many times as possible
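
The "(except newline)" note can be checked with a short sketch. The class name DotDemo and the helper dotMatches are assumptions for illustration; Pattern.DOTALL is the standard flag that lets . match line terminators as well.

```java
import java.util.regex.Pattern;

public class DotDemo {
    // Hypothetical helper: true if .* matches the whole input under the given flags.
    static boolean dotMatches(String input, int flags) {
        return Pattern.compile(".*", flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(dotMatches("Foo\n123", 0));              // false: . stops at the newline
        System.out.println(dotMatches("Foo\n123", Pattern.DOTALL)); // true
    }
}
```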

edited Dec 17, 2014 at 6:42

answered Dec 10, 2014 at 12:52

Naveen Kumar Alone

3 Comments

James Over a year ago

\\ is a \ escaped in Java, so the second group is actually \d+

Naveen Kumar Alone Over a year ago

@James On Stack Overflow, for bold text I wrote it as **[(.)(\\d+)(.)]**, so it displayed as (.)(\d+)(.). I changed it to **[(.)(\\\d+)(.)]**, and it is now (.)(\\d+)(.). Thanks for your observation. Please review it.

James Over a year ago

Sorry, you changed the wrong bit. In your explanation, you've put that the 2nd capturing group is \\d+ but it should be \d+ (one or more digits). This is because a \\ in a Java string is an escaped \
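
The escaping point in this comment thread can be checked directly. This is a minimal sketch; the class name EscapeDemo and the helper matches are made up for illustration.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // In Java source, "\\d+" is the three-character string \d+ (backslash, d, plus).
        String digits = "\\d+";
        System.out.println(digits);                 // \d+
        System.out.println(digits.length());        // 3
        System.out.println(matches(digits, "123")); // true: one or more digits

        // Four backslashes in source give the regex \\d+ : a literal backslash, then one or more 'd'.
        String literal = "\\\\d+";
        System.out.println(matches(literal, "\\ddd")); // true: the input is \ddd
        System.out.println(matches(literal, "123"));   // false
    }
}
```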
