
I have the following code.

String _partsPattern = "(.*)((\n\n)|(\n)|(.))";
static final Pattern partsPattern = Pattern.compile(_partsPattern);
String text = "PART1: 01/02/03\r\nFindings:no smoking";
Matcher match = partsPattern.matcher(text);
while (match.find()) {
    System.out.println(match.group(1));
    return; // I only care about the first match for this purpose
}

Output: PART1: 01/02/0. I was expecting PART1: 01/02/03. Why is the 3 at the end of my text not included in my result?

  • It seems to be working as you want. You can check here: regexplanet.com/advanced/java/index.html Commented Jan 16, 2014 at 23:58
  • What exactly are you trying to capture? Is it always PART#: date? Commented Jan 17, 2014 at 0:01
  • On second thoughts, you want to match everything that occurs before the occurrence of 1 or 2 newline chars. You could simplify the regex to (.*)(\n{1,2}). Do you really want the last (.)? That will match any character. Commented Jan 17, 2014 at 0:01
  • Running your code I get: PART1: 01/02/03 and then (second match): Findings:no smokin Commented Jan 17, 2014 at 0:06
  • If you want just the first match, don't use while (match.find()) { but if (match.find()) {. This way you can remove the unnecessary return statement. Also, the last part of your regex is (.), which, when no line separator can be matched, will hold the last character of the entire match. This may be the reason why you see PART1: 01/02/0 instead of PART1: 01/02/03: the 3 may be in group(5). Commented Jan 17, 2014 at 0:08

2 Answers


The problem with your regex is that . does not match line separators such as \r or \n, so (.*) stops before \r. The last part of your regex

(.*)((\n\n)|(\n)|(.))
     ^^^^^^^^^^^^^^^

is required, and none of its alternatives can match \r. The engine therefore backtracks by one character, and the last character before \r ends up in (.) instead of in (.*).
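To see where the missing character goes, here is a minimal sketch that prints the capture groups of the first match for the regex and input from the question. The class name GroupDemo and the helper firstMatchGroups are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns group(0)..group(5) of the first match, or null if nothing matches.
    // The regex is the one from the question, unchanged.
    static String[] firstMatchGroups(String text) {
        Pattern p = Pattern.compile("(.*)((\n\n)|(\n)|(.))");
        Matcher m = p.matcher(text);
        if (!m.find()) {
            return null;
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i); // null for alternatives that did not participate
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] g = firstMatchGroups("PART1: 01/02/03\r\nFindings:no smoking");
        System.out.println("group(1) = " + g[1]); // PART1: 01/02/0
        System.out.println("group(5) = " + g[5]); // 3
    }
}
```

The whole match, group(0), is still PART1: 01/02/03; only the split between group(1) and group(5) is surprising.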

If you don't want to include the line separators in your match, just use the pattern "(.*)$" with the Pattern.MULTILINE flag, which makes $ match at the end of each line (it matches just before standard line separators such as \r, \r\n, or \n, without including them in the match).

So try with

String _partsPattern = "(.*)$"; // the parentheses are not required now
final Pattern partsPattern = Pattern.compile(_partsPattern,Pattern.MULTILINE);
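
Put together as a runnable sketch, using if instead of while since only the first match is needed. The class name MultilineDemo and the helper firstLine are hypothetical names for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // With MULTILINE, $ matches just before a line terminator as well as at end of input.
    static final Pattern LINE = Pattern.compile("(.*)$", Pattern.MULTILINE);

    // Returns the first line of text without its line separator, or null if there is no match.
    static String firstLine(String text) {
        Matcher m = LINE.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstLine("PART1: 01/02/03\r\nFindings:no smoking")); // PART1: 01/02/03
    }
}
```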

Another approach would be to change your regex to something like (.*)((\r\n)|(\n)|(.)) or (.*)((\r?\n)|(.)), which is just a variation of your original regex. I am not sure what the purpose of the last (.) would be, though, and I would probably remove it.
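
As a rough illustration of that variation, the sketch below uses (.*)((\r?\n)|(.)) and shows both cases: with a \r\n separator group 1 holds the full line, but with no separator at all the trailing (.) still takes the last character. The class name VariantDemo and the helper firstGroup are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VariantDemo {
    // Variation of the original regex that also accepts \r\n as a line separator.
    static final Pattern PARTS = Pattern.compile("(.*)((\r?\n)|(.))");

    // Returns group(1) of the first match, or null if nothing matches.
    static String firstGroup(String text) {
        Matcher m = PARTS.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // A separator is present, so (\r?\n) matches and group 1 is complete.
        System.out.println(firstGroup("PART1: 01/02/03\r\nFindings:no smoking")); // PART1: 01/02/03
        // No separator: the mandatory (.) takes the final character from group 1.
        System.out.println(firstGroup("PART1: 01/02/03")); // PART1: 01/02/0
    }
}
```

The second call is the reason the trailing (.) looks questionable: any input that ends without a line separator loses its last character from group 1.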


2 Comments

Thanks very much Pshemo6, that works for me. Can you explain in more detail why mine is ripping off the last char (just for my knowledge)? I could not vote up your answer because I do not have enough reputation to do so.
@HelenAraya As I mentioned, your regex contains the part (\n\n)|(\n)|(.), and since there is no ? or * after it, this part is mandatory. None of \n\n, \n, or (.) can match \r, so on the first attempt (.*) runs up to \r and the mandatory part fails. The regex engine still has to put something there, so it backtracks: (.*) gives up its last character, and (.) takes it. That is why the last character before \r ends up in (.) rather than in group 1. I hope it is clearer now.

It works for me, giving "PART1: 01/02/03 ". So my guess is that in the real code you read the text with something like Reader.readLine and erroneously strip a carriage return + line feed. Far-fetched, but I cannot imagine otherwise. (readLine strips the newline itself.)

1 Comment

I have edited my question. The input text is now String text= "PART1: 01/02/03\r\nFindings:no smoking"; I added \r before \n
