
I have this:

  • 110121 NATURAL 95 1570,40
  • 110121 NATURAL 95 1570,40*
  • 41,110 1 x 38,20 CZK)[A] *
  • ' 31,831 261,791 1308,61)
  • >01572 PRAVO SO 17,00
  • 1,000 ks x 17,00
  • 1570,40

Every line of this output is stored in a List, and I want to extract the number 1570,40.

My regular expressions for this format look like this:

    "([1-9][0-9]*[\\.|,][0-9]{2})[^\\.\\d](.*)"
    "^([1-9][0-9]*[\\.|,][0-9]{2})$"

The problem is that 1570,40 on the last line is found (by the second regular expression), and so is 1570,40 on the line ending with 1570,40*, but the number on the first line is not found. Do you know where the problem is?

  • I'm a bit confused. Which regular expression are you using, the first or the second? And you only want to find the last result, right? Commented Apr 23, 2013 at 11:48
  • Note that [a|b] matches a, | or b. Inside a character class, | is not an alternation operator, so [.,] should be what you want. But as far as I can tell, the second regexp should work. Commented Apr 23, 2013 at 11:49
  • I use both. I have priceFormats.add("([1-9][0-9]*[\\.|,][0-9]{2})[^\\.\\d](.*)"); and priceFormats.add("^([1-9][0-9]*[\\.|,][0-9]{2})$");. It is a List of price formats, and then I have a for loop where I apply them: for (int i = 0; i < priceFormats.size(); i++) { Pattern pattern = Pattern.compile(priceFormats.get(i)); Matcher matcher = pattern.matcher(concreteLine); while (matcher.find()) { ...... etc Commented Apr 23, 2013 at 11:50
  • Do you want 1570,40 to be matched in each line? And will the number you match always have a decimal point? Commented Apr 23, 2013 at 11:52
  • Yes, I want the number to be matched in each line. The second regular expression (with ^ and $) is there because the first regexp did not match the number when it stands alone on a line (the last row). Commented Apr 23, 2013 at 11:56
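The loop described in the comments can be reproduced as a small self-contained program. This is a sketch that assumes Java 9+ (for List.of); the class name PriceFormatsDemo and the helper findPrices are hypothetical, and the two patterns are copied unchanged from the question. Running it shows the reported behaviour: the first line yields nothing, while the line ending in * and the bare last line both yield 1570,40.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PriceFormatsDemo {

    // The two patterns from the question, copied unchanged.
    static final List<String> PRICE_FORMATS = List.of(
            "([1-9][0-9]*[\\.|,][0-9]{2})[^\\.\\d](.*)",
            "^([1-9][0-9]*[\\.|,][0-9]{2})$");

    // Applies every pattern to one line with find() and collects group 1,
    // following the for/while loop described in the comment.
    static List<String> findPrices(String concreteLine) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < PRICE_FORMATS.size(); i++) {
            Pattern pattern = Pattern.compile(PRICE_FORMATS.get(i));
            Matcher matcher = pattern.matcher(concreteLine);
            while (matcher.find()) {
                found.add(matcher.group(1));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // First line: nothing follows 1570,40, so neither pattern matches.
        System.out.println(findPrices("110121 NATURAL 95 1570,40"));  // []
        // Second line: '*' satisfies the [^\.\d] class of the first pattern.
        System.out.println(findPrices("110121 NATURAL 95 1570,40*")); // [1570,40]
        // Last line: the whole line is the number, so the second pattern matches.
        System.out.println(findPrices("1570,40"));                    // [1570,40]
    }
}
```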

3 Answers


I'm not sure I fully understand your needs, but I think you could use word boundaries, like:

\b([1-9]\d*[.,]\d{2})\b
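A minimal Java sketch of the word-boundary approach, assuming each line is passed in separately and every match is collected with find(). WordBoundaryDemo and findPrices are hypothetical names. The last call shows the limitation raised in the comment on this answer: a date such as 16.10.2012 still produces 16.10.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {

    // \b([1-9]\d*[.,]\d{2})\b written as a Java string literal.
    static final Pattern PRICE = Pattern.compile("\\b([1-9]\\d*[.,]\\d{2})\\b");

    // Collects group 1 of every match in the line.
    static List<String> findPrices(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = PRICE.matcher(line);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findPrices("110121 NATURAL 95 1570,40"));  // [1570,40]
        System.out.println(findPrices("110121 NATURAL 95 1570,40*")); // [1570,40]
        System.out.println(findPrices("1570,40"));                    // [1570,40]
        // There is a word boundary between the '0' and the '.', so the date still matches.
        System.out.println(findPrices("16.10.2012"));                 // [16.10]
    }
}
```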

To avoid matching dates, you can use:

(?:^|[^.,\d])(\d+[,.]\d\d)(?:[^.,\d]|$)

explanation:

The regular expression:

(?-imsx:(?:^|[^.,\d])(\d+[,.]\d\d)(?:[^.,\d]|$))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    ^                        the beginning of the string
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    [^.,\d]                  any character except: '.', ',', digits
                             (0-9)
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
----------------------------------------------------------------------
    [,.]                     any character of: ',', '.'
----------------------------------------------------------------------
    \d                       digits (0-9)
----------------------------------------------------------------------
    \d                       digits (0-9)
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    [^.,\d]                  any character except: '.', ',', digits
                             (0-9)
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    $                        before an optional \n, and the end of
                             the string
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
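In Java the same pattern needs its backslashes doubled. The sketch below (hypothetical class DelimitedPriceDemo and helper findPrices) runs it over some of the sample lines and over 16.10.2012, which it no longer matches. The LOOKAROUND pattern is an alternative not taken from the answer: because the answer's non-capturing groups consume the character after a number, two prices separated by a single space (1,00 2,00) yield only the first one, whereas zero-width lookarounds check the neighbouring characters without consuming them and find both.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimitedPriceDemo {

    // The answer's pattern: the number must be preceded by the start of the line
    // or a character other than '.', ',' or a digit, and followed by such a
    // character or the end of the line.
    static final Pattern ANSWER =
            Pattern.compile("(?:^|[^.,\\d])(\\d+[,.]\\d\\d)(?:[^.,\\d]|$)");

    // Alternative (not from the answer): the same conditions as zero-width
    // lookarounds, so the neighbouring characters are checked but not consumed.
    static final Pattern LOOKAROUND =
            Pattern.compile("(?<![.,\\d])(\\d+[,.]\\d\\d)(?![.,\\d])");

    // Collects group 1 of every match in the line.
    static List<String> findPrices(Pattern pattern, String line) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(line);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String[] lines = {
            "110121 NATURAL 95 1570,40",
            "110121 NATURAL 95 1570,40*",
            "41,110 1 x 38,20 CZK)[A] *",
            "1,000 ks x 17,00",
            "1570,40",
            "16.10.2012",
            "1,00 2,00"
        };
        for (String line : lines) {
            System.out.println(line + " -> " + findPrices(ANSWER, line)
                    + " / " + findPrices(LOOKAROUND, line));
        }
    }
}
```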

1 Comment

But still... for example, when I have "16.10.2012", the word boundary does not help: the pattern returns 16.10.

The pattern "([1-9][0-9]*[\\.|,][0-9]{2})[^\\.\\d](.*)" contains [^\\.\\d], which means it expects exactly one character that is neither a digit nor a dot right after the number. In the second line, the * satisfies it. In the first line, the number is at the very end of the line, so there is no character left to match and the pattern fails. I think you need just one regexp that catches all the numbers: [^.\\d]*([1-9][0-9]*[.,][0-9]{2})[^.\\d]*. Also, use find() instead of matches(), so that the pattern can match any substring instead of having to match the whole string. And it may make sense to collect all matches in case a line contains two such numbers; I'm not sure whether that can happen in your case.
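A sketch of this suggestion, assuming lines are processed one at a time; SingleRegexDemo, findAll and matchesWholeLine are hypothetical names. It contrasts find() with matches() on the first sample line and collects every match with a while loop. One thing it also surfaces: because the trailing [^.\d]* may match zero characters, the pattern accepts a prefix of a longer number, so on the third sample line it reports 41,11 (taken from 41,110) as well as 38,20.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SingleRegexDemo {

    // The single pattern suggested in this answer.
    static final Pattern PRICE =
            Pattern.compile("[^.\\d]*([1-9][0-9]*[.,][0-9]{2})[^.\\d]*");

    // find() looks for any matching substring; keep looping to get every match.
    static List<String> findAll(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = PRICE.matcher(line);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    // matches() succeeds only if the pattern covers the entire line.
    static boolean matchesWholeLine(String line) {
        return PRICE.matcher(line).matches();
    }

    public static void main(String[] args) {
        String first = "110121 NATURAL 95 1570,40";
        System.out.println(findAll(first));                         // [1570,40]
        System.out.println(matchesWholeLine(first));                // false
        System.out.println(findAll("41,110 1 x 38,20 CZK)[A] *"));  // [41,11, 38,20]
    }
}
```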

Also, pick either [0-9] or \d and use it consistently. Right now it is confusing: they mean the same thing but look different.
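A quick check of that equivalence in Java (DigitClassDemo and matchesDigit are hypothetical names). By default \d and [0-9] both accept only the ASCII digits 0 to 9; a difference appears only if the pattern is compiled with Pattern.UNICODE_CHARACTER_CLASS, in which case \d also accepts other Unicode decimal digits such as U+0663 (ARABIC-INDIC DIGIT THREE), while [0-9] still does not.

```java
import java.util.regex.Pattern;

public class DigitClassDemo {

    // True if the whole input is matched by the regex compiled with the given flags.
    static boolean matchesDigit(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String ascii = "7";
        String arabicIndicThree = "\u0663"; // ARABIC-INDIC DIGIT THREE

        // Default flags: both forms accept only ASCII digits.
        System.out.println(matchesDigit("\\d", 0, ascii));            // true
        System.out.println(matchesDigit("[0-9]", 0, ascii));          // true
        System.out.println(matchesDigit("\\d", 0, arabicIndicThree)); // false

        // UNICODE_CHARACTER_CLASS widens \d but leaves [0-9] unchanged.
        int unicode = Pattern.UNICODE_CHARACTER_CLASS;
        System.out.println(matchesDigit("\\d", unicode, arabicIndicThree));   // true
        System.out.println(matchesDigit("[0-9]", unicode, arabicIndicThree)); // false
    }
}
```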


Try this:

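    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Print every "digits,digits" token found in the line
    String s = "41,110 1 x 38,20 CZK)[A] * ";
    Matcher m = Pattern.compile("\\d+,\\d+").matcher(s);
    while (m.find()) {
        System.out.println(m.group()); // prints 41,110 and then 38,20
    }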
