I want to split a given string that may contain a numeric value, using regexp_matches(). It should identify the first occurrence of a numeric value with an optional sign and optional decimal places. The non-matching parts should be returned as well, as the first and last elements of the array.

Some example input and expected output values:

'hello+111123.454545world' -> {hello,+111123.454545,world}
'he-lo+111123.454545world' -> {he-lo,+111123.454545,world}
'hel123.5lo+111123.454545world' -> {hel,123.5,lo+111123.454545world}
'1111.15' -> {"",1111.15,""}
'-.234' -> {"",-.234,""}
'hello-.234' -> {hello,-.234,""}

I'm having trouble with the first match group in the following expression, represented by 'TODO'. It is supposed to match anything that cannot be identified as a numeric value.

select regexp_matches('input', '(TODO)((?:\+|-)?(?:\d*(?:(?:\.)?\d+)))(.*)')

The match group represented by '(TODO)' needs to be the negation of the regular expression in the second match group, and it has to be a capturing group because its result must be returned. The regex for the numeric value works fine; what I need is a way to match the leading part of the string that is not a numeric value.

  • There is a bug! If you change the first group to non-greedy the 3rd group becomes non-greedy too, which is wrong (and the reason why you might need anchors)! If this is really a postgres issue and not just SQL fiddle it should be reported. Commented Jul 28, 2015 at 17:05

4 Answers

regexp_matches(input, '(^.*?)([+-]?\d*\.?\d+)(.*$)') AS result_arr
  • 1st match: (^.*?)
    Anchored to the start of the string with ^. The non-greedy quantifier *? is crucial.
    It actually doesn't have to be the negation of the regular expression in the second match group because the rest of the regular expression is greedy. So the first part is what remains, defined by the rest.

  • 2nd match: ([+-]?\d*\.?\d+)
    I simplified your expression somewhat. In particular a character class [+-] is shorter and faster than two branches in non-capturing parentheses (?:\+|-).
    Non-capturing parentheses are important. (You already had that.)
    Simplified \d* after comment from @maraca.

  • 3rd match: (.*$)
    Anchored to the end of the string with $. For the last match, make the quantifier greedy.

SQL Fiddle with extended test case.
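
For anyone who wants to check the expression without the fiddle, a query along these lines could be used. This is a minimal sketch, not the original fiddle: the inline table `t(input)` and its `VALUES` list are assumptions built from the question's sample inputs (plus one input without a number), and it assumes `standard_conforming_strings` is on, the default since PostgreSQL 9.1, so the backslashes need no doubling.

```sql
-- Hypothetical test harness: t(input) is an illustrative inline table,
-- filled with the sample inputs from the question.
SELECT t.input,
       regexp_matches(t.input, '(^.*?)([+-]?\d*\.?\d+)(.*$)') AS result_arr
FROM  (VALUES
         ('hello+111123.454545world')
       , ('he-lo+111123.454545world')
       , ('hel123.5lo+111123.454545world')
       , ('1111.15')
       , ('-.234')
       , ('hello-.234')
       , ('no number here')   -- no match: this row drops out of the result
      ) AS t(input);
```

Without the `g` flag, regexp_matches() returns at most one row per input. An input with no numeric part produces no row at all rather than a NULL array, which is worth keeping in mind when the query feeds a join or an aggregate.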


10 Comments

(?:\d+)? is the same as \d*, and anchoring is not necessary. The difference between my 2nd regex and this one is that this one doesn't match a trailing dot: for abc12.def mine gives (abc, 12., def), yours (abc, 12, .def).
@maraca: ^ is not essential but it can improve performance ($ is required here). The main difference between your answer and mine is that mine works correctly as opposed to yours: sqlfiddle.com/#!15/9eecb7db59d16c80417c72d1e1f4fbf1/2125 Valid numbers don't have a trailing dot. That's a feature, not a bug.
Well, in that case it is the same as my first regex with a ? after the dot. Why is $ required? Regexes are greedy.
@maraca: I only saw your answer after your comment. I had skipped all answers with hardly an explanation. Your expressions are similar to mine - except for crucial details which make both your variants fail. I did realize one more simplification that doesn't change the result, though: I changed (?:\d+)? -> \d*. Auditing your expressions is too much for a comment. Start another question if you are interested in details. Here is another fiddle to play with that demonstrates results (including "no row") more clearly.
I gave the two variants covering all cases. I expected he could add the ? himself if he needs it, because in all his examples there is a .! And, like I said about \d* in the first comment, ^ and $ aren't needed either... it is the same. You know, I've already had people copy my answer as an update to theirs, getting all the upvotes while I got none... I'm not saying you did that here, but you will just end up with the same. I always reference each answer and person that contributed to mine.

I think this regex will give you what you want: /'(.*?)([+\-]?[0-9\.]+)(.*?)'/g

Example at: https://regex101.com/r/nF5qV7/1

1 Comment

fails for abc1.1.1def

Try this:

(.*?)((?:\+|-)?(?:\d*(?:(?:\.)?\d+)))(.*)


Here is the correct regex, assuming there has to be at least one digit after the dot:

(.*?)([+-]?[0-9]*\.[0-9]+)(.*)

Or, with an optional dot (matches 1., .7, +.8, -4, 0.0, 42, ...):

(.*?)([+-]?(?:\.[0-9]+|[0-9]+\.?[0-9]*))(.*)
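
A note for anyone trying these patterns in PostgreSQL rather than in another engine: the manual's regular expression matching rules give a whole expression the greediness of its first quantified atom, so a leading `(.*?)` makes the overall match prefer the shortest string. The sketch below first repeats the manual's own example of that effect, then runs the second pattern with `^` and `$` added, as the other answer recommends. The anchors and the sample input (taken from the comment thread) are additions, not part of the patterns as posted.

```sql
-- From the PostgreSQL manual (regular expression matching rules):
SELECT substring('XY1234Z', 'Y*([0-9]{1,3})');   -- 123 (greedy overall)
SELECT substring('XY1234Z', 'Y*?([0-9]{1,3})');  -- 1   (non-greedy overall)

-- Second pattern above with anchors added (an assumption, not in the post),
-- so the match has to cover the whole string:
SELECT regexp_matches('abc12.def',
                      '^(.*?)([+-]?(?:\.[0-9]+|[0-9]+\.?[0-9]*))(.*)$');
```

Running the last query with and without the trailing `$` is a quick way to see whether the third group still captures the rest of the string.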
