Regex numbers from string

Question

I am trying to write a regex that can find only the numbers in a given string. What I mean is:

Input: My number is +12 345 678. I have galaxy s3, its symbol 34abc.

Output: 345 and 678 (but not +12, the 3 from the word s3, or the 34 from 34abc)

I tried plain digits (\d+) and combinations with whitespace and word characters. The closest was ^\d$, but that doesn't work because my numbers are part of a bigger string, not the whole string themselves. Can you give me a hint?
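To make the problem concrete, here is a minimal Java sketch that runs the two attempts from the question against the sample input. Java is assumed because one of the answers mentions Java string literals; the class name and the `findAll` helper are hypothetical, not taken from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaiveDigits {
    // Returns every substring matched by `regex`, scanning left to right with Matcher.find().
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "My number is +12 345 678. I have galaxy s3, its symbol 34abc.";
        // \d+ checks no context, so it also picks up digits after '+' or glued to letters.
        System.out.println(findAll("\\d+", input));  // [12, 345, 678, 3, 34]
        // ^ and $ anchor to the whole input, so this only matches a one-digit string.
        System.out.println(findAll("^\\d$", input)); // []
    }
}
```

The first pattern returns too much and the second returns nothing, which is why some check on the neighbouring characters is needed.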

------- EDIT

It looks like I just don't know how to check a character without actually including it in the result. Something like "a digit that follows a space character (without the space itself)".

You can incorporate "\\s", which matches any whitespace character.
– Maljam, Commented Apr 10, 2016 at 18:17
Which characters do you consider to be delimiters (i.e. the ones which may surround a number)? From your examples it's obvious that spaces and dots are delimiters, whereas the "plus" sign isn't. What about other characters: minus sign, comma, underscore, etc.?
– Alex Shesterov, Commented Apr 10, 2016 at 18:21
I don't know how to use it so that it matches the pattern but isn't part of the output (digits following a whitespace character, but without the whitespace).
– Malvinka, Commented Apr 10, 2016 at 18:23
Alex, at the beginning it can only be spaces. Then I will think about something more.
– Malvinka, Commented Apr 10, 2016 at 18:24
By your definition, 678 doesn't fit because it's followed by a dot (.).
– user2705585, Commented Apr 10, 2016 at 18:35

Alex Shesterov · Accepted Answer · 2016-04-10 18:58:04Z

2

In the general case, you can make use of lookbehind and lookahead:

(?<=^|\s)\d+(?=$|\s)

Only the \d+ part ends up in the match; the lookbehind and lookahead are not included in it.
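A minimal Java sketch of the difference (the class and the `findAll` helper are hypothetical names): matching the spaces directly puts them into the result, while the lookaround version returns only the digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Returns every substring matched by `regex`, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "My number is +12 345 678. I have galaxy s3, its symbol 34abc.";
        // Matching the spaces themselves makes them part of the result.
        System.out.println(findAll("\\s\\d+\\s", input));             // [ 345 ]
        // Lookbehind/lookahead only inspect the neighbours; the result is just the digits.
        System.out.println(findAll("(?<=^|\\s)\\d+(?=$|\\s)", input)); // [345]
    }
}
```

Neither pattern returns 678 here, because it is followed by a period rather than whitespace.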

I just included spaces as delimiters in the regex, but you may replace \s with any character class, as defined by your requirements. For example, to allow dots as separators (both before and after the digits), use the following regex:

(?<=^|[\s.])\d+(?=$|[\s.])
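A minimal Java sketch of the dot-aware variant (the class and helper names are hypothetical). Inside a character class `.` is a literal dot, so 678 is now accepted. The second call shows one consequence of this delimiter choice: a decimal number is split in two.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotDelimiters {
    // Returns every substring matched by `regex`, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Delimiters are whitespace or a literal '.', plus the start/end of the input.
        String regex = "(?<=^|[\\s.])\\d+(?=$|[\\s.])";
        String input = "My number is +12 345 678. I have galaxy s3, its symbol 34abc.";
        System.out.println(findAll(regex, input));        // [345, 678]
        // Side effect: a dot inside a number also acts as a delimiter.
        System.out.println(findAll(regex, "pi is 3.14")); // [3, 14]
    }
}
```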

The (?<=^|\s) should be read as follows:

(?<= ... ) defines the lookbehind group.
The expression which must precede the \d+ is ^|\s, meaning "either start of the line (^) or whitespace".

Similarly, (?=$|\s) defines the lookahead group (it must follow the captured digits), which is either end of the line ($) or whitespace.
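The ^ and $ alternatives matter for numbers at the very edges of the input, which have no whitespace neighbour on one side. A small Java sketch (hypothetical class and helper names) comparing the pattern with and without them:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorAlternatives {
    // Returns every substring matched by `regex`, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "42 apples and 7";
        // Whitespace only: 42 has nothing before it and 7 has nothing after it.
        System.out.println(findAll("(?<=\\s)\\d+(?=\\s)", input));     // []
        // Allowing start/end of input as well picks up both numbers.
        System.out.println(findAll("(?<=^|\\s)\\d+(?=$|\\s)", input)); // [42, 7]
    }
}
```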

A note on \b mentioned in other answers: it is a nice feature that means "word boundary", but the set of "word characters" is not customizable. This means that, for example, the "+" character is always treated as a separator, and you can't change this if you use \b. With lookarounds, you can customize the separators to your needs.
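A short Java sketch of that point (hypothetical class and helper names): because "+" is a non-word character, \b sits between "+" and "1", so \b\d+\b accepts 12, whereas an explicit delimiter class that leaves out "+" rejects it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Returns every substring matched by `regex`, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "My number is +12 345 678. I have galaxy s3, its symbol 34abc.";
        // \b treats '+' as a separator, so 12 is returned.
        System.out.println(findAll("\\b\\d+\\b", input));                     // [12, 345, 678]
        // Explicit delimiters (whitespace or '.') exclude '+', so 12 is not.
        System.out.println(findAll("(?<=^|[\\s.])\\d+(?=$|[\\s.])", input)); // [345, 678]
    }
}
```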

edited Apr 10, 2016 at 18:58

answered Apr 10, 2016 at 18:27

Alex Shesterov



2 Comments

Andreas Over a year ago

This does not match 678, because it's followed by a period, which is why my answer has extended the lookahead to allow punctuation symbols.

Alex Shesterov Over a year ago

@Andreas, true, I mentioned this in the answer. I was following the OP's comment on the question: "at the beginning it can only be spaces. Then I will think about something more."

Andreas · Accepted Answer · 2016-04-10 18:27:10Z

1

What you seem to want is a sequence of digits (\d+) that is preceded by whitespace (\s) or the start of the string (^), and followed by whitespace, a punctuation character ([\s.,:;!?]) or the end of the string ($). The preceding/following whitespace or punctuation should not be included in the match, so you need a positive lookbehind ((?<=xxx)) and a positive lookahead ((?=xxx)).

(?<=^|\s)\d+(?=[\s.,:;!?]|$)

See regex101 for a demo.

Remember to double the backslashes when writing this as a Java string literal.
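A minimal Java sketch with the backslashes doubled (the class and the `findAll` helper are hypothetical). The second input is an invented example showing that a trailing comma or exclamation mark is accepted by the lookahead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PunctuationLookahead {
    // The regex (?<=^|\s)\d+(?=[\s.,:;!?]|$) with every backslash doubled for the Java literal.
    static final String REGEX = "(?<=^|\\s)\\d+(?=[\\s.,:;!?]|$)";

    // Returns every substring matched by `regex`, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "My number is +12 345 678. I have galaxy s3, its symbol 34abc.";
        System.out.println(findAll(REGEX, input));                // [345, 678]
        System.out.println(findAll(REGEX, "Call 555, then 12!")); // [555, 12]
    }
}
```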

answered Apr 10, 2016 at 18:27

Andreas


Comments

Kaspar Lee · Accepted Answer · 2016-04-10 18:45:51Z

1

Safer RegEx

Try this:

(?<=\s|^)\d+(?=\s|\b)

Live Demo on Regex101

How it works:

(?<=\s|^)          # Start of String OR Whitespace (will not select +)
                   # Positive Lookbehind ensures the data is not included in the match
\d+                # Digit(s)
(?=\s|\b)          # Whitespace OR Word Boundary
                   # Positive Lookahead ensures the data is not included in the match
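The same pattern as a Java string literal, in a minimal sketch (hypothetical class and helper names). On the question's input, 678 is accepted because the "." after it creates a word boundary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KasparRegex {
    // (?<=\s|^)\d+(?=\s|\b) with doubled backslashes.
    static final String REGEX = "(?<=\\s|^)\\d+(?=\\s|\\b)";

    // Returns every substring matched by `regex`, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "My number is +12 345 678. I have galaxy s3, its symbol 34abc.";
        System.out.println(findAll(REGEX, input)); // [345, 678]
    }
}
```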

Lookarounds do not consume any characters, so they keep the surrounding text out of the match without needing capture groups. For example:

# Regex /.*barbaz/
barbaz          # Matched Data Result: barbaz
foobarbaz       # Matched Data Result: foobarbaz

# Regex (with Positive Lookahead) /.*bar(?=baz)/
barbaz          # Matched Data Result: bar
foobarbaz       # Matched Data Result: foobar

As you can see with the second RegEx, baz is never included in the matched data result, even though it is required in the string for the RegEx to match. The RegEx above works on the same principle.
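The same comparison in Java, as a minimal sketch (the class and the `firstMatch` helper are hypothetical). The third input is an added example: without "baz" after "bar", the lookahead version finds nothing, showing that "baz" is required even though it is not returned.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Returns the first match of `regex` in `input`, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"barbaz", "foobarbaz", "foobarqux"}) {
            // Plain match vs. match with a positive lookahead on "baz".
            System.out.println(s + ": " + firstMatch(".*barbaz", s) + " / " + firstMatch(".*bar(?=baz)", s));
        }
        // barbaz: barbaz / bar
        // foobarbaz: foobarbaz / foobar
        // foobarqux: null / null
    }
}
```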

Not as Safe (Old) RegEx

You can try this RegEx:

\b\d+\b

\b is a Word Boundary. This will, however, select 12 from +12.

You can change the RegEx to this to stop 12 from being selected:

(?<!\+)\b\d+\b

This uses a Negative Lookbehind and will fail if there is a + before the digits.

Live Demo on Regex101
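A minimal Java sketch of the negative-lookbehind version (hypothetical class and helper names). The second input is an invented example: only "+" is ruled out, so a number after another non-word character such as "-" is still returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegativeLookbehind {
    // (?<!\+)\b\d+\b: a whole "word" of digits not directly preceded by '+'.
    static final String REGEX = "(?<!\\+)\\b\\d+\\b";

    // Returns every substring matched by `regex`, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "My number is +12 345 678. I have galaxy s3, its symbol 34abc.";
        System.out.println(findAll(REGEX, input));        // [345, 678]
        System.out.println(findAll(REGEX, "+12 -34 56")); // [34, 56]
    }
}
```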

edited Apr 10, 2016 at 18:45

answered Apr 10, 2016 at 18:26

Kaspar Lee


1 Comment

Alex Shesterov Over a year ago

Just a note: The regex (?<=\s|^)\d+(?=\s|\b) will match digits preceded by space and followed by *any* non-word character, e.g. all of these would match: 123+, 123@, 123β (β could be any non-Latin letter), but not 123_. P.S. I'm not the downvoter.
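A small Java check of the ASCII cases from this comment (the class and the `found` helper are hypothetical). The non-Latin letter case is left out because how \b classifies non-ASCII letters can depend on the regex engine and its Unicode settings.

```java
import java.util.regex.Pattern;

class TrailingCharacterCheck {
    static final Pattern P = Pattern.compile("(?<=\\s|^)\\d+(?=\\s|\\b)");

    // True if the pattern finds a match anywhere in `input`.
    static boolean found(String input) {
        return P.matcher(input).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123+", "123@", "123_"}) {
            System.out.println(s + " -> " + found(s));
        }
        // 123+ -> true   ('+' is a non-word character, so \b holds after 3)
        // 123@ -> true
        // 123_ -> false  ('_' is a word character, so there is no boundary)
    }
}
```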
