3

Is it possible in Python to have the type of a capture group be an integer?
Let's assume I have the following regex:

>>> import re
>>> p = re.compile('[0-9]+')
>>> re.search(p, 'abc123def').group(0)
'123'

I would like the type of '123' in the group to be int, since the pattern can only match integers. It feels like there should be a better way than writing a pattern that matches only digits and then still having to convert the result to an int afterwards.
The background is that I have a complex regex with multiple named capture groups, and some of those capture groups only match integers. I would like those capture groups to be of type integer.

2
  • AFAIK, there is no way to do this ... Commented Oct 7, 2015 at 0:17
  • 1
    Regular expressions operate on strings and produce (collections of) strings. Any post-processing or mapping is up to you. Commented Oct 7, 2015 at 0:19

4 Answers

8

No, there is not. You can convert it yourself, but re operates on text, and produces text, that's it.
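
To illustrate "convert it yourself" for the multi-group case in the question, here is a minimal sketch. The pattern, the CONVERTERS table and the typed_groupdict helper are all hypothetical names made up for this example; re itself offers nothing like this.

```python
import re

# Hypothetical pattern and converter table, for illustration only.
PATTERN = re.compile(r'(?P<name>[a-z]+)(?P<number>[0-9]+)')
CONVERTERS = {'number': int}


def typed_groupdict(match, converters):
    """Return match.groupdict() with selected groups passed through a converter."""
    result = {}
    for key, value in match.groupdict().items():
        convert = converters.get(key)
        # Groups that did not take part in the match are None; leave them alone.
        if convert is not None and value is not None:
            result[key] = convert(value)
        else:
            result[key] = value
    return result


match = PATTERN.search('abc123def')
groups = typed_groupdict(match, CONVERTERS)
print(groups)
```

The conversion still happens after matching; the helper only keeps it in one place so the calling code does not repeat int(...) for every group.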

1 Comment

If you want to get extra pedantic, it works on binary data too (so on Py3, it can work on bytes which is not a text type), but it's too pedantic to cover in the simple answer.
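
A small sketch of the bytes case, assuming Python 3: a bytes pattern applied to bytes input returns bytes groups, and int accepts those directly, so the conversion step looks the same.

```python
import re

# A bytes pattern must be used with bytes input (Python 3).
match = re.search(rb'[0-9]+', b'abc123def')
raw = match.group(0)   # a bytes object, not a str
value = int(raw)       # int() parses ASCII digits in bytes as well
print(type(raw).__name__, value)
```
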
2

Unfortunately that's the best you can do.

>>> import re
>>> p = re.compile('[0-9]+')
>>> a = re.search(p, 'abc123def').group(0)
>>> a.isdigit()
True
>>> a
'123'
>>> type(a)
<class 'str'>

Create an if statement from isdigit() and go from there.

1 Comment

Given the capture definitionally only captured 0-9 (and captured at least one of them), the isdigit test is unnecessary. It's also a bad test even when you're not sure (int handles a lot of things simple isdigit tests will reject, e.g. "-123", " 0 ", "0x123" [when base provided], etc.). Don't write LBYL code that will invariably miss a corner case, just EAFP it: call int and (if there is something else reasonable to do when it fails) catch the ValueError if it's not parsable as such.
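
A rough sketch of the EAFP approach described in this comment. The to_int helper and its default parameter are names invented for the example, not part of any library.

```python
def to_int(text, default=None):
    """Try to parse text as an int; return default if it is not parsable."""
    try:
        return int(text)
    except ValueError:
        return default


# Inputs that a plain isdigit() check rejects but int() accepts.
samples = ['123', '-123', ' 0 ', 'abc']
for sample in samples:
    print(repr(sample), sample.isdigit(), to_int(sample))

# With an explicit base, int() also accepts a matching prefix.
hex_value = int('0x123', 16)
```
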
1

An example of use case: taking the average from two street numbers.

import pandas as pd

addresses = pd.Series(["3 - 5 Mint Road", "20-23 Cinnamon Street"])

def street_number_average(capture):
    number_1 = int(capture.group(1))
    number_2 = int(capture.group(2))
    average  = round((number_1 + number_2) / 2)
    return str(average)

pattern = r'(\d\d?) *?- *?(\d\d?)'

addresses.str.replace(pattern, street_number_average, regex=True)

# > 0           4 Mint Road
# > 1    22 Cinnamon Street

Don't forget to convert back to string after doing the operations on the numbers, or it will return a NaN.

Comments

-1

People might be misunderstanding the question due to wording.

They are correct that regular expressions only operate on string types: in Python 2 these are the subclasses of basestring, namely the str and unicode classes.

However, within regular expressions there are symbols that match classes of characters (in regular expression terms), and \d should do that for you.

See the pythex website or read up on Regular Expressions on other sites for more info.

4 Comments

The OP is already matching on digits (\d is slightly more concise, but for ASCII, no different than [0-9]), so they know how to get only the integer-y components of the string. The question really is asking "How do I get back the int value 123 rather than the str value "123" from a call to match.group(0), without explicitly wrapping the call with the int constructor?"
And the answer is you don't. Regular Expressions don't work like that, but Python is dynamic and can do that.
\d works for more than ASCII digits by the way. ICU has come a long way.
I'm aware it works for all Unicode characters with the appropriate property. That's why I said "for ASCII" (trying to avoid this can of worms for someone who is looking at ASCII text). Of course, since this is Python 2.7 and he's working with str objects, the distinction is moot or locale dependent in a way I don't want to think about.
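
To make the \d versus [0-9] distinction concrete, here is a short sketch that assumes Python 3 str patterns (not the Python 2.7 str objects mentioned above). The sample text uses Arabic-Indic digits, and the variable names are only illustrative.

```python
import re

# ARABIC-INDIC DIGIT ONE, TWO, THREE
arabic_indic = '\u0661\u0662\u0663'

# On Python 3 str patterns, \d matches any Unicode decimal digit by default.
unicode_match = re.fullmatch(r'\d+', arabic_indic)

# With re.ASCII, \d is restricted to [0-9].
ascii_match = re.fullmatch(r'\d+', arabic_indic, flags=re.ASCII)

# An explicit [0-9] class never matches non-ASCII digits.
bracket_match = re.fullmatch(r'[0-9]+', arabic_indic)

# int() understands Unicode decimal digits too.
value = int(arabic_indic)
print(unicode_match is not None, ascii_match, bracket_match, value)
```

Either way the match result is still a string, and the int call remains a separate step.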
