How to parse a string into two different strings based on first instance of an integer? (Python)

Question

I'm trying to take a string like "PR405j" and separate it into two strings. In this instance, the two strings would be "PR" and "405j". There are a variety of strings I have to do this to. Examples: "ACR498" would be "ACR" and "498", "FR707e" would be "FR" and "707e", "TY699l" would be "TY" and "699l", and so on.

The problem I'm having is separating the first part from the second part. The number of characters on either side differs, and the second string (the one with the numbers) may or may not contain alphabetic characters as well. The only thing all of these strings have in common is that they can be divided at the first digit.

I thought a for loop that goes through every character in the original string and builds two separate strings inside would work, but I could only think to base the separation on integers and alphabetic characters, which would make something like "PR405j" turn into "PRj" and "405".
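The loop idea can work if, instead of sorting each character into one of two strings, the loop stops at the first digit and slices the original string there. A minimal sketch; split_at_first_digit is a hypothetical helper name, and returning an empty second part when there is no digit is an assumption:

```python
def split_at_first_digit(text):
    """Hypothetical helper: split text just before its first digit.

    Returns (prefix, rest). If text contains no digit, rest is ''.
    """
    for index, char in enumerate(text):
        if char.isdigit():
            # Everything before the first digit is the prefix; the digit
            # and everything after it (letters included) is the rest.
            return text[:index], text[index:]
    # No digit found: the whole string is the prefix (an assumption).
    return text, ''


for sample in ['PR405j', 'ACR498', 'FR707e', 'TY699l']:
    print(sample, split_at_first_digit(sample))
```

Because the loop slices the original string instead of appending characters one by one, letters after the digits (the "j" in "PR405j") stay in the second part.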

I also thought the split string method would help, but there's no one character all these strings have in common.

Finally, I can't split the strings based on the number of alphabetic characters at the beginning of the string (say 2 for "PR405j") because that number varies between strings.

If anybody could help me with this, I'd greatly appreciate it. Thank you!

The alternative to re would be ''.join(itertools.takewhile(operator.methodcaller('isalpha'), thestring)), ''.join(itertools.dropwhile(operator.methodcaller('isalpha'), thestring)), but don't use that. – agf, Nov 9, 2011 at 19:50
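For comparison, agf's itertools version spelled out as a runnable sketch. Note that it splits at the first non-alphabetic character rather than strictly at the first digit, so an input like "PR-405j" would give "PR" and "-405j", and it walks the string twice:

```python
import itertools
import operator

text = 'PR405j'
# Calls c.isalpha() on each character c; str.isalpha would work the same way.
is_alpha = operator.methodcaller('isalpha')

# takewhile yields characters while they are alphabetic, then stops;
# dropwhile skips those same leading characters and yields the rest.
prefix = ''.join(itertools.takewhile(is_alpha, text))
rest = ''.join(itertools.dropwhile(is_alpha, text))

print(prefix, rest)  # PR 405j
```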
And what do you want to happen if (1) the string doesn't start with any alphabetics or (2) the alphas are not followed by any numerics? – John Machin, Nov 9, 2011 at 20:49

Zack Bloom · Accepted Answer · 2011-11-09 19:49:45Z

5

You can use regular expressions for simple string matching like this. The expression '(\D+)(.+)' says: 'Extract one or more non-digits as the first group, then extract one or more of the remaining characters as the second.'

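import re

inputs = ['PR405j', 'ACR498', 'FR707e', 'TY699l']

for text in inputs:
    # Group 1: one or more non-digits; group 2: everything after them.
    match = re.match(r'(\D+)(.+)', text)
    if match is None:
        continue  # skip inputs that don't fit the pattern

    start = match.group(1)
    end = match.group(2)

    print(text, start, end)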

edited Nov 9, 2011 at 19:49

answered Nov 9, 2011 at 19:41

Zack Bloom



7 Comments

NullUserException Over a year ago

You could probably get away with (.+) instead of (\d.+)

Andrew Clark Over a year ago

The one-liner would be start, end = re.match(r'(\D+)(.+)', input).groups().

John Machin Over a year ago

@F.J: Bad luck if there is no match. One-liners don't get much use in the real world.
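One way to handle the no-match case is to check for None before calling .groups(). This sketch also assumes a slightly stricter pattern, (\D+)(\d.*), so the second part must start with a digit; with the original (.+), a letters-only input such as "PRj" would still match by backtracking and come back as "PR" and "j". split_code is a hypothetical name:

```python
import re

# Assumed stricter variant: non-digits, then a digit followed by anything.
PATTERN = re.compile(r'(\D+)(\d.*)')


def split_code(text):
    """Hypothetical helper: return (prefix, rest), or None if text doesn't fit.

    Inputs that start with a digit ('405j') or contain no digit ('PRj')
    return None instead of raising AttributeError on .groups().
    """
    match = PATTERN.match(text)
    if match is None:
        return None
    return match.groups()


print(split_code('PR405j'))  # ('PR', '405j')
print(split_code('405j'))    # None
```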

John Machin Over a year ago

@ZackBloom: OP wants to start with alphabetics, not non-digits.

Zack Bloom Over a year ago

@JohnMachin As he is unfamiliar with regex, I went with the simplest example which would work. But, if necessary, the more precise expression might be ^([A-Z]{2,})(\d[a-zA-Z0-9]+)$.
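A quick check of how that stricter expression behaves on the question's examples. Beyond those, it rejects lowercase prefixes and a suffix that is a single digit, such as "PR4", since [a-zA-Z0-9]+ needs at least one more character after the digit. strict_split is a hypothetical helper:

```python
import re

# The stricter expression from the comment: two or more uppercase letters,
# then a digit followed by one or more letters or digits, anchored at both ends.
STRICT = re.compile(r'^([A-Z]{2,})(\d[a-zA-Z0-9]+)$')


def strict_split(text):
    """Hypothetical helper: return (prefix, rest), or None if text doesn't fit."""
    match = STRICT.match(text)
    return match.groups() if match else None


for sample in ['PR405j', 'ACR498', 'FR707e', 'TY699l', 'pr405j', 'PR4']:
    print(sample, strict_split(sample))
```

One caveat: in Python, $ also matches just before a trailing newline, so if that matters, STRICT.fullmatch with the ^ and $ removed is the tighter choice.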


dcrosta · Accepted Answer · 2011-11-09 19:44:46Z

0

EDIT: I misunderstood the question and thought you wanted 3 groups, not 2. Zack Bloom's answer is more correct, but I'll leave this here as a reference in case someone has a similar question.

You can use re.split:

>>> re.split(r'(\d+)', 'PR405j')
['PR', '405', 'j']

The trick here is using a capturing group (with parentheses) as the regular expression to split by; this will cause the output to contain the portions that caused the split as well as the portions to either side of it. If you have a string with multiple groups of digits separated by non-digits, this will fully split the string:

>>> re.split(r'(\d+)', 'PR405j123abc')
['PR', '405', 'j', '123', 'abc']

edited Nov 9, 2011 at 19:44

answered Nov 9, 2011 at 19:38

dcrosta


1 Comment

agf Over a year ago

That doesn't split PR405j into PR and 405j.
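To get exactly two pieces from re.split, one option (assuming Python 3.7 or later, which allows splitting on empty matches) is to split on a zero-width lookahead for a digit and stop after the first split:

```python
import re

# (?=\d) matches the empty position just before a digit without consuming it,
# so the digit stays in the second piece; maxsplit=1 stops after the first one.
parts = re.split(r'(?=\d)', 'PR405j', maxsplit=1)
print(parts)  # ['PR', '405j']
```

A string with no digits comes back as a one-element list, and one that starts with a digit gives an empty first element, so the caller still has to decide how to handle those cases.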

Colin Dunklau · Accepted Answer · 2011-11-09 19:47:27Z

0

Use re.split, like the other answers, but you have to munge the result to deal with the capturing group:

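import re
# maxsplit=1 splits only at the first run of letters; [1:] drops the leading ''.
re.split(r'([a-zA-Z]+)', 'PR405j', maxsplit=1)[1:]  # ['PR', '405j']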

answered Nov 9, 2011 at 19:47

Colin Dunklau

