0

I'm trying to take a string like "PR405j" and separate it into two strings. In this instance, the two strings would be "PR" and "405j". There are a variety of strings I have to do this to. Examples: "ACR498" would be "ACR" and "498", "FR707e" would be "FR" and "707e", "TY699l" would be "TY" and "699l", and so on.

The problem I'm having is separating the first part from the second part. The number of characters on either side differs, and the second string (the one with the numbers) may or may not contain alphabetic characters as well. The only thing all of these strings have in common is that you can divide them at the first digit.

I thought a for loop that goes through every character in the original string and builds two separate strings inside would work, but I could only think to base the separation on integers and alphabetic characters, which would make something like "PR405j" turn into "PRj" and "405".

I also thought the split string method would help, but there's no one character all these strings have in common.

Finally, I can't split the strings based on the number of alphabetic characters at the beginning of the string (say 2 for "PR405j") because that number varies between strings.

If anybody could help me with this, I'd greatly appreciate it. Thank you!

2
  • 1
    The alternative to re would be ''.join(itertools.takewhile(operator.methodcaller('isalpha'), thestring)), ''.join(itertools.dropwhile(operator.methodcaller('isalpha'), thestring)), but don't use that. Commented Nov 9, 2011 at 19:50
  • And what do you want to have happen if (1) the string doesn't start with any alphabetics (2) the alphas are not followed by any numerics? Commented Nov 9, 2011 at 20:49

3 Answers

5

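You can use regular expressions to do simple string matching such as this. The expression r'(\D+)(.+)' says: 'Capture one or more non-digits as the first group, then capture one or more of any character as the second group.' Because the first group stops at the first digit, the second group starts there and runs to the end of the string.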

import re

inputs = ['PR405j']

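for input in inputs:
    # Raw string, so the backslash in \D reaches the regex engine unchanged.
    match = re.match(r'(\D+)(.+)', input)

    start = match.group(1)  # leading non-digits, e.g. 'PR'
    end = match.group(2)    # everything from the first digit on, e.g. '405j'

    print(input, start, end)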
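
The comments below point out two gaps: re.match returns None when nothing matches, and \D accepts any non-digit rather than only letters. A minimal sketch that covers both, assuming the prefix must be ASCII letters and the rest must begin with a digit (the name split_code is hypothetical, not part of the original answer):

```python
import re

# Assumption: the prefix is one or more ASCII letters, and the remainder
# starts with a digit and may contain anything after that.
PATTERN = re.compile(r'([A-Za-z]+)(\d.*)')

def split_code(text):
    """Return (prefix, rest), or None if text does not fit the pattern."""
    match = PATTERN.fullmatch(text)
    if match is None:
        return None
    return match.groups()

for sample in ['PR405j', 'ACR498', '405j', 'PR']:
    print(sample, split_code(sample))
```

Returning None leaves the decision about malformed input to the caller; raising ValueError would be an equally reasonable choice.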

7 Comments

You could probably get away with (.+) instead of (\d.+)
The one liner would be start, end = re.match(r'(\D+)(.+)', input).groups().
@F.J: Bad luck if there is no match. One-liners don't get much use in the real world.
@ZackBloom: OP wants to start with alphabetics, not non-digits.
@JohnMachin As he is unfamiliar with regex, I went with the simplest example which would work. But, if necessary, the more precise expression might be ^([A-Z]{2,})(\d[a-zA-Z0-9]+)$.
0

EDIT: I misunderstood the question, thought you wanted 3 groups, not two. Zack Bloom's answer is more correct, but I'll leave this here as a reference in case someone has a similar question.


You can use re.split:

>>> re.split(r'(\d+)', 'PR405j')
['PR', '405', 'j']

The trick here is using a capturing group (with parentheses) as the regular expression to split by; this will cause the output to contain the portions that caused the split as well as the portions to either side of it. If you have a string with multiple groups of digits separated by non-digits, this will fully split the string:

>>> re.split(r'(\d+)', 'PR405j123abc')
['PR', '405', 'j', '123', 'abc']
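
For the two-part split the question asks for, one variation is to locate the first digit with re.search and slice the string at that position. This is only a sketch: the function name split_at_first_digit is hypothetical, and returning an empty second part when there is no digit is an assumption about the desired behavior.

```python
import re

def split_at_first_digit(text):
    """Split text into (before, rest), where rest starts at the first digit."""
    match = re.search(r'\d', text)
    if match is None:
        # Assumption: with no digit present, everything is the prefix.
        return text, ''
    index = match.start()
    return text[:index], text[index:]

print(split_at_first_digit('PR405j'))
print(split_at_first_digit('PR405j123abc'))
```

Unlike re.split with a capturing group, this never cuts the string at later digit runs, so 'PR405j123abc' stays in two pieces.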

1 Comment

That doesn't split PR405j into PR and 405j.
0

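Use re.split, like the other answers, but split only once on the leading letters. Because the pattern is a capturing group that matches at the very start of the string, the result begins with an empty string, which you have to slice off:

import re
re.split(r'([a-zA-Z]+)', 'PR405j', maxsplit=1)[1:]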
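
To see why the slice is needed, it may help to look at the intermediate list. A short sketch (the variable names parts, prefix, and rest are illustrative only):

```python
import re

# The capturing group matches 'PR' at position 0, so the text "before" the
# match is an empty string, followed by the captured letters and the rest.
parts = re.split(r'([a-zA-Z]+)', 'PR405j', maxsplit=1)
print(parts)  # ['', 'PR', '405j']

# Drop the leading empty string and unpack the two pieces.
prefix, rest = parts[1:]
print(prefix, rest)
```

Note that this relies on the string starting with letters; for an input such as '405j' the list has a different shape and the slice no longer yields the intended pair.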
