92

I'd like to split strings like these

'foofo21'
'bar432'
'foobar12345'

into

['foofo', '21']
['bar', '432']
['foobar', '12345']

Does somebody know an easy and simple way to do this in Python?

12 Answers

86

I would approach this by using re.match in the following way:

import re
match = re.match(r"([a-z]+)([0-9]+)", 'foofo21', re.I)
if match:
    items = match.groups()
    print(items)  # ('foofo', '21')

9 Comments

you probably want \w instead of [a-z] and \d instead of [0-9]
@Dan: Using \w is a poor choice as it matches all alphanumeric characters, not just a-z. So, the entire string would be caught in the first group.
If that's a concern, you can tack '\b' (IIRC) at the end, to specify that the match must end at a word boundary (or '$' to match the end of the string).
How can this be extended to str-digit-str-digit such as p6max20 to get p=6, max=20? "( )( )( )( )" four grouping?
re.split(r'(\d+)', t)
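
For the p6max20 question above, one possible sketch uses re.findall with the same two groups, so that every letters-then-digits pair is returned rather than only the first. The helper name parse_pairs and the choice to return a dict of ints are assumptions made for illustration, not something from the answer itself.

```python
import re

def parse_pairs(s):
    """Return {name: number} for each letters-then-digits pair in s (hypothetical helper)."""
    # findall returns one (letters, digits) tuple per non-overlapping match
    pairs = re.findall(r"([a-z]+)([0-9]+)", s, re.I)
    return {name: int(number) for name, number in pairs}

print(parse_pairs("p6max20"))  # {'p': 6, 'max': 20}
```

If the same name can appear twice in one string, a dict keeps only the last value, so a list of tuples may be the safer return type.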
70
def mysplit(s):
    head = s.rstrip('0123456789')
    tail = s[len(head):]
    return head, tail
>>> [mysplit(s) for s in ['foofo21', 'bar432', 'foobar12345']]
[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]

7 Comments

Comparing timing for this answer to the accepted answer, on my machine, using a single example (case study, not representative of all uses), this str().rstrip() method was roughly 4x faster. Also, it does not require another import.
It is more pythonic.
Don't know how relevant this is, but when I try FOO_BAR10.34 it gives me 'FOO_BAR10.' and '34', and when I re-apply mysplit to the first element, it gives me the same thing back. I know my issue is slightly different.
But I can slice 'FOO_BAR10.' to remove the '.', then re-apply the function to get what I want. +1.
To split 'float' at the end add '.' to digits in rstrip() call.
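
The last suggestion can be sketched as follows, assuming the trailing number only contains ASCII digits and '.'. The function name split_trailing_number and the number_chars parameter are hypothetical, added here for illustration.

```python
def split_trailing_number(s, number_chars="0123456789."):
    """Split s into (head, tail), where tail is the trailing run of number_chars."""
    head = s.rstrip(number_chars)
    return head, s[len(head):]

print(split_trailing_number("FOO_BAR10.34"))  # ('FOO_BAR', '10.34')
print(split_trailing_number("foofo21"))       # ('foofo', '21')
```

Note that the tail is not guaranteed to be a valid float: a string ending in '1.2.' would give the tail '1.2.', so it may be worth validating it before calling float().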
34

Yet Another Option:

>>> [re.split(r'(\d+)', s) for s in ('foofo21', 'bar432', 'foobar12345')]
[['foofo', '21', ''], ['bar', '432', ''], ['foobar', '12345', '']]

2 Comments

Neat. Or even: [re.split(r'(\d+)', s)[0:2] for s in ...] getting rid of that extra empty string. Note though that compared with \w this is equivalent to [^|\d].
@PEZ: There may be more than one pair, and an empty string may be at the beginning of the list. You could remove empty strings with [list(filter(None, re.split(r'(\d+)', s))) for s in ('foofo21','a1')]
29
>>> r = re.compile("([a-zA-Z]+)([0-9]+)")
>>> m = r.match("foobar12345")
>>> m.group(1)
'foobar'
>>> m.group(2)
'12345'

So, if you have a list of strings with that format:

import re
r = re.compile("([a-zA-Z]+)([0-9]+)")
strings = ['foofo21', 'bar432', 'foobar12345']
print([r.match(string).groups() for string in strings])

Output:

[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]
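
The list comprehension above raises AttributeError if any entry does not match, because match() then returns None. A minimal sketch that skips such entries is shown below. The name split_all and the use of fullmatch (which also rejects strings with trailing characters after the digits) are assumptions, not part of the answer.

```python
import re

PATTERN = re.compile(r"([a-zA-Z]+)([0-9]+)")

def split_all(strings):
    """Return (letters, digits) tuples for the entries that match; skip the rest."""
    results = []
    for s in strings:
        m = PATTERN.fullmatch(s)  # None when s is not exactly letters followed by digits
        if m:
            results.append(m.groups())
    return results

print(split_all(["foofo21", "bar432", "no_digits", "foobar12345"]))
# [('foofo', '21'), ('bar', '432'), ('foobar', '12345')]
```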

Comments

12

I'm always the one to bring up findall() =)

>>> strings = ['foofo21', 'bar432', 'foobar12345']
>>> [re.findall(r'(\w+?)(\d+)', s)[0] for s in strings]
[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]

Note that I'm using a simpler (less to type) regex than most of the previous answers.

3 Comments

r'\w' matches '_'. I don't see '_' in the question.
I don't see A-Z in the question. It says "text and numbers".
@PEZ: If you allow any text except numbers then your regexp should be r'(\D+)(\d+)'.
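
To make the difference between the three candidate patterns in this discussion concrete, here is a small comparison sketch. The PATTERNS dict and the try_patterns helper are hypothetical names, and re.match is used (rather than findall) so that each pattern has to match from the start of the string.

```python
import re

PATTERNS = {
    "letters": r"([a-z]+)(\d+)",    # ASCII letters only (case-insensitive via re.I)
    "word": r"(\w+?)(\d+)",         # letters, digits and underscore
    "non-digit": r"(\D+)(\d+)",     # anything that is not a digit
}

def try_patterns(s):
    """Return {label: groups or None} for each candidate pattern."""
    results = {}
    for label, pattern in PATTERNS.items():
        m = re.match(pattern, s, re.I)
        results[label] = m.groups() if m else None
    return results

print(try_patterns("foo_bar12"))
# {'letters': None, 'word': ('foo_bar', '12'), 'non-digit': ('foo_bar', '12')}
print(try_patterns("foo bar12"))
# {'letters': None, 'word': None, 'non-digit': ('foo bar', '12')}
```

All three patterns agree on the inputs from the question; they only differ once underscores, spaces or other non-letter characters appear in the text part.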
11

Here is a simple function to separate multiple words and numbers from a string of any length; the re.match approaches above only separate the first word and number. I think this will help others in the future.

def seperate_string_number(string):
    # Split a string into alternating runs of letters and digits.
    if not string:
        return []
    previous_character = string[0]
    groups = []
    newword = string[0]
    for i in string[1:]:
        if i.isalpha() and previous_character.isalpha():
            newword += i
        elif i.isnumeric() and previous_character.isnumeric():
            newword += i
        else:
            # The character type changed: store the finished run and start a new one.
            groups.append(newword)
            newword = i

        previous_character = i

    # Store the last run once the loop is done.
    groups.append(newword)
    return groups

print(seperate_string_number('10in20ft10400bg'))
# outputs : ['10', 'in', '20', 'ft', '10400', 'bg'] 
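
The same run-splitting idea can also be sketched with itertools.groupby, grouping consecutive characters by whether they are digits. The name split_runs is hypothetical. Unlike the function above, this version keeps every non-digit character (including punctuation) in the same run, which is an assumption about the desired behaviour.

```python
from itertools import groupby

def split_runs(s):
    """Split s into consecutive runs of digits and non-digits."""
    # groupby starts a new group each time str.isdigit flips between True and False
    return ["".join(run) for _, run in groupby(s, key=str.isdigit)]

print(split_runs("10in20ft10400bg"))  # ['10', 'in', '20', 'ft', '10400', 'bg']
```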

Comments

5

Without regex, using the built-in str.isdigit() method. This only works if the string starts with text and ends with a number.

def text_num_split(item):
    for index, letter in enumerate(item, 0):
        if letter.isdigit():
            return [item[:index],item[index:]]

print(text_num_split("foobar12345"))

OUTPUT :

['foobar', '12345']
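
When the string contains no digit at all, the function above falls out of the loop and returns None. A possible variant that always returns a two-element list is sketched here; the name split_at_first_digit and the choice of an empty string for the missing number are assumptions.

```python
def split_at_first_digit(item):
    """Split item at its first digit; the second part is empty if there is none."""
    for index, ch in enumerate(item):
        if ch.isdigit():
            return [item[:index], item[index:]]
    return [item, ""]  # no digit found

print(split_at_first_digit("foobar12345"))  # ['foobar', '12345']
print(split_at_first_digit("foobar"))       # ['foobar', '']
```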

Comments

3
import re

s = input()
m = re.match(r"([a-zA-Z]+)([0-9]+)", s)
if m:
    print(m.group(0))  # the whole match
    print(m.group(1))  # the letters
    print(m.group(2))  # the digits

Comments

0

This is a little longer, but more versatile for cases where there are multiple, randomly placed, numbers in the string. Also, it requires no imports.

def getNumbers( input ):
    # Collect Info
    compile = ""
    complete = []

    for letter in input:
        # If compiled string
        if compile:
            # If compiled and letter are same type, append letter
            if compile.isdigit() == letter.isdigit():
                compile += letter
            
            # If compiled and letter are different types, append compiled string, and begin with letter
            else:
                complete.append( compile )
                compile = letter
            
        # If no compiled string, begin with letter
        else:
            compile = letter
        
    # Append leftover compiled string
    if compile:
        complete.append( compile )
    
    # Return numbers only
    numbers = [ word for word in complete if word.isdigit() ]
        
    return numbers
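
For comparison, if an import is acceptable, the same "numbers only" result can be sketched with re.findall. This swaps in a regex for the loop above and assumes the numbers are runs of ASCII digits; the name extract_numbers is hypothetical.

```python
import re

def extract_numbers(s):
    """Return every run of ASCII digits in s, in order of appearance."""
    return re.findall(r"[0-9]+", s)

print(extract_numbers("10in20ft10400bg"))  # ['10', '20', '10400']
```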

Comments

0

Here is a simple solution for that problem, no need for regex:

user = input('Input: ')  # user = 'foobar12345'
int_list, str_list = [], []

for item in user:
    try:
        item = int(item)  # searching for integers in your string
    except ValueError:
        # not a digit, so it belongs to the text part
        str_list.append(item)
    else:
        # digits are stored as str, because str.join only works with strings
        int_list.append(str(item))

string = ''.join(str_list)
integer = int(''.join(int_list))  # if you want it to be a string, just use ''.join(int_list)

final = [string, integer]  # you can also add it to a dictionary: d = {string: integer}
print(final)

1 Comment

are you sure this is correct item = int(item) # searching for integers in your string!!!!???
0

In addition to @Evan's answer: if the incoming string has the digits first, as in 21foofo, the re.match pattern should look like this.

import re
match = re.match(r"([0-9]+)([a-z]+)", '21foofo', re.I)
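if match:
    items = match.groups()
    print(items)  # ('21', 'foofo')

With the letters-first pattern, re.match returns None for a string like 21foofo, so items is never assigned, and a print(items) outside the if block fails with a NameError (an UnboundLocalError inside a function).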

Comments

0

Depending on your use case, you might have luck with the re.split() method.

With that I could extract the "average" duration of a dataset like

< 1 minute
240 - 480 minutes
120 - 240 minutes
60 - 120 minutes
50 - 60 minutes

import re

def avg_duration(s):
    # duration as a human string, converted to a float
    values = [float(x) for x in re.split(r"\D+", s) if len(x) > 0]
    # Return average
    return sum(values) / len(values)

Note that I needed an if len(x) > 0 clause for this, as this approach produces many empty strings, but otherwise it worked very well.
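
As a quick check, here is a self-contained sketch that applies the same function to the sample rows listed above. The rows list simply repeats that data; the expected values in the comment follow from averaging the integers found in each row.

```python
import re

def avg_duration(s):
    # Split on runs of non-digits and drop the empty strings that result
    values = [float(x) for x in re.split(r"\D+", s) if x]
    return sum(values) / len(values)

rows = [
    "< 1 minute",
    "240 - 480 minutes",
    "120 - 240 minutes",
    "60 - 120 minutes",
    "50 - 60 minutes",
]
for row in rows:
    print(row, "->", avg_duration(row))
# averages, in order: 1.0, 360.0, 180.0, 90.0, 55.0
```

Two caveats: a row without any digits raises ZeroDivisionError, and a decimal such as 10.5 would be split into 10 and 5, because '.' is also a non-digit.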

Comments
