How to split strings into text and number?

Question

I'd like to split strings like these

'foofo21'
'bar432'
'foobar12345'

into

['foofo', '21']
['bar', '432']
['foobar', '12345']

Does anybody know a simple way to do this in Python?

Ehsan Tabatabaei · Accepted Answer · 2020-08-01 19:31:53Z

86

I would approach this by using re.match in the following way:

import re

match = re.match(r"([a-z]+)([0-9]+)", 'foofo21', re.I)
if match:
    items = match.groups()
    print(items)  # ('foofo', '21')
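
If the pattern has to cover the whole string rather than just a prefix, the same regex can be applied with re.fullmatch. This is a minimal sketch, assuming the input is strictly ASCII letters followed by digits; split_text_number is a hypothetical helper name, not something from the answer above.

```python
import re

# Hypothetical helper: letters followed by digits, nothing else allowed.
_PATTERN = re.compile(r"([a-z]+)([0-9]+)", re.I)

def split_text_number(s):
    """Return [text, number], or None when s does not fit the pattern."""
    match = _PATTERN.fullmatch(s)
    return list(match.groups()) if match else None

for s in ['foofo21', 'bar432', 'foobar12345', '21foofo']:
    print(s, split_text_number(s))
```

Returning None instead of raising leaves it to the caller to decide how to treat strings that do not fit.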

edited Aug 1, 2020 at 19:31

Ehsan Tabatabaei

answered Jan 9, 2009 at 23:12

Evan Fosmark

9 Comments

Dan Over a year ago

you probably want \w instead of [a-z] and \d instead of [0-9]

Evan Fosmark Over a year ago

@Dan: Using \w is a poor choice as it matches all alphanumeric characters, not just a-z. So, the entire string would be caught in the first group.

Jeff Shannon Over a year ago

If that's a concern, you can tack '\b' (IIRC) at the end, to specify that the match must end at a word boundary (or '$' to match the end of the string).

Joonho Park Over a year ago

How can this be extended to str-digit-str-digit such as p6max20 to get p=6, max=20? "( )( )( )( )" four grouping?

Bera Over a year ago

re.split('(\d+)', t)

Giorgos Xou · Accepted Answer · 2022-08-11 11:47:50Z

70

def mysplit(s):
    head = s.rstrip('0123456789')
    tail = s[len(head):]
    return head, tail

>>> [mysplit(s) for s in ['foofo21', 'bar432', 'foobar12345']]
[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]

edited Aug 11, 2022 at 11:47

Giorgos Xou

answered Jan 10, 2009 at 6:17

Mike

7 Comments

Steven C. Howell Over a year ago

Comparing timing for this answer to the accepted answer, on my machine, using a single example (case study, not representative of all uses), this str().rstrip() method was roughly 4x faster. Also, it does not require another import.

cardamom Over a year ago

It is more pythonic.

Jack Armstrong Over a year ago

don't know how relevant this is but, when I try FOO_BAR10.34 it gave me 'FOO_BAR10.' and '34' and then when I re-apply mysplits to the first element, it gives me the same thing. I know my issue is slightly different.

Jack Armstrong Over a year ago

But I can slice 'FOO_BAR10.' to remove the '.', then re-apply the function to get what I want. +1.

Mike Over a year ago

To split 'float' at the end add '.' to digits in rstrip() call.
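
The suggestion in the last comment can be sketched as follows. This assumes the trailing number only contains digits and dots; split_trailing_number and its chars parameter are hypothetical names chosen for illustration.

```python
def split_trailing_number(s, chars='0123456789.'):
    """Split s into (head, tail), where tail is the trailing run of chars."""
    head = s.rstrip(chars)
    return head, s[len(head):]

print(split_trailing_number('FOO_BAR10.34'))
print(split_trailing_number('foofo21'))
```

Note that rstrip() treats its argument as a set of characters, so a string ending in a bare '.' would also have that dot moved into the tail.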

jfs · Accepted Answer · 2009-01-10 00:54:09Z

34

Yet Another Option:

>>> import re
>>> [re.split(r'(\d+)', s) for s in ('foofo21', 'bar432', 'foobar12345')]
[['foofo', '21', ''], ['bar', '432', ''], ['foobar', '12345', '']]

answered Jan 10, 2009 at 0:54

jfs

2 Comments

PEZ Over a year ago

Neat. Or even: [re.split(r'(\d+)', s)[0:2] for s in ...] getting rid of that extra empty string. Note though that compared with \w this is equivalent to [^|\d].

jfs Over a year ago

@PEZ: There may be more than one pair and an empty string may be at the beginning of the list. You could remove empty strings with [filter(None, re.split(r'(\d+)', s)) for s in ('foofo21','a1')] (in Python 3, wrap filter in list() to get lists back).
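
A Python 3 sketch of the same idea, using a list comprehension to drop the empty strings that re.split() leaves at either end. split_parts is a hypothetical name; the pattern is the one from the answer.

```python
import re

def split_parts(s):
    """Split s into alternating non-digit and digit runs, dropping empties."""
    return [part for part in re.split(r'(\d+)', s) if part]

for s in ('foofo21', 'a1', 'p6max20', '21foo'):
    print(s, split_parts(s))
```

Because the digits are captured in a group, re.split() keeps them in the result, which is why this also handles several text/number pairs in one string.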

Federico A. Ramponi · Accepted Answer · 2009-01-09 23:19:24Z

29

>>> import re
>>> r = re.compile(r"([a-zA-Z]+)([0-9]+)")
>>> m = r.match("foobar12345")
>>> m.group(1)
'foobar'
>>> m.group(2)
'12345'

So, if you have a list of strings with that format:

import re

r = re.compile(r"([a-zA-Z]+)([0-9]+)")
strings = ['foofo21', 'bar432', 'foobar12345']
print([r.match(string).groups() for string in strings])

Output:

[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]

edited Jan 9, 2009 at 23:19

answered Jan 9, 2009 at 23:12

Federico A. Ramponi

PEZ · Accepted Answer · 2009-01-09 23:49:55Z

12

I'm always the one to bring up findall() =)

>>> import re
>>> strings = ['foofo21', 'bar432', 'foobar12345']
>>> [re.findall(r'(\w+?)(\d+)', s)[0] for s in strings]
[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]

Note that I'm using a simpler (less to type) regex than most of the previous answers.

edited Jan 9, 2009 at 23:49

answered Jan 9, 2009 at 23:40

PEZ

3 Comments

jfs Over a year ago

r'\w' matches '_'. I don't see '_' in the question.

PEZ Over a year ago

I don't see A-Z in the question. It says "text and numbers".

jfs Over a year ago

@PEZ: If you allow any text except numbers then your regexp should be r'(\D+)(\d+)'.
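
The difference between the two patterns discussed in these comments shows up once the text part contains a non-word character. A small sketch; the sample string 'foo-bar12' is made up for illustration and does not come from the question.

```python
import re

s = 'foo-bar12'  # hypothetical input with a hyphen in the text part

# \w does not match '-', so the match can only start after the hyphen.
word_based = re.findall(r'(\w+?)(\d+)', s)

# \D matches any non-digit, so the hyphen stays in the text part.
non_digit = re.findall(r'(\D+)(\d+)', s)

print(word_based)
print(non_digit)
```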

Nikaido · Accepted Answer · 2019-08-05 13:48:05Z

Here is a simple function to separate multiple words and numbers in a string of any length; the re.match approaches only capture the first text/number pair. I hope this helps others in the future.

def seperate_string_number(string):
    groups = []
    newword = ''
    for ch in string:
        # Does ch continue the current run (letters with letters, digits with digits)?
        same_kind = newword and (
            (ch.isalpha() and newword[-1].isalpha())
            or (ch.isnumeric() and newword[-1].isnumeric())
        )
        if same_kind or not newword:
            newword += ch
        else:
            # The kind changed: store the finished run and start a new one.
            groups.append(newword)
            newword = ch
    # Store the last run; this also covers one-character and empty strings.
    if newword:
        groups.append(newword)
    return groups

print(seperate_string_number('10in20ft10400bg'))
# outputs : ['10', 'in', '20', 'ft', '10400', 'bg']
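
The same run-splitting idea can also be sketched with itertools.groupby from the standard library. This is an alternative, not the answer's method, and split_runs is a hypothetical name. It groups characters only by "digit" versus "non-digit", so punctuation stays attached to neighbouring letters instead of becoming its own group.

```python
from itertools import groupby

def split_runs(s):
    """Split s into consecutive runs of digits and non-digits."""
    return [''.join(run) for _, run in groupby(s, key=str.isdigit)]

print(split_runs('10in20ft10400bg'))
```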

roshandev · Accepted Answer · 2019-04-25 06:27:21Z

5

Without regex, using the built-in str.isdigit() method. It only works if the leading part is text and the trailing part is a number.

def text_num_split(item):
    # Split at the first digit; returns None if the string has no digit.
    for index, letter in enumerate(item):
        if letter.isdigit():
            return [item[:index], item[index:]]

print(text_num_split("foobar12345"))

OUTPUT :

['foobar', '12345']

answered Apr 25, 2019 at 6:27

roshandev

Bug Hunter 219 · Accepted Answer · 2015-11-19 19:28:53Z

3

import re

s = input()
m = re.match(r"([a-zA-Z]+)([0-9]+)", s)
if m:
    print(m.group(0))  # the whole match
    print(m.group(1))  # the text part
    print(m.group(2))  # the number part

answered Nov 19, 2015 at 19:28

Bug Hunter 219

Amr Sharaki · Accepted Answer · 2020-10-08 12:38:35Z

This is a little longer, but more versatile for cases where there are multiple, randomly placed, numbers in the string. Also, it requires no imports.

def getNumbers( input ):
    # Collect Info
    compile = ""
    complete = []

    for letter in input:
        # If compiled string
        if compile:
            # If compiled and letter are same type, append letter
            if compile.isdigit() == letter.isdigit():
                compile += letter
            
            # If compiled and letter are different types, append compiled string, and begin with letter
            else:
                complete.append( compile )
                compile = letter
            
        # If no compiled string, begin with letter
        else:
            compile = letter
        
    # Append leftover compiled string
    if compile:
        complete.append( compile )
    
    # Return numbers only
    numbers = [ word for word in complete if word.isdigit() ]
        
    return numbers
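
If importing re is acceptable and only the numbers are needed, a shorter sketch is possible. get_numbers is a hypothetical name, and for plain ASCII input it should give the same list as the function above.

```python
import re

def get_numbers(text):
    """Return every run of ASCII digits in text, in order of appearance."""
    return re.findall(r'[0-9]+', text)

print(get_numbers('10in20ft10400bg'))
```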

Henry Ecker · Accepted Answer · 2021-07-28 21:58:01Z

0

Here is a simple solution to that problem, no regex needed:

user = input('Input: ')  # e.g. 'foobar12345'
int_list, str_list = [], []

for item in user:
    try:
        int(item)  # raises ValueError if the character is not a digit
    except ValueError:
        str_list.append(item)
    else:
        int_list.append(item)  # keep digits as str so they can be joined

string = ''.join(str_list)
integer = int(''.join(int_list))  # assumes at least one digit; use ''.join(int_list) to keep a string

final = [string, integer]  # or store it in a dict: {string: integer}
print(final)

edited Jul 28, 2021 at 21:58

Henry Ecker♦

answered Sep 22, 2019 at 18:27

Zun Gamer

1 Comment

Abu Shumon Over a year ago

are you sure this is correct item = int(item) # searching for integers in your string!!!!???

Uniquedesign · Accepted Answer · 2022-05-18 18:42:33Z

0

In addition to @Evan's answer: if the incoming string puts the number first, as in 21foofo, the re.match pattern would look like this.

import re

match = re.match(r"([0-9]+)([a-z]+)", '21foofo', re.I)
if match:
    items = match.groups()
    print(items)  # ('21', 'foofo')

With the letters-first pattern, re.match returns None for this input, items is never assigned, and any later use of items raises NameError (UnboundLocalError inside a function).

answered May 18, 2022 at 18:42

Uniquedesign

maxauthority · Accepted Answer · 2023-12-04 11:29:21Z

0

Depending on your use case, you might have luck with the re.split() method.

With it I could extract the "average" duration from a dataset like

< 1 minute
240 - 480 minutes
120 - 240 minutes
60 - 120 minutes
50 - 60 minutes

import re

def avg_duration(s):
    # duration as a human string, converted to a float
    values = [float(x) for x in re.split(r"\D+", s) if len(x) > 0]
    # Return average
    return sum(values) / len(values)

Note that I needed the if len(x) > 0 clause because this approach produces many empty strings, but otherwise it worked very well.
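
As a variation, re.findall() returns only the digit runs, so no empty strings need filtering. This is a sketch under the same assumption that every number in the string is a whole number; avg_duration_findall is a hypothetical name, and returning None for strings without digits is an added choice that the original function does not make.

```python
import re

def avg_duration_findall(s):
    """Average of all whole numbers found in s, or None if there are none."""
    values = [float(x) for x in re.findall(r"\d+", s)]
    return sum(values) / len(values) if values else None

for s in ("< 1 minute", "240 - 480 minutes", "50 - 60 minutes"):
    print(s, avg_duration_findall(s))
```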

answered Dec 4, 2023 at 11:29

maxauthority
