Extract int from string in Pandas

Question

Let's say I have a dataframe df as

A B
1 V2
3 W42
1 S03
2 T02
3 U71

I want to have a new column (either at the end of df or replacing column B, it doesn't matter) that contains only the int extracted from column B. That is, I want column C to look like

C
2
42
3
2
71

So if there is a 0 in front of the number, such as for 03, then I want to return 3, not 03.

How can I do this?

Lokesh A. R. · Accepted Answer · 2016-02-13 05:37:48Z

104

You can convert to string and extract the integer using regular expressions.

df['B'].str.extract(r'(\d+)').astype(int)
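
For reference, here is a minimal runnable sketch of this approach on the question's sample data. By default str.extract returns a DataFrame with one column per capture group; passing expand=False with a single group returns a Series instead, which is convenient when assigning to a new column. The column name C is taken from the question.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 1, 2, 3],
                   'B': ['V2', 'W42', 'S03', 'T02', 'U71']})

# Default (expand=True): a DataFrame with one column per capture group.
as_frame = df['B'].str.extract(r'(\d+)')

# expand=False with a single group: a Series.
as_series = df['B'].str.extract(r'(\d+)', expand=False)

# astype(int) parses the digit strings, so '03' becomes 3.
df['C'] = as_series.astype(int)
```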

edited Feb 13, 2016 at 5:37

answered Feb 13, 2016 at 5:29


1 Comment

s3dev Over a year ago

Nice one! For future readers: this can also be done with a compiled regex for more complicated expressions. Simple example: exp = re.compile(r'(\d+)'). Then pass exp to the str.extract(exp) call. Note that the pattern still needs a capture group.

Mike Graham · Accepted Answer · 2016-02-13 05:33:42Z

3

Assuming there is always exactly one leading letter

df['B'] = df['B'].str[1:].astype(int)
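
A quick sketch of the slice on the question's data, assuming (as stated above) that every value has exactly one leading character. .str[1:] drops the first character of each string and astype(int) discards any leading zeros.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 1, 2, 3],
                   'B': ['V2', 'W42', 'S03', 'T02', 'U71']})

# Drop the first character of each string, then parse the rest as an integer.
df['B'] = df['B'].str[1:].astype(int)
```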

answered Feb 13, 2016 at 5:33


Paul Brennan · Accepted Answer · 2021-04-06 20:11:43Z

1

First set up the data

import pandas as pd

df = pd.DataFrame({'A': [1, 3, 1, 2, 3], 'B': ['V2', 'W42', 'S03', 'T02', 'U71']})

df.head()

Then do the extraction and cast it back to ints

df['C'] = df['B'].str.extract(r'(\d+)').astype(int)

df.head()
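
If some rows might contain no digits at all, str.extract yields a missing value for them and astype(int) raises. One possible workaround, sketched here with a made-up extra row 'X', is to go through pd.to_numeric and pandas' nullable Int64 dtype. Treat it as an illustration under that assumption rather than part of the answer above.

```python
import pandas as pd

# 'X' is a hypothetical extra row with no digits, added for illustration.
df = pd.DataFrame({'B': ['V2', 'W42', 'S03', 'T02', 'U71', 'X']})

# Missing value wherever the pattern did not match.
digits = df['B'].str.extract(r'(\d+)', expand=False)

# to_numeric parses the digit strings; Int64 (capital I) can hold missing values.
df['C'] = pd.to_numeric(digits).astype('Int64')
```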

edited Apr 6, 2021 at 20:11

Paul Brennan


answered Apr 6, 2021 at 20:07

Tamer Ragaee


boesjes · Accepted Answer · 2017-05-18 08:19:29Z

0

I wrote a little loop to do this, as I didn't have my strings in a DataFrame but in a list. This way, you can also add a little if statement to account for floats:

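text = 'whatever.007'
output = ''

for char in text:
    try:
        int(char)  # raises ValueError for anything that is not a digit
        output += char
    except ValueError:
        pass

    # keep the decimal point so the result can be parsed as a float
    if char == '.':
        output += char

output = float(output)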

Or you can use int(output) instead if you like, provided the collected string contains no decimal point.
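
For strings held in a plain list, a regular expression can do the same job without a character-by-character loop. This is only a sketch using the question's values in a list; the pattern \d+ keeps just the first run of digits, so it does not cover the float case above.

```python
import re

items = ['V2', 'W42', 'S03', 'T02', 'U71']  # sample values from the question

# re.search finds the first run of digits; int() drops any leading zeros.
numbers = [int(re.search(r'\d+', s).group()) for s in items]
```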

answered May 18, 2017 at 8:19


Kohn1001 · Accepted Answer · 2018-12-01 11:45:15Z

0

Set up a DataFrame matching yours:

import re
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 1, 2, 3], 'B': ['V2', 'W42', 'S03', 'T02', 'U71']})

df.head()

Now manipulate it to get your desired outcome. Wrapping the match in int() drops leading zeros, so 03 becomes 3:

df['C'] = df['B'].apply(lambda x: int(re.search(r'\d+', x).group()))

df.head()


   A    B   C
0  1   V2   2
1  3  W42  42
2  1  S03   3
3  2  T02   2
4  3  U71  71
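
One caveat: re.search returns None when a string contains no digits, so the lambda above would raise AttributeError on such a row. A hedged sketch of a small helper that returns None instead (first_int is a hypothetical name, and 'X' is an illustrative extra value, neither is from the answer):

```python
import re
import pandas as pd

def first_int(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    match = re.search(r'\d+', text)
    return int(match.group()) if match else None

s = pd.Series(['V2', 'S03', 'X'])  # 'X' has no digits
result = s.apply(first_int)        # the last entry comes back as a missing value
```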

edited Dec 1, 2018 at 11:45

answered Dec 1, 2018 at 11:07


Ahmed Elsayed · Accepted Answer · 2020-12-12 17:55:15Z

0

This is another way of doing it if you don't want to use regular expressions. I used the map() function to apply the conversion to each element of the column, like this:

letters = "abcdefghijklmnopqrstuvwxyz"
df['C'] = list(map(lambda x: int(x.lower().strip(letters)), df['B']))

Column C then holds the integers 2, 42, 3, 2 and 71.
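
The same idea can also be written with pandas' vectorized Series.str.strip, sketched below with string.ascii_letters so no lowercasing is needed. Note that strip only removes characters from the two ends of each string, so this assumes letters never appear between the digits.

```python
import string
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 1, 2, 3],
                   'B': ['V2', 'W42', 'S03', 'T02', 'U71']})

# Strip ASCII letters from both ends of each value, then parse what is left.
df['C'] = df['B'].str.strip(string.ascii_letters).astype(int)
```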

answered Dec 12, 2020 at 17:55


Tomerikoo · Accepted Answer · 2021-09-01 15:07:35Z

0

I used apply and it works just fine too:

df = pd.DataFrame({'A': [1, 3, 1, 2, 3], 'B' : ['V2', 'W42', 'S03', 'T02', 'U71']})
df['C'] = df['B'].apply(lambda x: int(x[1:]))
df['C']

Output:

0     2
1    42
2     3
3     2
4    71
Name: C, dtype: int64

edited Sep 1, 2021 at 15:07

Tomerikoo


answered Sep 1, 2021 at 14:47

Hazem Mohamed


Adegite Taiwo · Accepted Answer · 2022-06-02 22:33:11Z

0

That's correct, just as @Lokesh A. R. has answered above, but it won't work in all cases. If you get the error "pattern contains no capture groups", this is what you should do. According to the docs, you need to add parentheses to specify a capture group.

df["B"].str.extract(r'(\d+)')
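
To illustrate, here is a small sketch that triggers the error and then applies the fix. To my knowledge current pandas versions raise a ValueError here; the exact exception type and message may differ between versions.

```python
import pandas as pd

s = pd.Series(['V2', 'W42', 'S03'])

try:
    s.str.extract(r'\d+')  # no parentheses, so no capture group
    message = None
except ValueError as err:
    message = str(err)

# With a capture group the call succeeds.
fixed = s.str.extract(r'(\d+)', expand=False).astype(int)
```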

answered Jun 2, 2022 at 22:33
