How to do string operations on a pandas DataFrame

Question

I have a dataframe as below:

import pandas as pd

df = pd.DataFrame({'a': [10, 11, None],
                   'b': ['apple;', None, 'orange;'],
                   'c': ['red', 'blue', 'green']})

I'm trying to strip the ';' from those strings. I tried:

df.select_dtypes(include=['object']).applymap(lambda x: x.strip(';'))

I got error message:

AttributeError: ("'NoneType' object has no attribute 'strip'", 'occurred at index b')

It seems the None value is what causes the trouble. Any help is greatly appreciated. Thanks a lot.

Dekel · Accepted Answer · 2016-10-29 03:33:17Z

2

The problem is that some of the values are None, and you can't call None.strip().

df.select_dtypes(include=['object'])
         b      c
0   apple;    red
1     None   blue
2  orange;  green
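A minimal plain-Python sketch of the per-element call that applymap makes (the helper name strip_semicolon is hypothetical): it works on a str but raises the same AttributeError as in the question when it is handed None.

```python
def strip_semicolon(value):
    # The same call the question's lambda makes on every element.
    return value.strip(';')

print(strip_semicolon('apple;'))  # apple

message = ''
try:
    strip_semicolon(None)
except AttributeError as err:
    message = str(err)
    print(message)  # 'NoneType' object has no attribute 'strip'
```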

What you can do is strip only when the value is truthy (None is falsy), and otherwise return it unchanged:

df.select_dtypes(include=['object']).applymap(lambda x: x.strip(';') if x else x)
        b      c
0   apple    red
1    None   blue
2  orange  green
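The if x guard works here because None is falsy. A caveat worth checking, sketched below under the assumption that missing values arrive as NaN rather than None (as they do, for example, when an object column is read with pd.read_csv): NaN is a float and is truthy, so the same guard calls .strip on it and fails. The helper name strip_if_truthy is hypothetical.

```python
import math

def strip_if_truthy(x):
    # Same guard as the answer: only call .strip when x is truthy.
    return x.strip(';') if x else x

print(strip_if_truthy('apple;'))  # apple
print(strip_if_truthy(None))      # None: falsy, returned unchanged

nan = float('nan')
print(bool(nan), math.isnan(nan))  # True True: NaN is truthy

nan_failed = False
try:
    strip_if_truthy(nan)
except AttributeError as err:
    nan_failed = True
    print('NaN slipped past the guard:', err)
```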

answered Oct 29, 2016 at 3:33

Dekel


4 Comments

dawg Over a year ago

The problem with lambda x: x.strip(';') if x else x is that you will still get attribute errors from objects that are truthy in the Python sense but lack a .strip method (for example a non-zero number). You can do lambda x: x.strip(';') if hasattr(x, 'strip') else x if you want to do it by testing first.
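A minimal sketch of the hasattr guard, applied with Series.map to a hypothetical object column that mixes a string, None, NaN and an integer (this data is not from the question). Values without a .strip method are passed through unchanged.

```python
import numpy as np
import pandas as pd

def strip_if_possible(x):
    # Call .strip only on objects that actually have that method.
    return x.strip(';') if hasattr(x, 'strip') else x

# Hypothetical mixed object column, not taken from the question.
s = pd.Series(['apple;', None, np.nan, 7], dtype=object)
result = s.map(strip_if_possible)
print(result.tolist())
```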

Dekel Over a year ago

@dawg, correct. Since the data in the example includes only strings (and None), the if x check was enough. Another option is to check type(x) == str, so that strip is called only on strings (and not on other objects that might have a strip method but are not guaranteed to return the expected result).
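The point about other objects with a strip method can be made concrete with bytes: b'apple;' has .strip, but bytes.strip only accepts a bytes argument, so a hasattr guard lets it through and the call raises TypeError. This sketch uses isinstance(x, str) instead of type(x) == str (a swapped-in variant that also accepts str subclasses); both helper names are hypothetical.

```python
def strip_if_str(x):
    # Only genuine str values (including subclasses) are stripped.
    return x.strip(';') if isinstance(x, str) else x

def strip_if_has_strip(x):
    # The hasattr variant: any object with a .strip attribute is attempted.
    return x.strip(';') if hasattr(x, 'strip') else x

raw = b'apple;'
print(strip_if_str('apple;'))  # apple
print(strip_if_str(raw))       # b'apple;' (left alone)

bytes_failed = False
try:
    strip_if_has_strip(raw)  # bytes.strip(';') rejects a str argument
except TypeError as err:
    bytes_failed = True
    print('hasattr guard let bytes through:', err)
```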

zesla Over a year ago

This is perfect. Thanks a lot!

zesla Over a year ago

@Dekel Just did. I only just learned that I need to do this.

dawg · Answer · 2016-10-29 03:46:27Z

1

You can use try/except in this case: attempt the strip, and return the value unchanged if it has no strip method.

>>> def am(o):
...    try:
...       return o.strip(';')
...    except AttributeError:
...       return o

Then use applymap as you tried:

>>> df.select_dtypes(include=['object']).applymap(am)
        b      c
0   apple    red
1    None   blue
2  orange  green
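A self-contained sketch of the same try/except helper. Because am returns anything without a strip method unchanged, it can be applied to the whole frame, including the numeric column a, without select_dtypes. pandas 2.1 deprecated DataFrame.applymap in favour of the equivalent DataFrame.map, so the sketch picks whichever one the installed version provides.

```python
import pandas as pd

def am(o):
    # Strip ';' when o supports it; otherwise (None, NaN, numbers) return o as-is.
    try:
        return o.strip(';')
    except AttributeError:
        return o

df = pd.DataFrame({'a': [10, 11, None],
                   'b': ['apple;', None, 'orange;'],
                   'c': ['red', 'blue', 'green']})

# DataFrame.map (pandas >= 2.1) replaces the deprecated DataFrame.applymap.
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap
cleaned = elementwise(am)
print(cleaned)
```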

edited Oct 29, 2016 at 3:46

answered Oct 29, 2016 at 3:29

dawg


juanpa.arrivillaga · Answer · 2016-10-29 03:31:50Z

0

Use the Series .str accessor with apply (one call per column) instead of applymap:

In [17]: df.select_dtypes(include=['object']).apply(lambda S:S.str.strip(';'))
Out[17]: 
        b      c
0   apple    red
1    None   blue
2  orange  green
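The .str methods skip missing values, which is why None passes through. One caveat, sketched on a hypothetical mixed object column that is not in the question: a non-string value such as an int is not passed through unchanged but comes back as NaN, unlike with the hasattr or try/except approaches.

```python
import pandas as pd

# Hypothetical mixed object column: a string, an int and a missing value.
s = pd.Series(['apple;', 5, None], dtype=object)
stripped = s.str.strip(';')
print(stripped.tolist())  # the int 5 is replaced by NaN
```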


answered Oct 29, 2016 at 3:31

juanpa.arrivillaga


Carlos Pinela · Answer · 2017-01-11 13:22:29Z

0

A different approach is to iterate through all columns of dtype object and use the Series .str.strip method, which handles missing (NaN/None) values:

for col in df.columns[df.dtypes == object]:
    df[col] = df[col].str.strip(";")
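A variant of the same loop, sketched as an assumption rather than as the answer's method: instead of testing df.dtypes == object (which would not match a column stored with pandas' StringDtype), it picks the columns that contain at least one str value. The name text_cols is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 11, None],
                   'b': ['apple;', None, 'orange;'],
                   'c': ['red', 'blue', 'green']})

# Columns holding at least one str value; the float column 'a' is skipped.
text_cols = [col for col in df.columns
             if df[col].map(lambda v: isinstance(v, str)).any()]

for col in text_cols:
    df[col] = df[col].str.strip(';')

print(text_cols)  # ['b', 'c']
print(df)
```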

answered Jan 11, 2017 at 13:22

Carlos Pinela
