How to apply string methods to multiple columns of a dataframe

Question

I have a dataframe with multiple string columns. I want to use a string method that is valid for a series on multiple columns of the dataframe. Something like this is what I would wish for:

df = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']})
df

Out[15]: 
      A     B
0  123f  789f
1  456f  901f

df = df.str.rstrip('f')
df
Out[16]: 
     A    B
0  123  789
1  456  901

Obviously, this doesn't work because str operations are only valid on pandas Series objects. What is the appropriate/most pandas-y method to do this?

jezrael · Accepted Answer · 2018-08-30 14:12:56Z

22

rstrip is a Series method (available through the .str accessor), so you can use apply to call it on each column:

df = df.apply(lambda x: x.str.rstrip('f'))

Or reshape the DataFrame into a single Series with stack, strip, and then unstack:

df = df.stack().str.rstrip('f').unstack()
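
To see why the stack/unstack round trip works, it can help to look at the intermediate object. A minimal sketch using the question's sample data (the names `stacked` and `result` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']})

# stack() moves the column labels into an inner index level,
# producing a single Series on which the .str accessor is available
stacked = df.stack()
print(type(stacked).__name__)   # Series
print(stacked.index.nlevels)    # 2

# strip on the Series, then unstack() to restore the two-column layout
result = stacked.str.rstrip('f').unstack()
print(result)
```

The reshape adds some overhead, so this is mostly attractive when you already need the stacked form.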

Or use applymap:

df = df.applymap(lambda x: x.rstrip('f'))
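
Note that `applymap` was deprecated in pandas 2.1 in favor of the equivalent `DataFrame.map`. A minimal sketch that works on either side of that rename (the helper name `strip_each` is hypothetical, not a pandas API):

```python
import pandas as pd

def strip_each(frame, chars):
    """Apply str.rstrip(chars) to every cell of an all-string DataFrame."""
    # DataFrame.map exists in pandas >= 2.1; older versions only have applymap
    if hasattr(pd.DataFrame, 'map'):
        return frame.map(lambda x: x.rstrip(chars))
    return frame.applymap(lambda x: x.rstrip(chars))

df = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']})
print(strip_each(df, 'f'))
```

Both methods call the plain Python `str.rstrip` on each element, so every cell must be a string.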

Finally, if you need to apply the function to only some columns:

# list the columns to process
cols = ['A']
# the three lines below are alternatives; use any one of them
df[cols] = df[cols].apply(lambda x: x.str.rstrip('f'))
df[cols] = df[cols].stack().str.rstrip('f').unstack()
df[cols] = df[cols].applymap(lambda x: x.rstrip('f'))
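
Restricting the operation to a column list matters most when the frame also holds non-string data. A small sketch, assuming an extra numeric column `n` that is not in the question's data:

```python
import pandas as pd

df = pd.DataFrame({'A': ['123f', '456f'],
                   'B': ['789f', '901f'],
                   'n': [1, 2]})          # 'n' is an illustrative numeric column

# only the string columns are listed; 'n' is numeric, so the .str
# accessor would raise on it if the whole frame were processed
cols = ['A', 'B']
df[cols] = df[cols].apply(lambda x: x.str.rstrip('f'))
print(df)
```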

edited Aug 30, 2018 at 14:12

answered Aug 30, 2018 at 14:07

jezrael


user3483203 · Accepted Answer · 2018-08-30 14:19:03Z

4

You can mimic the behavior of rstrip using replace with regex=True, which can be applied to the entire DataFrame:

df.replace(r'f$', '', regex=True)

     A    B
0  123  789
1  456  901

Since rstrip takes a set of characters to strip (not a suffix), you can extend this with a character class:

df.replace(r'[abc]+$', '', regex=True)
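
One caveat worth checking: `f$` removes only a single trailing `f`, while `rstrip('f')` removes every trailing `f`, so the `+` quantifier is needed to match `rstrip` on values such as `'12ff'`. A small sketch with made-up data (the column name and values are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'A': ['12f', '12ff']})

one = df.replace(r'f$', '', regex=True)      # strips at most one 'f'
run = df.replace(r'f+$', '', regex=True)     # strips the whole trailing run
ref = df.apply(lambda x: x.str.rstrip('f'))  # rstrip, for comparison

print(one['A'].tolist())  # ['12', '12f']
print(run['A'].tolist())  # ['12', '12']
print(ref['A'].tolist())  # ['12', '12']
```

For the question's sample data the two patterns give the same result, because each value ends in exactly one `f`.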

edited Aug 30, 2018 at 14:19

answered Aug 30, 2018 at 14:13

user3483203


jpp · Accepted Answer · 2018-08-30 14:24:23Z

You can use a dictionary comprehension and feed to the pd.DataFrame constructor:

res = pd.DataFrame({col: [x.rstrip('f') for x in df[col]] for col in df})
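
One thing to watch with this approach: building a new DataFrame from plain lists discards the original index and replaces it with a default `RangeIndex`. If the index matters, it can be passed through explicitly. A sketch with an illustrative non-default index:

```python
import pandas as pd

df = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']},
                  index=['x', 'y'])   # illustrative labels, not from the question

# without index=df.index the result would be labelled 0, 1 instead of 'x', 'y'
res = pd.DataFrame({col: [x.rstrip('f') for x in df[col]] for col in df},
                   index=df.index)
print(res)
```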

As of the pandas version benchmarked below, the str methods are slower than a plain Python list comprehension. The regex approach is slower still, but easier to extend. As always, you should test with your own data.

# Benchmarking on Python 3.6.0, Pandas 0.19.2

def jez1(df):
    return df.apply(lambda x: x.str.rstrip('f'))

def jez2(df):
    return df.applymap(lambda x: x.rstrip('f'))

def jpp(df):
    return pd.DataFrame({col: [x.rstrip('f') for x in df[col]] for col in df})

def user3483203(df):
    return df.replace(r'f$', '', regex=True)

df = pd.concat([df]*10000)

%timeit jez1(df)         # 33.1 ms per loop
%timeit jez2(df)         # 29.9 ms per loop
%timeit jpp(df)          # 13.2 ms per loop
%timeit user3483203(df)  # 42.9 ms per loop
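
The `%timeit` magic only works in IPython/Jupyter. To rerun the comparison as a plain script, something like the following could be used. It relies on the standard `timeit` module, the function names are illustrative, and no timings are shown because they depend on your machine and pandas version:

```python
import timeit
import pandas as pd

def with_apply(df):
    return df.apply(lambda x: x.str.rstrip('f'))

def with_comprehension(df):
    return pd.DataFrame({col: [x.rstrip('f') for x in df[col]] for col in df},
                        index=df.index)

def with_regex(df):
    return df.replace(r'f$', '', regex=True)

small = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']})
big = pd.concat([small] * 10000, ignore_index=True)

candidates = [with_apply, with_comprehension, with_regex]

# check that all candidates agree before timing them
expected = with_apply(big).values.tolist()
for func in candidates:
    assert func(big).values.tolist() == expected

# average wall-clock time over a few runs of each candidate
for func in candidates:
    seconds = timeit.timeit(lambda: func(big), number=5) / 5
    print(f'{func.__name__}: {seconds * 1000:.1f} ms per loop')
```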
