
I have a DataFrame with multiple string columns. I want to apply a string method that is valid for a Series to multiple columns of the DataFrame. Something like this is what I would wish for:

import pandas as pd

df = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']})
df

Out[15]: 
      A     B
0  123f  789f
1  456f  901f

df = df.str.rstrip('f')
df
Out[16]: 
     A    B
0  123  789
1  456  901

Obviously, this doesn't work because the .str accessor is only available on pandas Series (and Index) objects, not on DataFrames. What is the appropriate, most pandas-y way to do this?
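A quick check of the point above, using the same two-column frame from the question: a single column supports .str, while the DataFrame itself raises AttributeError. The variable names col_result and raised are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']})

# A single column is a Series, so the .str accessor is available.
col_result = df['A'].str.rstrip('f').tolist()
print(col_result)  # ['123', '456']

# The DataFrame has no .str accessor, so attribute lookup fails.
raised = False
try:
    df.str.rstrip('f')
except AttributeError as exc:
    raised = True
    print('AttributeError:', exc)
```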

3 Answers

Answer 1 (score 22)

Series.str.rstrip works on a Series, so one option is apply, which passes each column to the function as a Series:

df = df.apply(lambda x: x.str.rstrip('f'))
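To make the "each column arrives as a Series" point concrete, here is a small sketch that records what apply hands to the function. The helper name strip_column and the seen_types list are hypothetical, added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']})

seen_types = []  # records the type of object apply passes in

def strip_column(col):
    # With the default axis=0, apply calls this once per column and passes
    # the column as a Series, so the .str accessor is available here.
    seen_types.append(type(col).__name__)
    return col.str.rstrip('f')

result = df.apply(strip_column)
print(seen_types)
print(result)
```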

Or turn the DataFrame into a single Series with stack, strip it, and restore the original shape with unstack:

df = df.stack().str.rstrip('f').unstack()
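The stack/unstack round trip can be easier to follow with the intermediate Series shown. This is a sketch on the question's data. One hedged caveat: with the legacy stack behaviour used by default in many pandas versions, missing values are dropped when stacking, so a row whose cells are all NaN would not survive the round trip.

```python
import pandas as pd

df = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']})

# stack() moves the column labels into the row index, producing one long
# Series indexed by (row label, column label).
stacked = df.stack()
print(stacked)

# The .str accessor now reaches every cell in a single call.
stripped = stacked.str.rstrip('f')

# unstack() moves the column labels back out, restoring the 2-D shape.
result = stripped.unstack()
print(result)
```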

Or use applymap (in pandas 2.1 and later, applymap is deprecated in favor of the equivalent DataFrame.map):

df = df.applymap(lambda x: x.rstrip('f'))
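Because applymap was renamed, code that must run on both old and new pandas can pick whichever method exists. A minimal sketch, assuming only that DataFrame.map is present from pandas 2.1 onward; the helper name strip_f is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']})

def strip_f(value):
    # Called once per cell with a plain Python str, so the built-in
    # str.rstrip is used directly (no .str accessor involved).
    return value.rstrip('f')

# DataFrame.map on pandas >= 2.1; fall back to applymap on older versions.
if hasattr(pd.DataFrame, 'map'):
    result = df.map(strip_f)
else:
    result = df.applymap(strip_f)
print(result)
```

Unlike the .str versions, this calls rstrip on every cell, so a NaN (a float) would raise AttributeError. Both map and applymap accept na_action='ignore' to skip missing values.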

Finally, if you need to apply the function only to some columns:

# list the columns to process
cols = ['A']
# then use any one of these alternatives:
df[cols] = df[cols].apply(lambda x: x.str.rstrip('f'))
df[cols] = df[cols].stack().str.rstrip('f').unstack()
df[cols] = df[cols].applymap(lambda x: x.rstrip('f'))
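When the frame also holds non-string columns, the column list can be built instead of typed by hand. This sketch adds a hypothetical numeric column C and selects only columns whose values are all Python strings; that selection rule is an assumption chosen here because it does not depend on which string dtype a given pandas version infers.

```python
import pandas as pd

df = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f'], 'C': [1, 2]})

# Keep only the columns in which every value is a str.
cols = [c for c in df.columns if df[c].map(lambda v: isinstance(v, str)).all()]

# Strip only the selected columns; column C is left as it was.
df[cols] = df[cols].apply(lambda s: s.str.rstrip('f'))
print(df)
```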

Answer 2 (score 4)

You can mimic the behavior of rstrip using replace with regex=True, which can be applied to the entire DataFrame:

df.replace(r'f$', '', regex=True)

     A    B
0  123  789
1  456  901

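Since rstrip takes a set of characters and removes any trailing run of them, the regex equivalent is a character class followed by +. Note that r'f$' removes only one trailing 'f', whereas rstrip('f') removes all of them; r'f+$' behaves like rstrip('f'). For example, to mirror rstrip('abc'):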

df.replace(r'[abc]+$', '', regex=True)
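Building the character class by hand gets fiddly when the characters to strip include regex metacharacters. A sketch with a hypothetical helper, rstrip_pattern, that escapes the characters and anchors with \Z rather than $ (in Python's re, $ can also match just before a trailing newline, which rstrip would not skip), then checks the result against the .str.rstrip approach:

```python
import re

import pandas as pd

def rstrip_pattern(chars):
    # Hypothetical helper: a regex removing any trailing run of `chars`,
    # intended to mirror str.rstrip(chars). re.escape keeps characters such
    # as '.', '-', ']' or '^' literal inside the character class.
    return '[' + re.escape(chars) + r']+\Z'

df = pd.DataFrame({'A': ['123ff', '4-5.'], 'B': ['78cab', '901']})

chars = 'abcf.'
via_regex = df.replace(rstrip_pattern(chars), '', regex=True)
via_rstrip = df.apply(lambda s: s.str.rstrip(chars))
print(via_regex)
print(via_regex.to_dict() == via_rstrip.to_dict())
```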

Answer 3 (score 1)

You can use a dictionary comprehension and feed it to the pd.DataFrame constructor:

res = pd.DataFrame({col: [x.rstrip('f') for x in df[col]] for col in df})
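One detail of the constructor approach: building from plain lists gives the result a fresh 0..n-1 index. A sketch that passes index=df.index and columns=df.columns so the original row labels and column order are kept; the string row labels 'x' and 'y' are made up for the demonstration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']},
                  index=['x', 'y'])

# Plain Python str.rstrip on each cell, one list per column. The lists are
# aligned positionally with index=df.index, so row labels carry over.
res = pd.DataFrame({col: [x.rstrip('f') for x in df[col]] for col in df},
                   index=df.index, columns=df.columns)
print(res)
```

Like applymap, this calls rstrip on every cell, so missing values (NaN) would need to be handled separately.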

At the time of writing, the pandas str methods are relatively slow, and regex is slower still, though more easily extended. In the benchmark below, the plain list comprehension is the fastest of the four approaches. As always, test with your own data.

# Benchmarking in IPython on Python 3.6.0, pandas 0.19.2
import pandas as pd

df = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']})

def jez1(df):
    return df.apply(lambda x: x.str.rstrip('f'))

def jez2(df):
    return df.applymap(lambda x: x.rstrip('f'))

def jpp(df):
    return pd.DataFrame({col: [x.rstrip('f') for x in df[col]] for col in df})

def user3483203(df):
    return df.replace(r'f$', '', regex=True)

df = pd.concat([df]*10000)

%timeit jez1(df)         # 33.1 ms per loop
%timeit jez2(df)         # 29.9 ms per loop
%timeit jpp(df)          # 13.2 ms per loop
%timeit user3483203(df)  # 42.9 ms per loop
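%timeit is an IPython magic, so it does not work in a plain Python script. A sketch of the same comparison using the standard timeit module, with a correctness check before timing. The jez2 variant here uses DataFrame.map when available (pandas 2.1+) and applymap otherwise; jpp passes index=df.index so its output lines up with the others. Timings depend entirely on your machine and pandas version, so no numbers are claimed here.

```python
import timeit

import pandas as pd

def jez1(df):
    return df.apply(lambda x: x.str.rstrip('f'))

def jez2(df):
    # DataFrame.map on pandas >= 2.1, applymap on older versions
    mapper = df.map if hasattr(df, 'map') else df.applymap
    return mapper(lambda x: x.rstrip('f'))

def jpp(df):
    return pd.DataFrame({col: [x.rstrip('f') for x in df[col]] for col in df},
                        index=df.index)

def user3483203(df):
    return df.replace(r'f$', '', regex=True)

base = pd.DataFrame({'A': ['123f', '456f'], 'B': ['789f', '901f']})
big = pd.concat([base] * 10000, ignore_index=True)

# Sanity check: every approach should produce the same values.
reference = jez1(big).to_dict()
for func in (jez2, jpp, user3483203):
    assert func(big).to_dict() == reference, func.__name__

# Best of 3 repeats, 5 calls each, reported as time per call.
for func in (jez1, jez2, jpp, user3483203):
    best = min(timeit.repeat(lambda: func(big), number=5, repeat=3)) / 5
    print(f'{func.__name__}: {best * 1000:.1f} ms per call')
```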
