pandas dataframe substring df['column1'].str[:'column2']

Question

I have a DataFrame (df) with columns A (object) and B (int64). I need to get a substring of 'A' whose length is given by the value of 'B' in the same row.

I want to get 'C' like this:

A      B      C
=====  =====  =====
Jimmy  4      Jimm
Tommy  2      To
Karl   3      Kar
Jane   1      J
=====  =====  =====

So far I tried this:

df['C'] = df['A'].str[:df['B']]

I also tried this:

l = (lambda x,y: str(x)[:y])

df[['A','B']].apply(l)

Neither attempt worked.

EdChum · Accepted Answer · 2015-01-25 21:34:05Z


The following works, but it won't be fast because it loops over every row. The key is to pass axis=1 so that apply operates row-wise, and we can then access each column's value for that row:

In [46]:

df['C'] = df.apply(lambda x: x['A'][:x['B']], axis=1)
df
Out[46]:
       A  B     C
0  Jimmy  4  Jimm
1  Tommy  2    To
2   Karl  3   Kar
3   Jane  1     J
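
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question (the construction itself is an assumption about how the data was created) and also shows a plain list comprehension over the zipped columns. That second form is not from this answer; it is a common alternative that avoids the per-row Series that apply builds, and it may be faster on larger frames, though that is worth measuring on your own data.

```python
import pandas as pd

# Sample data taken from the question; building it this way is an assumption.
df = pd.DataFrame({"A": ["Jimmy", "Tommy", "Karl", "Jane"],
                   "B": [4, 2, 3, 1]})

# Row-wise apply, as in the answer: each row is passed to the lambda as a Series.
via_apply = df.apply(lambda row: row["A"][:row["B"]], axis=1)

# Alternative (not from the answer): pair up the two columns and slice in plain Python.
df["C"] = [a[:b] for a, b in zip(df["A"], df["B"])]

print(df)
```

Both forms assume column A holds only strings and column B holds only integers; missing values would need separate handling.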

So, to look at your attempts and why they don't work: df['C'] = df['A'].str[:df['B']] fails because you are trying to slice every element in column A by passing a Series; the slice bound has to be a single constant integer. It's a nice idea, but it won't work.

l = (lambda x,y: str(x)[:y])
df[['A','B']].apply(l)

This won't work because df[['A', 'B']] is just your original df, and you haven't specified the axis to operate on, so the default of 0 (column-wise) is used. Your lambda then fails because only a single argument is passed, which on the first iteration is df['A']. The only way to make this work with apply is to operate row-wise by passing axis=1. I can't currently think of a better way.
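
To make the axis behaviour concrete, the sketch below records what apply hands to the function in each mode. The sample frame is rebuilt from the question, and the variable names (col_names, row_labels, failed) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": ["Jimmy", "Tommy", "Karl", "Jane"],
                   "B": [4, 2, 3, 1]})

# Default axis=0: the function is called once per column and receives the
# whole column as a Series, so .name is the column label.
col_names = df[["A", "B"]].apply(lambda col: col.name)

# axis=1: the function is called once per row, so .name is the row label.
row_labels = df[["A", "B"]].apply(lambda row: row.name, axis=1)

# The two-argument lambda from the question is called with one argument
# (a column Series), so Python raises a TypeError for the missing 'y'.
try:
    df[["A", "B"]].apply(lambda x, y: str(x)[:y])
    failed = False
except TypeError:
    failed = True

print(col_names.tolist(), row_labels.tolist(), failed)
```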

edited Jan 25, 2015 at 21:34

answered Jan 25, 2015 at 21:27


1 Comment

geekjimbo Over a year ago

EdChum, txs v-much. it works. the axis=1 trick made it work. txs for the explanation also.
