Here is a simple pandas DataFrame, defined as follows:

import pandas as pd

df = pd.DataFrame( {
    'word':     ['flower', 'mountain', 'ocean', 'universe'],
    'k':        [1, 2, 3, 4]
} )

>>> df
   k      word
0  1    flower
1  2  mountain
2  3     ocean
3  4  universe

I want to change df into this (replacing every word with its first k letters):

>>> df
   k  word
0  1     f
1  2    mo
2  3   oce
3  4  univ

My idea is to do this with pandas.Series.apply and a custom function:

def get_first_k_letters( x, k ):
    return x[:k]

df['word'] = df['word'].apply( get_first_k_letters, args=(3,) )

>>> df
   k word
0  1  flo
1  2  mou
2  3  oce
3  4  uni

I can easily replace every word with its first 3 letters by setting args=(3,).

But I want to replace every word with its first k letters (k differs from row to row), and I don't know what to pass as args in this case.

Could somebody help me? Thanks! (Methods that don't use pandas.Series.apply are also fine.)

2 Answers

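I'd consider this approach, a list comprehension over the rows (it indexes each row by position, so it relies on the columns being ordered k, word as in the output shown in the question):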

In [121]: df['word'] = [w[1][:w[0]] for w in df.values]

In [122]: df
Out[122]:
   k  word
0  1     f
1  2    mo
2  3   oce
3  4  univ
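
Since the comprehension above picks values by position, it depends on the column order of df, and that order can differ between pandas versions when the frame is built from a dict. A minimal sketch of a variant that pairs the columns by name instead, assuming the same df as in the question (this is a swapped-in variation, not the code that was timed below):

```python
import pandas as pd

df = pd.DataFrame({
    'word': ['flower', 'mountain', 'ocean', 'universe'],
    'k': [1, 2, 3, 4],
})

# zip pairs each word with the k from the same row; the columns are
# selected by name, so their order in the DataFrame does not matter.
df['word'] = [word[:k] for word, k in zip(df['word'], df['k'])]

print(df)
```

Each word is sliced with its own k, which is what the question asks for.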

Timing for a 40,000-row DataFrame:

In [123]: df = pd.concat([df] * 10**4, ignore_index=True)

In [124]: df.shape
Out[124]: (40000, 2)

In [125]: %timeit df.apply(lambda x: get_first_k_letters(x['word'], x['k']), axis=1)
1 loop, best of 3: 4.04 s per loop

In [126]: %timeit [w[1][:w[0]] for w in df.values]
10 loops, best of 3: 52.5 ms per loop

In [127]: 4.04 * 1000 / 52.5
Out[127]: 76.95238095238095
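
Outside IPython, a comparison like the one above could be reproduced with the standard timeit module. This is only a sketch: the frame is smaller than the 40,000 rows used above to keep the run short, the comprehension is the name-based zip variant rather than the positional one, and the helper names with_apply and with_comprehension are made up for illustration. Absolute numbers will depend on the machine and the pandas version.

```python
import timeit

import pandas as pd


def get_first_k_letters(x, k):
    return x[:k]


small = pd.DataFrame({
    'word': ['flower', 'mountain', 'ocean', 'universe'],
    'k': [1, 2, 3, 4],
})
# 4,000 rows (an assumption chosen to keep the benchmark quick).
big = pd.concat([small] * 10**3, ignore_index=True)


def with_apply():
    # Row-wise apply: pandas builds a Series for every row.
    return big.apply(
        lambda row: get_first_k_letters(row['word'], row['k']), axis=1
    ).tolist()


def with_comprehension():
    # Plain Python loop over the two columns, paired by name.
    return [w[:k] for w, k in zip(big['word'], big['k'])]


# Both approaches must agree before their timings are compared.
assert with_apply() == with_comprehension()

t_apply = timeit.timeit(with_apply, number=3)
t_comp = timeit.timeit(with_comprehension, number=3)
print(f"apply: {t_apply:.3f} s, comprehension: {t_comp:.3f} s")
```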

1 Comment

Thank you! You are a python specialist!

You can do:

df.apply(lambda x: get_first_k_letters(x['word'], x['k']), axis=1)

Calling apply with axis=1 passes each row to the lambda as x (axis=0 would pass columns instead). Passing x['word'] and x['k'] to your function then yields the correct result:

0       f
1      mo
2     oce
3    univ
dtype: object
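
The call above returns a new Series and leaves df unchanged. A minimal, self-contained sketch of assigning the result back to the word column, assuming the df and get_first_k_letters from the question:

```python
import pandas as pd


def get_first_k_letters(x, k):
    return x[:k]


df = pd.DataFrame({
    'word': ['flower', 'mountain', 'ocean', 'universe'],
    'k': [1, 2, 3, 4],
})

# axis=1 hands each row to the lambda; the resulting Series shares
# df's index, so it can be assigned straight back to the column.
df['word'] = df.apply(
    lambda row: get_first_k_letters(row['word'], row['k']), axis=1
)

print(df)
```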

2 Comments

Nice solution! Thank you very much!
@MaxU hehe, no problem. Your solution is indeed much faster for this specific case, but will not always work :)
