2

Let's say I have the following data:

import pandas as pd
df = pd.DataFrame(data=[[1, 'a'], [1, 'aaa'], [1, 'aa'], 
                        [2, 'bb'], [2, 'bbb'], 
                        [3, 'cc']], 
                  columns=['key', 'text'])

   key text
0    1    a
1    1  aaa
2    1   aa
3    2   bb
4    2  bbb
5    3   cc

What I would like to do is group by the key column, sort the rows within each group by the length of text, and end up with a single Series of index values that I can use to reindex the DataFrame. I thought I could just do something like this:

df.groupby('key').text.str.len().sort_values(ascending=False).index

But the error message said I need to use apply, so I tried this:

df.groupby('key').apply(lambda x: x.text.str.len().sort_values(ascending=False).index, axis=1)

But that failed with an error saying the lambda got an unexpected keyword argument: axis.

I'm relatively new to pandas, so I'm not sure how to go about this. My actual goal is simply to deduplicate the data so that, for each key, I keep only the row with the longest text. The expected output is:

   key text
1    1  aaa
4    2  bbb
5    3   cc

If there's an easier way to do this than what I'm attempting, I'm open to that as well.

3 Answers

5

No need for the intermediate step. You can get a series with the string lengths like this:

df['text'].str.len()

Now just group by key and, for each group, return the value at the index where the string length is largest, using idxmax():

In [33]: df.groupby('key').agg(lambda x: x.loc[x.str.len().idxmax()])
Out[33]:
    text
key
1    aaa
2    bbb
3     cc
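
If the original row labels matter (the question's expected output keeps labels 1, 4 and 5), one possible variation is to compute the idxmax() labels per key first and then select those rows with df.loc. This is only a sketch built on the question's sample data; the names lengths, longest_idx and result are illustrative and not part of the answer above. Note that idxmax() keeps the first row when several rows in a key tie for the longest text.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1, 'a'], [1, 'aaa'], [1, 'aa'], [2, 'bb'], [2, 'bbb'], [3, 'cc']],
    columns=['key', 'text'],
)

# Length of each string, aligned with df's index
lengths = df['text'].str.len()

# For each key, the index label of the longest text (first one wins on ties)
longest_idx = lengths.groupby(df['key']).idxmax()

# Select those rows from the original frame; the original labels are kept
result = df.loc[longest_idx]
print(result)
```

Here longest_idx is the "single Series of index values" the question asks for, so it can be reused for other selections as well.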

3
df.groupby('key', as_index=False).apply(lambda x: x[x.text.str.len() == x.text.str.len().max()])

Output:

     key text
0 1    1  aaa
1 4    2  bbb
2 5    3   cc
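
The output above carries a two-level index (group position plus original label). If a flat index with the original labels is preferred, a similar filter can probably be written without apply by broadcasting each key's maximum length back to its rows with transform('max'). The following is a sketch under that assumption, using the question's sample data; the names lengths, group_max and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1, 'a'], [1, 'aaa'], [1, 'aa'], [2, 'bb'], [2, 'bbb'], [3, 'cc']],
    columns=['key', 'text'],
)

# Length of each string, aligned with df's index
lengths = df['text'].str.len()

# Maximum length within each key, repeated for every row of that key
group_max = lengths.groupby(df['key']).transform('max')

# Keep rows whose length equals their key's maximum (ties are all kept)
result = df[lengths == group_max]
print(result)
```

Like the apply version, this keeps every row that ties for the longest text within a key.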

1
def get_longest_string(texts):
    # texts is the Series of 'text' values for one key; ties are all kept
    lengths = texts.str.len()
    return texts[lengths == lengths.max()].tolist()

res = df.groupby('key')['text'].apply(get_longest_string).reset_index()

Output:

   key   text
0    1  [aaa]
1    2  [bbb]
2    3   [cc]
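
Because this approach returns a list per key, the text column holds lists rather than plain strings. If one row per string is wanted, as in the question's expected output, DataFrame.explode can flatten the lists. The sketch below assumes pandas 0.25 or later (where explode is available) and uses an equivalent vectorized helper; the name flat is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1, 'a'], [1, 'aaa'], [1, 'aa'], [2, 'bb'], [2, 'bbb'], [3, 'cc']],
    columns=['key', 'text'],
)

def get_longest_string(texts):
    # texts is the Series of 'text' values for one key
    lengths = texts.str.len()
    # Every string whose length equals the group's maximum, as a list
    return texts[lengths == lengths.max()].tolist()

res = df.groupby('key')['text'].apply(get_longest_string).reset_index()

# One row per list element; a key with tied strings yields several rows
flat = res.explode('text').reset_index(drop=True)
print(flat)
```

The result has a fresh 0..n-1 index, so the original row labels are not preserved with this approach.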
