
I have a pandas dataframe in a transactional format:

id  purchased_item
1   apple
1   banana
1   carrot
2   banana
3   apple
4   apple
4   carrot
4   diet coke
5   banana
5   carrot
6   banana
6   carrot

I would like to convert this to the following:

[['apple', 'banana', 'carrot'],
 ['banana'],
 ['apple'],
 ['apple', 'carrot', 'diet coke'],
 ['banana', 'carrot'],
 ['banana', 'carrot']]

I have tried this:

df.groupby(['id'])['purchased_item'].apply(list)

The output looks like:

id
1                 [apple, banana, carrot]
2                                [banana]
3                                 [apple]
4              [apple, carrot, diet coke]
5                        [banana, carrot]
6                        [banana, carrot]

What should I do next? Or is there a different approach? Thanks a lot for your help.

  • This question is answered here: stackoverflow.com/questions/34080979/… Commented Dec 4, 2015 at 6:20
  • To me, this is a different question because the original data are in a different format. So I am looking for a different solution. Commented Dec 4, 2015 at 6:23
  • I found a solution from here stackoverflow.com/questions/15112234/… Commented Dec 4, 2015 at 6:36

2 Answers


The solution you mentioned in your comment, taken from an answer to that question:

df.groupby(['id'])['purchased_item'].apply(list).values.tolist()

In [434]: df.groupby(['id'])['purchased_item'].apply(list).values.tolist()
Out[434]:
[['apple', 'banana', 'carrot'],
 ['banana'],
 ['apple'],
 ['apple', 'carrot', 'diet coke'],
 ['banana', 'carrot'],
 ['banana', 'carrot']]
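
For a self-contained check, here is a minimal sketch that rebuilds the question's data by hand and applies the same chain. How `df` was originally created is not shown in the question, so the DataFrame construction and the name `baskets` are assumptions. `Series.tolist()` is called directly here, which should give the same result as `.values.tolist()` for this data.

```python
import pandas as pd

# Rebuild the transactional data from the question (assumed construction).
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 4, 4, 4, 5, 5, 6, 6],
    "purchased_item": [
        "apple", "banana", "carrot",
        "banana",
        "apple",
        "apple", "carrot", "diet coke",
        "banana", "carrot",
        "banana", "carrot",
    ],
})

# Collect each id's items into a list, then turn the resulting Series
# of lists into a plain list of lists.
baskets = df.groupby("id")["purchased_item"].apply(list).tolist()
print(baskets)
```

By default `groupby` sorts the groups by key and keeps the original row order within each group, so the lists come out in `id` order.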

EDIT

Some timings comparing this with @Colonel Beauvel's solution:

In [472]: %timeit [gr['purchased_item'].tolist() for n, gr in df.groupby('id')]
100 loops, best of 3: 2.1 ms per loop

In [473]: %timeit df.groupby(['id'])['purchased_item'].apply(list).values.tolist()
1000 loops, best of 3: 1.36 ms per loop
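
Outside IPython, a similar comparison can be sketched with the standard `timeit` module. The DataFrame construction and the helper names `with_apply` and `with_comprehension` are assumptions made for illustration. Absolute numbers will depend on the machine, the pandas version, and the data size, so this is only a way to reproduce the comparison, not a benchmark result.

```python
import timeit

import pandas as pd

# Same data as in the question (assumed construction).
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 4, 4, 4, 5, 5, 6, 6],
    "purchased_item": [
        "apple", "banana", "carrot", "banana", "apple",
        "apple", "carrot", "diet coke", "banana", "carrot",
        "banana", "carrot",
    ],
})


def with_apply():
    # groupby + apply(list), then convert the Series to a list of lists.
    return df.groupby(["id"])["purchased_item"].apply(list).values.tolist()


def with_comprehension():
    # Iterate over the groups and convert each group's column to a list.
    return [gr["purchased_item"].tolist() for n, gr in df.groupby("id")]


loops = 100
t_apply = timeit.timeit(with_apply, number=loops)
t_comp = timeit.timeit(with_comprehension, number=loops)

print(f"apply(list):   {t_apply / loops * 1e3:.3f} ms per loop")
print(f"comprehension: {t_comp / loops * 1e3:.3f} ms per loop")
```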

I would rather use a different solution based on a list comprehension:

[gr['purchased_item'].tolist() for n, gr in df.groupby('id')]

Out[9]:
[['apple', 'banana', 'carrot'],
 ['banana'],
 ['apple'],
 ['apple', 'carrot', 'diet coke'],
 ['banana', 'carrot'],
 ['banana', 'carrot']]
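
As a runnable sketch of the comprehension above: iterating over a `groupby` object yields `(key, sub-frame)` pairs, so the keys can be collected alongside the item lists if they are needed later. The DataFrame construction and the names `ids` and `baskets` are assumptions, not part of the original answer.

```python
import pandas as pd

# Same data as in the question (assumed construction).
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 4, 4, 4, 5, 5, 6, 6],
    "purchased_item": [
        "apple", "banana", "carrot", "banana", "apple",
        "apple", "carrot", "diet coke", "banana", "carrot",
        "banana", "carrot",
    ],
})

ids = []
baskets = []
# Each iteration gives the group key and the rows belonging to that key.
for key, group in df.groupby("id"):
    ids.append(key)
    baskets.append(group["purchased_item"].tolist())

print(ids)
print(baskets)
```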
