
For each value, I want to list the users who have used it.

import pandas as pd
user = ['alice', 'bob', 'tim', 'alice']
val = [['a','b','c'],['a'],['c','d'],['a','d']]
df = pd.DataFrame({'user': user, 'val': val})

    user        val
0  alice  [a, b, c]
1    bob        [a]
2    tim     [c, d]
3  alice     [a, d]

Desired output:

val     users
a      [alice,bob]
b      [alice]
c      [alice,tim]
d      [alice,tim]

Any ideas?

4 Answers


Step 1
Reshape your data into one row per (value, user) pair:

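from itertools import chain

# Flatten the lists in 'val' and repeat each user once per element
# of that user's list, giving one row per (value, user) pair
df = pd.DataFrame({
    'val' : list(chain.from_iterable(df.val.tolist())),
    'user' : df.user.repeat(df.val.str.len())
})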
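On pandas 0.25 or later, `DataFrame.explode` should perform the same reshape in a single call. This is a minimal sketch, not part of the original answer; it assumes your pandas version provides `explode`, and the name `long_df` is illustrative:

```python
import pandas as pd

user = ['alice', 'bob', 'tim', 'alice']
val = [['a', 'b', 'c'], ['a'], ['c', 'd'], ['a', 'd']]
df = pd.DataFrame({'user': user, 'val': val})

# explode() puts each list element on its own row and repeats 'user'
# alongside it; reset_index drops the duplicated original index
long_df = df.explode('val').reset_index(drop=True)
print(long_df)
```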

Step 2
Use groupby + apply + unique:

df.groupby('val').user.apply(lambda x: x.unique().tolist())

val
a    [alice, bob]
b         [alice]
c    [alice, tim]
d    [tim, alice]
Name: user, dtype: object
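Because `unique()` keeps first-appearance order, row `d` comes out as `[tim, alice]` rather than the `[alice, tim]` shown in the desired output. A minimal sketch that sorts each group instead, assuming pandas 0.25+ for `explode` (the names `long_df` and `result` are illustrative):

```python
import pandas as pd

user = ['alice', 'bob', 'tim', 'alice']
val = [['a', 'b', 'c'], ['a'], ['c', 'd'], ['a', 'd']]
df = pd.DataFrame({'user': user, 'val': val})

# One row per (user, value) pair
long_df = df.explode('val')

# sorted(set(...)) removes duplicate users and orders them alphabetically
result = long_df.groupby('val')['user'].apply(lambda s: sorted(set(s)))
print(result)
```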

3 Comments

It is not the same as OP's desired output.
Shouldn't rows 'c' and 'd' be [1, 3] (users 1 and 3 have values 'c' and 'd')? Your code gives [1, 1].
I want to show the actual users. One second, let me update my output. The users as numbers are confusing; that's my fault.

This is my approach.

df2 = (df
       .set_index('user')
       .val
       .apply(pd.Series)
       .stack()
       .reset_index(name='val')  # Reshape the data
       .groupby(['val'])
       .user
       .apply(lambda x: sorted(set(x))))  # Show users that use the value

Output:

print(df2)
# val
# a    [alice, bob]
# b         [alice]
# c    [alice, tim]
# d    [alice, tim]
# Name: user, dtype: object

3 Comments

@qrs If performance is important, you may want to take another look at the other answers.
@cᴏʟᴅsᴘᴇᴇᴅ Would you mind telling us why your code is faster?
Sure, no worries. apply(pd.Series) is generally considered very slow. I learned this the hard way :)
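If `apply(pd.Series)` is the bottleneck, one option not taken from the answers here is to build the mapping in plain Python with a dictionary of sets and convert it to a Series at the end. This is a sketch only; its speed relative to the pandas versions has not been measured here, and `users_by_val` is a hypothetical name:

```python
from collections import defaultdict

import pandas as pd

user = ['alice', 'bob', 'tim', 'alice']
val = [['a', 'b', 'c'], ['a'], ['c', 'd'], ['a', 'd']]
df = pd.DataFrame({'user': user, 'val': val})

# Single pass over the rows: value -> set of users who used it
users_by_val = defaultdict(set)
for u, values in zip(df['user'], df['val']):
    for v in values:
        users_by_val[v].add(u)

# Series of sorted user lists, indexed by value in sorted order
result = pd.Series({v: sorted(us) for v, us in sorted(users_by_val.items())}, name='user')
result.index.name = 'val'
print(result)
```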

I think you need:

df2 = (pd.DataFrame(df['val'].values.tolist(), index=df['user'].values)
         .stack()
         .reset_index(name='val')
         .groupby('val')['level_0']
         .unique()
         .reset_index()
         .rename(columns={'level_0':'user'})
     )
print(df2)
  val          user
0   a  [alice, bob]
1   b       [alice]
2   c  [alice, tim]
3   d  [tim, alice]
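The `level_0` rename is needed only because the index has no name. A sketch of the same answer with the index named `user` up front, so `reset_index` yields a `user` column directly (the name `wide` is illustrative, not from the answer):

```python
import pandas as pd

user = ['alice', 'bob', 'tim', 'alice']
val = [['a', 'b', 'c'], ['a'], ['c', 'd'], ['a', 'd']]
df = pd.DataFrame({'user': user, 'val': val})

# Short lists are padded with missing values; the index carries the name 'user'
wide = pd.DataFrame(df['val'].tolist(), index=pd.Index(df['user'], name='user'))

df2 = (wide.stack()                  # long format, one entry per list element
           .reset_index(name='val')  # columns: user, level_1, val
           .groupby('val')['user']   # groupby skips any missing 'val' keys
           .unique()
           .reset_index())
print(df2)
```

Users within each list still follow first appearance, so `d` remains `[tim, alice]`.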


I don't have enough reputation to post this as a comment, but this question has the answer: How to print dataframe without index

Basically, change the last line to:

print(df2.to_string(index=False))

1 Comment

No, that isn't it.
