I have a table like the following:

  Celebrity  Usernames
0         A          2
1         A          1
2         B          3
3         C          2
4         D          2
5         A          3

I want to find out how many users follow all of A, C, and D. For the table above the output should be 1, because only user 2 follows all three. How do I do this in Python (pandas)?

  • Have you heard of dictionaries? Commented Feb 18, 2021 at 22:26

2 Answers

Here is a way using groupby() and nunique():

l = ['A','C','D']
df.loc[df['Celebrity'].isin(l)].groupby('Usernames')['Celebrity'].nunique().eq(len(l))
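The expression above yields a boolean Series indexed by username rather than a single number. A minimal, self-contained sketch of getting the count from it follows, assuming the question's table is rebuilt by hand as `df` and using the hypothetical names `targets`, `follows_all`, and `n_common`. It also assumes the target list contains no duplicates, since the comparison is against `len(targets)`.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Celebrity": ["A", "A", "B", "C", "D", "A"],
    "Usernames": [2, 1, 3, 2, 2, 3],
})

targets = ["A", "C", "D"]  # plays the role of `l` above; assumed duplicate-free

# Keep only rows for the target celebrities, then count the
# distinct target celebrities each user follows.
distinct_targets = (
    df.loc[df["Celebrity"].isin(targets)]
      .groupby("Usernames")["Celebrity"]
      .nunique()
)

# True for users who follow every target celebrity.
follows_all = distinct_targets.eq(len(targets))

# Summing a boolean Series counts the True values.
n_common = int(follows_all.sum())
print(n_common)
```

Using `nunique()` rather than `size()` keeps repeated (user, celebrity) rows from inflating the per-user count.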

Here is another way:

df.groupby(['Usernames','Celebrity']).size().loc[(slice(None),l)].unstack().gt(0)

And an alternative to crosstab:

df['Celebrity'].str.get_dummies().groupby(df['Usernames']).sum().loc[:,l].astype(bool).all(axis=1)

Using map:

df.loc[df['Usernames'].map(df.groupby('Usernames')['Celebrity'].agg(set).ge(set(l)))]
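Note that the map variant returns the matching rows of `df`, not a count, and that it depends on a superset test: a user who follows the targets plus other celebrities should still qualify. The sketch below contrasts set equality with a superset check on the question's data. The target pair `{"A", "C"}` and the names `followed`, `exact`, and `superset` are assumptions chosen only to make the difference visible.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Celebrity": ["A", "A", "B", "C", "D", "A"],
    "Usernames": [2, 1, 3, 2, 2, 3],
})

target_set = {"A", "C"}  # hypothetical targets for this illustration

# One set of followed celebrities per user.
followed = df.groupby("Usernames")["Celebrity"].agg(set)

# Equality only matches users who follow exactly the targets and nothing else.
exact = followed.map(lambda s: s == target_set)

# Superset matches users who follow at least the targets.
superset = followed.map(lambda s: s >= target_set)

print(int(exact.sum()))     # users following exactly A and C
print(int(superset.sum()))  # users following A and C, possibly with others
```

User 2 follows A, C, and D, so the equality test rejects that user while the superset test accepts them. An equality-based version will therefore tend to undercount whenever users follow celebrities outside the target list.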

4 Comments

The last one returned a different value from the previous three solutions. Can you figure out why that is? Just curious. The first one already fulfilled my goal well.
Do you mean the last solution didn't return just username 2 as containing all celebrities, or that the format looked different from the others?
The last solution somehow did not return the correct answer. It always returns a value that is less than what the previous three solutions return. For example, I had an adjacency matrix that shows that A and B have 27 users in common. And the first three solutions return 27. The last one gives me 17. I was not able to debug. And if I go to check the username lists, both answers have some users in common, and some users not.
I made a slight edit... changed .eq() to .ge()
Take a crosstab, then subset to your columns and leverage the fact that bool(0) == False and bool(any_other_number) == True to see how many Usernames satisfy your condition.

(pd.crosstab(df['Usernames'], df['Celebrity'])
   .loc[:, ['A', 'C', 'D']]
   .astype(bool)
   .all(axis=1)
   .sum())
#1

The crosstab creates a table of counts:

pd.crosstab(df['Usernames'], df['Celebrity'])
#Celebrity  A  B  C  D
#Usernames            
#1          1  0  0  0
#2          1  0  1  1
#3          1  1  0  0

which we then subset and turn into a truth table:

pd.crosstab(df['Usernames'], df['Celebrity']).loc[:, ['A', 'C', 'D']].astype(bool)
#Celebrity     A      C      D
#Usernames                    
#1          True  False  False
#2          True   True   True
#3          True  False  False
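From the truth table, the last step is a row-wise reduction. As a rough sketch, the same reduction can also list which usernames qualify instead of only counting them. The names `targets`, `truth`, `follows_all`, and `common_users` are hypothetical, and `df` is rebuilt from the question's table.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Celebrity": ["A", "A", "B", "C", "D", "A"],
    "Usernames": [2, 1, 3, 2, 2, 3],
})

targets = ["A", "C", "D"]

# Counts per (user, celebrity), restricted to the target columns,
# then converted to True/False ("follows" / "does not follow").
truth = pd.crosstab(df["Usernames"], df["Celebrity"]).loc[:, targets].astype(bool)

# A row is True only if every target column is True for that user.
follows_all = truth.all(axis=1)

# Usernames that follow every target, and how many there are.
common_users = follows_all[follows_all].index.tolist()
print(common_users)
print(len(common_users))
```

One caveat: `.loc[:, targets]` raises a `KeyError` if a target celebrity never appears in the data. Replacing it with `.reindex(columns=targets, fill_value=0)` is one way to treat a missing celebrity as "followed by nobody" instead.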
