Pandas filter list by using unique python

Question

I have a dataframe similar to the one below:

import pandas as pd

df = pd.DataFrame.from_dict({'cat1':['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D'], 'cat2':[['X','Y'], ['F'], ['X','Y'], ['Y'], ['Y'], ['Y'], ['Z'], ['P','W'],['L','K'],['L','K'],['L','K']]})

The output is

   cat1    cat2
0     A  [X, Y]
1     A     [F]
2     A  [X, Y]
3     B     [Y]
4     B     [Y]
5     C     [Y]
6     C     [Z]
7     C  [P, W]
8     D  [L, K]
9     D  [L, K]
10    D  [L, K]

I would like to filter out B and D, because each of those groups contains only a single distinct value in cat2 (['Y'] for B and ['L','K'] for D).

Desired output:

   cat1    cat2
0     A  [X, Y]
1     A     [F]
2     A  [X, Y]
3     C     [Y]
4     C     [Z]
5     C  [P, W]

I have tried df.groupby(['cat1'])['cat2'].unique(), but it does not work because cat2 is a column of lists.

Thank you in advance

Just to be clear, you only want the A and C rows of cat1, right?

– Mohit Motwani, Commented Jul 12, 2019 at 6:39

jezrael · Accepted Answer · 2019-07-12 06:46:53Z


In Python, lists are not hashable, so you need to convert them to tuples or strings first. Then use GroupBy.transform with SeriesGroupBy.nunique to count the distinct values in each group, and keep only the rows whose group count is not equal to 1 by combining Series.ne with boolean indexing:

df = df[df['cat2'].apply(tuple).groupby(df['cat1']).transform('nunique').ne(1)]
# Alternative: compare string representations instead of tuples
# df = df[df['cat2'].astype('str').groupby(df['cat1']).transform('nunique').ne(1)]
print(df)
  cat1    cat2
0    A  [X, Y]
1    A     [F]
2    A  [X, Y]
5    C     [Y]
6    C     [Z]
7    C  [P, W]
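
To make the one-liner easier to follow, here is a sketch that splits it into separate steps using the sample data from the question. The intermediate names (as_tuples, distinct_per_group, mask, result) are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'cat1': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D'],
    'cat2': [['X', 'Y'], ['F'], ['X', 'Y'], ['Y'], ['Y'], ['Y'], ['Z'],
             ['P', 'W'], ['L', 'K'], ['L', 'K'], ['L', 'K']],
})

# Lists cannot be hashed, which is why counting unique lists fails.
try:
    hash(['X', 'Y'])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Step 1: convert each list to a tuple, which is hashable.
as_tuples = df['cat2'].apply(tuple)

# Step 2: count the distinct tuples within each cat1 group and
# broadcast that count back onto every row of the group.
distinct_per_group = as_tuples.groupby(df['cat1']).transform('nunique')

# Step 3: keep only the rows whose group has a count other than 1.
mask = distinct_per_group.ne(1)
result = df[mask]

print(list_is_hashable)
print(distinct_per_group.tolist())
print(result)
```

The filtered frame keeps the original index labels (0, 1, 2, 5, 6, 7). If you want the 0 to 5 numbering shown in the question's desired output, calling result.reset_index(drop=True) afterwards should give that.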

edited Jul 12, 2019 at 6:46

answered Jul 12, 2019 at 6:40

jezrael
