I need to filter my groups so that only the groups in which every row contains a given string (here 'T' in column C) are kept.

Index  A   B   C    
0      A1  B5  T    
1      A1  B2  T    
2      A1  B2  F    
3      A2  B5  T    
4      A2  F5  T    
5      A3  F4  T    
6      A4  F4  F    

Desired result:

Index  A   B   C   
3      A2  B5  T   
4      A2  F5  T   
5      A3  F4  T   

Tried: df.groupby('A').apply(lambda x: x[x['C']=='T'])

As you might expect, it returns every row where C is 'T', regardless of group:

Index  A   B   C   
0      A1  B5  T   
1      A1  B2  T   
3      A2  B5  T   
4      A2  F5  T   
5      A3  F4  T   

When I change apply to filter I get an error.

Help Please!

3 Answers

Using transform
Fastest solution that is also simple

df[df.C.eq('T').groupby(df.A.values).transform('all')]

        A   B  C
Index           
3      A2  B5  T
4      A2  F5  T
5      A3  F4  T
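
To make the one-liner above easier to follow, here is a minimal sketch that splits it into its two steps. The sample frame is rebuilt from the question (naming the index "Index" is an assumption), and grouping by the column `df["A"]` rather than `df.A.values` is a small variation that should give the same mask.

```python
import pandas as pd

# Sample frame from the question; the index name "Index" is an assumption.
df = pd.DataFrame(
    {
        "A": ["A1", "A1", "A1", "A2", "A2", "A3", "A4"],
        "B": ["B5", "B2", "B2", "B5", "F5", "F4", "F4"],
        "C": ["T", "T", "F", "T", "T", "T", "F"],
    }
)
df.index.name = "Index"

# Step 1: row-level test, True where C equals 'T'.
is_t = df["C"].eq("T")

# Step 2: reduce each group with all() and broadcast the verdict back to
# every row of that group, so the mask lines up with df's index.
keep = is_t.groupby(df["A"]).transform("all")

result = df[keep]
print(result)
```

Because `transform` returns one value per original row, `keep` can be used directly as a boolean mask, which is what lets this avoid a Python-level function call per group.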

Using crosstab
Shortest solution I could think of... but slow

df[df.A.map(pd.crosstab(df.A, df.C).F.eq(0))]

        A   B  C
Index           
3      A2  B5  T
4      A2  F5  T
5      A3  F4  T
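
A sketch of what the crosstab version does under the hood, using the question's sample frame. Note that `.F` assumes C only ever holds 'T' or 'F' and that at least one 'F' is present; the `all_t` variant below is my own assumption of how to make it a bit more general, not part of the original answer.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(
    {
        "A": ["A1", "A1", "A1", "A2", "A2", "A3", "A4"],
        "B": ["B5", "B2", "B2", "B5", "F5", "F4", "F4"],
        "C": ["T", "T", "F", "T", "T", "T", "F"],
    }
)

# One row per group in A, one column per distinct value in C, cells are counts.
counts = pd.crosstab(df["A"], df["C"])

# Original idea: a group qualifies when its 'F' count is zero.
no_f = counts["F"].eq(0)
result = df[df["A"].map(no_f)]

# Hypothetical generalisation: a group qualifies when its 'T' count equals
# its total row count, so other values in C are also rejected.
t_count = counts.reindex(columns=["T"], fill_value=0)["T"]
all_t = t_count.eq(counts.sum(axis=1))
result_general = df[df["A"].map(all_t)]

print(counts)
print(result)
```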

Using factorize and bincount
Very fast solution... but complicated

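f, u = pd.factorize(df.A.values)      # f: integer group code per row, u: unique groups
t = (df.C.values == 'T').astype(int)  # 1 where C == 'T', else 0
b0 = np.bincount(f * 2 + t)           # slot 2*g counts non-'T' rows of group g, slot 2*g + 1 counts 'T' rows
pad = np.zeros(2 * u.size - b0.size, dtype=int)  # pad so every group has both slots
b = np.append(b0, pad)

# keep the rows whose group has zero non-'T' rows
df[~b.reshape(-1, 2)[:, 0].astype(bool)[f]]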

        A   B  C
Index           
3      A2  B5  T
4      A2  F5  T
5      A3  F4  T
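
The same counting trick can be sketched a little more compactly. This is a variation, not the original code: it assumes `np.bincount`'s `minlength` argument can stand in for the manual zero padding, and the variable names are mine.

```python
import numpy as np
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(
    {
        "A": ["A1", "A1", "A1", "A2", "A2", "A3", "A4"],
        "B": ["B5", "B2", "B2", "B5", "F5", "F4", "F4"],
        "C": ["T", "T", "F", "T", "T", "T", "F"],
    }
)

# codes: integer label 0..n_groups-1 for each row; uniques: the group keys.
codes, uniques = pd.factorize(df["A"].to_numpy())

# 1 where C == 'T', 0 otherwise.
is_t = (df["C"].to_numpy() == "T").astype(int)

# Slot 2*g counts the non-'T' rows of group g, slot 2*g + 1 its 'T' rows.
# minlength guarantees two slots for every group, even trailing empty ones.
counts = np.bincount(codes * 2 + is_t, minlength=2 * uniques.size).reshape(-1, 2)

# A group qualifies when it has no non-'T' rows; index by codes to get a row mask.
group_ok = counts[:, 0] == 0
result = df[group_ok[codes]]
print(result)
```

For the sample data `counts` has one `[non-T, T]` pair per group, which makes it easy to check the logic by eye before trusting it on a larger frame.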

Timing

%timeit df[df.C.eq('T').groupby(df.A.values).transform('all')]
%timeit df[df.A.map(pd.crosstab(df.A, df.C).F.eq(0))]
%timeit df.groupby('A').filter(lambda x: len(x[x.C=='T'])==len(x))

1000 loops, best of 3: 1.67 ms per loop
100 loops, best of 3: 6.15 ms per loop
100 loops, best of 3: 3.05 ms per loop

%%timeit
f, u = pd.factorize(df.A.values)
t = (df.C.values == 'T').astype(int)
b0 = np.bincount(f * 2 + t)
pad = np.zeros(2 * u.size - b0.size, dtype=int)
b = np.append(b0, pad)

df[~b.reshape(-1, 2)[:, 0].astype(bool)[f]]
1000 loops, best of 3: 279 µs per loop

%%timeit
d1 = df.assign(mydummy=df['C']=='T')
d1['mysum'] = d1.groupby('A').mydummy.transform('sum')
d1['mycount'] = d1.groupby('A').mysum.transform('size')
d1.loc[d1.mysum == d1.mycount, df.columns]
100 loops, best of 3: 3.68 ms per loop

Try this little fella:

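df['mydummy'] = df['C'] == 'T'                           # True where C is 'T'
df['mysum'] = df.groupby('A').mydummy.transform('sum')   # number of 'T' rows per group
df['mycount'] = df.groupby('A').mysum.transform('size')  # total rows per group
df = df.loc[df.mysum == df.mycount]                      # keep groups where every row is 'T'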


You can use filter after groupby to check if all rows in the groups have T in column C.

df.groupby('A').filter(lambda x: len(x[x.C=='T'])==len(x))
Out[41]: 
  Index   A   B  C
3     3  A2  B5  T
4     4  A2  F5  T
5     5  A3  F4  T
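
The function passed to `filter` has to return a single True/False per group, which is probably why swapping `apply` for `filter` in the question raised an error: that lambda returns a DataFrame. A slightly more direct way to write the same check, sketched on the question's sample frame:

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(
    {
        "A": ["A1", "A1", "A1", "A2", "A2", "A3", "A4"],
        "B": ["B5", "B2", "B2", "B5", "F5", "F4", "F4"],
        "C": ["T", "T", "F", "T", "T", "T", "F"],
    }
)

# The lambda returns one boolean per group: True only if every row has C == 'T'.
result = df.groupby("A").filter(lambda g: g["C"].eq("T").all())
print(result)
```

This reads closer to the requirement than comparing lengths, though it still calls a Python function once per group, so it is unlikely to beat the transform version on speed.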
