Filtering pandas dataframe using element of list

Question

I am trying to filter a dataframe in which some columns contain lists. I want to filter out the list elements that do not pass the condition.

For example:

import pandas as pd 
df = pd.DataFrame({'col1':[10,20], 'col2': [[1,2,3],[3,4,5]], 'col3': [[False,False,True],[True,True,False]],'col4':[True,False]})

   col1       col2                  col3   col4
0    10  [1, 2, 3]  [False, False, True]   True
1    20  [3, 4, 5]   [True, True, False]  False

applying the filter

df_filtered = df.query("col2>2 & col3==True")

the output I expect

Thanks for the help!

Maybe you want to transform your data as in this question then query.
– Quang Hoang, Commented Feb 8, 2021 at 15:27
It looks like he is trying to use the boolean lists in col3 as a filter against the lists in col2. Col4 seems irrelevant.
– dawg, Commented Feb 8, 2021 at 15:29
@GiorgosMyrianthous because it does not satisfy the condition on col3.
– Stefio Yosse Andrean, Commented Feb 8, 2021 at 15:29
@QuangHoang if you mean using explode(), I have tried it, but it is very slow and ended up blowing up the size of the dataframe. Unfortunately, I am working on a very large dataset.
– Stefio Yosse Andrean, Commented Feb 8, 2021 at 15:31

Quang Hoang · Accepted Answer · 2021-02-08 15:41:49Z

4

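Try exploding both list columns, filtering the exploded rows, and regrouping them by the original index: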

df[['col2','col3']] = (pd.DataFrame({'col2': df['col2'].explode(),
                                     'col3': df['col3'].explode()})
                         .query('col2>2 & col3==True')
                         .groupby(level=0).agg(list)
                      )

Output:

print(df)

   col1    col2          col3   col4
0    10     [3]        [True]   True
1    20  [3, 4]  [True, True]  False
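
The one-liner above chains three steps. A minimal sketch that splits them apart may make the intermediate shapes easier to follow; the names `long`, `kept`, and `regrouped` are made up for illustration and are not part of the answer:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [10, 20],
    'col2': [[1, 2, 3], [3, 4, 5]],
    'col3': [[False, False, True], [True, True, False]],
})

# Step 1: one row per list element; the original row label repeats in the index.
long = pd.DataFrame({'col2': df['col2'].explode(),
                     'col3': df['col3'].explode()})

# Step 2: keep only the elements that satisfy both conditions.
kept = long[(long['col2'] > 2) & (long['col3'] == True)]

# Step 3: collect the surviving elements back into one list per original row.
regrouped = kept.groupby(level=0).agg(list)
print(regrouped)
```

Note that a row whose elements all fail the filter has no group in step 3, so it is absent from `regrouped`; assigning the result back to `df` would then leave NaN in that row rather than an empty list.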

answered Feb 8, 2021 at 15:41

Quang Hoang


2 Comments

Stefio Yosse Andrean Over a year ago

thank you Quang! as you said, it is not as bad performance-wise as I thought!

dawg Over a year ago

The version from Ben. T. is 10x faster than this...

dawg · Accepted Answer · 2021-02-08 17:31:32Z

2

You can use numpy and an iterative approach if memory is the main constraint.

This modifies the dataframe in place without having to create a large interim data structure in the process:

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1':[10,20], 'col2': [[1,2,3],[3,4,5]], 'col3': [[False,False,True],[True,True,False]]})

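for idx, row in df.iterrows():
    a1 = np.array(row['col2'])
    a2 = np.array(row['col3'], dtype=bool)
    # Build the mask once so col2 and col3 stay aligned element by element
    mask = (a1 > 2) & a2
    df.at[idx, 'col2'] = a1[mask].tolist()
    df.at[idx, 'col3'] = a2[mask].tolist()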

>>> df
   col1    col2          col3
0    10     [3]        [True]
1    20  [3, 4]  [True, True]

edited Feb 8, 2021 at 17:31

answered Feb 8, 2021 at 16:27

dawg


Ben.T · Accepted Answer · 2021-02-08 15:50:33Z

1

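As the lists are the same size across rows, you can probably use arrays and a mask like this:

import numpy as np

arr2 = np.array(df['col2'].tolist())
arr3 = np.array(df['col3'].tolist())

mask = (arr2 > 2) & arr3  # col2 > 2 and col3 is True, element by element
df['col2'] = [c2[b].tolist() for c2, b in zip(arr2, mask)]
df['col3'] = [c3[b].tolist() for c3, b in zip(arr3, mask)]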

print(df)
   col1    col2          col3   col4
0    10     [3]        [True]   True
1    20  [3, 4]  [True, True]  False
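
This approach depends on every list having the same length, since otherwise the lists cannot be stacked into a rectangular 2-D array. A minimal sketch of a check to run first, assuming a hypothetical helper named `lists_have_equal_length` (not part of pandas or of the answer):

```python
import pandas as pd

def lists_have_equal_length(df, columns):
    """Return True if every list in the given columns has the same length."""
    lengths = pd.concat([df[col].map(len) for col in columns])
    return lengths.nunique() == 1

df = pd.DataFrame({
    'col2': [[1, 2, 3], [3, 4, 5]],
    'col3': [[False, False, True], [True, True, False]],
})
# Hypothetical ragged data: the second row holds shorter lists.
ragged = pd.DataFrame({
    'col2': [[1, 2, 3], [3, 4]],
    'col3': [[False, False, True], [True, True]],
})

print(lists_have_equal_length(df, ['col2', 'col3']))      # True
print(lists_have_equal_length(ragged, ['col2', 'col3']))  # False
```

If the check fails, one of the row-by-row approaches in the other answers is probably the safer choice.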

answered Feb 8, 2021 at 15:50

Ben.T


1 Comment

dawg Over a year ago

This is 10x faster than the version with .explode. I timed it...
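
The timing comparison mentioned here can be repeated with a sketch along these lines. The synthetic data, the row count, and the function names `make_df`, `with_explode`, and `with_arrays` are all assumptions for illustration; the measured ratio will depend on the data and the machine:

```python
import timeit

import numpy as np
import pandas as pd

def make_df(n_rows=1000):
    # Hypothetical synthetic data: three elements per row, as in the question.
    rng = np.random.default_rng(0)
    return pd.DataFrame({
        'col2': rng.integers(0, 6, size=(n_rows, 3)).tolist(),
        'col3': (rng.random((n_rows, 3)) < 0.5).tolist(),
    })

def with_explode(df):
    # explode -> filter -> regroup by the original index
    long = pd.DataFrame({'col2': df['col2'].explode(),
                         'col3': df['col3'].explode()})
    kept = long[(long['col2'] > 2) & (long['col3'] == True)]
    return kept.groupby(level=0).agg(list)

def with_arrays(df):
    # 2-D arrays plus a boolean mask, applied row by row
    arr2 = np.array(df['col2'].tolist())
    arr3 = np.array(df['col3'].tolist())
    mask = (arr2 > 2) & arr3
    return [row[m].tolist() for row, m in zip(arr2, mask)]

df = make_df()
print('explode:', timeit.timeit(lambda: with_explode(df), number=10))
print('arrays: ', timeit.timeit(lambda: with_arrays(df), number=10))
```

Both functions leave `df` untouched so that each timed run starts from the same input.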

LoukasPap · Accepted Answer · 2021-02-08 15:45:17Z

0

Another way with loops, but probably slower:

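for index, row in df.iterrows():
    col2, col3 = row['col2'], row['col3']
    # Keep the positions where col3 is True and the col2 value is greater than 2
    keep = [flag and value > 2 for value, flag in zip(col2, col3)]
    df.at[index, 'col2'] = [v for v, k in zip(col2, keep) if k]
    df.at[index, 'col3'] = [f for f, k in zip(col3, keep) if k]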

answered Feb 8, 2021 at 15:45

LoukasPap
