I have a data frame that I'd like to filter by a column that is of type array. What is the most effective way to do this?
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [['true', 'false'], ['false'], ['false', 'false', 'false'], ['false', 'false', 'true'], []]})
df
   a                      b
0  1          [true, false]
1  2                [false]
2  3  [false, false, false]
3  4   [false, false, true]
4  5                     []
Ideally, I'd like to return only the rows whose list contains a 'true' value.
array is not a dtype. There is no really efficient way to work with lists in a pandas.DataFrame, but you could always do something like:
df[df.b.apply(lambda x: 'true' in x)]
Would any() be more performant?
With numpy.ndarray objects instead of list objects, maybe slightly, but the time sink is iterating over the rows, which is unavoidable in this case. Furthermore, those arrays would be dtype=object anyway, so iteration would still be slow.
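For reference, here is a minimal, self-contained sketch of the apply-based filter on the question's data. The list-comprehension mask is my own addition for comparison, not something from the answer; it should give the same result, and it still loops over every row in Python, so don't expect a vectorized speedup from either version.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [['true', 'false'], ['false'], ['false', 'false', 'false'],
          ['false', 'false', 'true'], []],
})

# Row-wise membership test, as suggested in the answer.
mask_apply = df.b.apply(lambda x: 'true' in x)

# Same mask built with a plain list comprehension (an assumed alternative;
# it is still a Python-level loop over the rows).
mask_comp = pd.Series(['true' in x for x in df.b], index=df.index)

# Keep only the rows whose list contains the string 'true'.
filtered = df[mask_apply]
print(filtered)
```

Note that the empty list in the last row simply yields False, so that row is dropped without any special handling.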