What is the best way to do string matching on a column of lists?
E.g. I have a dataset:
import numpy as np
import pandas as pd
list_items = ['apple', 'grapple', 'tackle', 'satchel', 'snapple']
df = pd.DataFrame({'id': range(3), 'L': [np.random.choice(list_items, 3).tolist() for _ in range(3)]})
df
                            L  id
0    [tackle, apple, grapple]   0
1  [tackle, snapple, satchel]   1
2  [satchel, satchel, tackle]   2
I want to return the rows where any item in L contains a given string, e.g. 'grap' should return row 0, and 'sat' should return rows 1 and 2.
You can do df.L.apply(lambda row: any('whatever' in word for word in row)), which gives a boolean mask you can then index with (df[mask]). But this whole problem feels like one you shouldn't want to have. A plain dict mapping ids to lists (or whatnot) may serve you better: unless you're getting some benefit from the DataFrame, it is just adding overhead.
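Here is a minimal sketch of both suggestions. It assumes a fixed copy of the example data shown above (the question uses np.random.choice, so its rows differ on every run), and the helper names rows_matching and ids_matching are hypothetical, introduced only for illustration.

```python
import pandas as pd

# Fixed data matching the printed example (the original is randomly generated).
df = pd.DataFrame({
    'id': range(3),
    'L': [['tackle', 'apple', 'grapple'],
          ['tackle', 'snapple', 'satchel'],
          ['satchel', 'satchel', 'tackle']],
})

def rows_matching(frame, substring):
    """Hypothetical helper: rows whose list in column L has a word containing substring."""
    # Boolean Series, True where at least one word in the row's list matches.
    mask = frame['L'].apply(lambda words: any(substring in word for word in words))
    return frame[mask]

# Plain-dict alternative: ids mapped directly to their lists (Python ints as keys).
lists_by_id = dict(zip(df['id'].tolist(), df['L']))

def ids_matching(mapping, substring):
    """Hypothetical helper: ids whose list has a word containing substring."""
    return [key for key, words in mapping.items()
            if any(substring in word for word in words)]

print(rows_matching(df, 'grap')['id'].tolist())  # [0]
print(rows_matching(df, 'sat')['id'].tolist())   # [1, 2]
print(ids_matching(lists_by_id, 'sat'))          # [1, 2]
```

Both versions do the same Python-level substring test per word; the dict version just skips building and indexing a DataFrame.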