Filter list based on another list in python

Question

I'm trying to filter list1 based on another list, list2, with the following code:

import csv

with open('screen.csv') as f: #A file with a list of all the article titles
    reader = csv.reader(f)
    list1 = list(reader)

print(list1)

list2 = ["Knowledge Management", "modeling language"] #key words that article title should have (at least one of them)
list2 = [str(x) for x in list2]

occur = [i for i in list1  for j in list2 if str(j) in i]

print(occur)

but the output is empty.

My list1 looks like this:

Here is one of the titles I found manually, to test it: "An Integrated Method for Knowledge Management in Product Configuration Projects", which contains Knowledge Management.
– Greencolor, commented Feb 15, 2022 at 21:44

jfaccioni · Accepted Answer · 2022-02-15 22:11:24Z

1

list_1 is actually a list of lists, not a list of strings, so you need to flatten it (e.g. with a nested list comprehension, as below) before comparing elements:

list_1 = [['foo bar'], ['baz beep bop']]
list_2 = ['foo', 'bub']

flattened_list_1 = [
    element 
    for sublist in list_1 
    for element in sublist
]
occurrences = [
    phrase 
    for phrase in flattened_list_1 if any(
        word in phrase 
        for word in list_2
    )
]
print(occurrences)

# output:
# ['foo bar']
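Applied to the question's setup, the same flatten-then-filter idea might look like the sketch below. It assumes screen.csv holds one title per row; io.StringIO stands in for the real file so the example runs on its own, and the sample titles (other than the one quoted in the question's comment) are made up for illustration.

```python
import csv
import io

# Stand-in for open('screen.csv'): one title per row. Sample titles are hypothetical,
# except the first, which is quoted in the question's comment.
csv_text = (
    "An Integrated Method for Knowledge Management in Product Configuration Projects\n"
    "A Survey of Graph Databases\n"
    "Designing a Domain-Specific modeling language\n"
)

with io.StringIO(csv_text) as f:
    reader = csv.reader(f)
    list1 = list(reader)  # list of lists: each row becomes ['<title>']

list2 = ["Knowledge Management", "modeling language"]

# Flatten: pull every field out of every row into one flat list of strings.
flattened_list1 = [title for row in list1 for title in row]

# Keep a title if it contains at least one key phrase (substring test on strings).
occur = [title for title in flattened_list1 if any(word in title for word in list2)]

print(occur)
```

Because the filter runs on the flattened strings, `word in title` is a substring test rather than a list-membership test.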

edited Feb 15, 2022 at 22:11

answered Feb 15, 2022 at 21:53

jfaccioni



3 Comments

Greencolor Over a year ago

I tried the first code and it worked perfectly, thanks.

Blckknght Over a year ago

You are not benefiting from using sets here, since the only in operator being used is doing substring matches, not set membership tests. Don't be confused by the in keyword also being used in for x in y loops, that's not a membership test at all.

jfaccioni Over a year ago

@Blckknght yes, you're right: the "membership check" (substring match) is actually done in word in phrase. I'll edit out the second part so as not to confuse others.
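To illustrate the distinction drawn in these comments, the snippet below contrasts the roles `in` plays here: a substring test between two strings, an exact-element membership test on a list or set, and plain loop syntax inside a `for` clause. The title is the one quoted in the question's comment.

```python
title = "An Integrated Method for Knowledge Management in Product Configuration Projects"

# Between two strings, `in` is a substring test.
print("Knowledge Management" in title)        # True

# On a list or a set, `in` tests for an element equal to the value.
print("Knowledge Management" in [title])      # False: no element equals the phrase
print("Knowledge Management" in {title})      # False: a set does not help substring matching

# In `for word in ...`, `in` is part of the loop syntax, not a test at all.
matches = [word for word in ["Knowledge Management", "modeling language"] if word in title]
print(matches)                                # ['Knowledge Management']
```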

Mahsa Hassankashi · Accepted Answer · 2022-02-15 21:55:24Z

1

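import pandas as pd
import numpy as np

# data: your list of values; another_list: the list to match against (placeholders)
df = pd.DataFrame(data)
# column_of_list is a placeholder for the name of the column holding the values
print(df[df.column_of_list.map(lambda x: np.isin(x, another_list).all())])
#OR, for an unnamed column built from a plain list (labelled 0):
print(df[df[0].map(lambda x: np.isin(x, another_list).all())])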

Try with real data:

import numpy as np
import pandas as pd 
data = ["Knowledge Management", "modeling language"]
another_list=["modeling language","natural language"]
df = pd.DataFrame(data) 
a = df[df[0].map(lambda x: np.isin(x, another_list).all())]

print(a)

edited Feb 15, 2022 at 21:55

answered Feb 15, 2022 at 21:44

Mahsa Hassankashi


3 Comments

Greencolor Over a year ago

I tried:

import pandas as pd
import numpy as np
df = pd.read_csv('screen.csv')
list2 = ["Knowledge Management", "modeling language"]
print(df[df[0].map(lambda x: np.isin(x, list2).all())])

Greencolor Over a year ago

But I get an error:

raise KeyError(key) from err
KeyError: 0

Mahsa Hassankashi Over a year ago

@Greencolor I edited the answer with real data. What does your CSV look like?
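The KeyError: 0 above is consistent with pd.read_csv treating the first line of screen.csv as a header row, so the single column is named after the first title rather than 0. Separately, np.isin compares whole elements, so it only keeps titles that exactly equal a key phrase. A minimal pandas sketch addressing both, assuming a headerless one-column CSV (simulated with io.StringIO; sample titles other than the quoted one are hypothetical), passes header=None and uses Series.str.contains for substring matching:

```python
import io
import re

import pandas as pd

# Stand-in for screen.csv: one title per line, no header row.
csv_text = (
    "An Integrated Method for Knowledge Management in Product Configuration Projects\n"
    "A Survey of Graph Databases\n"
    "Designing a Domain-Specific modeling language\n"
)

# header=None stops pandas from using the first title as the column name,
# so the single column is labelled 0 and df[0] works.
df = pd.read_csv(io.StringIO(csv_text), header=None)

list2 = ["Knowledge Management", "modeling language"]

# One regex "phrase1|phrase2", escaping any regex metacharacters in the phrases.
pattern = "|".join(re.escape(word) for word in list2)

# str.contains searches inside each title, unlike np.isin's whole-element match.
matches = df[df[0].str.contains(pattern, regex=True)]
print(matches[0].tolist())
```

Joining the phrases into a single alternation keeps this to one vectorized pass over the column instead of one pass per key phrase.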

Blckknght · Accepted Answer · 2022-02-15 21:58:35Z

Your list1 is a list of lists, because the csv.reader that you're using to create it always returns a list for each row, even if there's only a single item. (If you're expecting a single name from each row, I'm not sure why you're using csv here; it's only going to be a hindrance.)

Later, when you check if str(j) in i as part of your filtering list comprehension, you're testing whether the string j is an element of the list i. Since the values in list2 are not full titles but key phrases, you aren't going to find any matches. If you were checking against the inner strings, you'd get substring checks, but a list-membership test requires an exact match.
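A small reproduction of this, using io.StringIO in place of screen.csv so it runs on its own (the single title is the one quoted in the question's comment): csv.reader yields a one-element list per row, so the question's comprehension performs list-membership tests and finds nothing.

```python
import csv
import io

# One title per line, standing in for screen.csv.
rows = list(csv.reader(io.StringIO(
    "An Integrated Method for Knowledge Management in Product Configuration Projects\n"
)))
print(rows)  # [['An Integrated Method for Knowledge Management in Product Configuration Projects']]

list2 = ["Knowledge Management", "modeling language"]

# The question's comprehension: `str(j) in i` asks whether j is an element of the row list.
occur = [i for i in rows for j in list2 if str(j) in i]
print(occur)  # []

# Testing against the string inside the row (i[0], the only field here) is a substring test.
occur_fixed = [i[0] for i in rows for j in list2 if j in i[0]]
print(occur_fixed)
```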

Probably the best way to fix the problem is to do away with the nested lists in list1. Try creating it with:

with open('screen.csv') as f:
    list1 = [line.strip() for line in f]
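Putting that together, a sketch of the complete filter on plain lines. It uses io.StringIO in place of open('screen.csv'), and it adds three things the answer does not state, all assumptions: blank lines are skipped, any() ensures a title containing both phrases is listed once (the question's nested comprehension would list it once per matching phrase), and both sides are lower-cased for an optional case-insensitive match. Sample titles other than the quoted one are hypothetical.

```python
import io

# Stand-in for open('screen.csv'); one title per line, including a blank line.
f = io.StringIO(
    "An Integrated Method for Knowledge Management in Product Configuration Projects\n"
    "\n"
    "Knowledge Management with a Domain-Specific Modeling Language\n"
    "A Survey of Graph Databases\n"
)

with f:
    # Strip the trailing newline from each line and drop blank lines.
    list1 = [line.strip() for line in f if line.strip()]

list2 = ["Knowledge Management", "modeling language"]
keywords = [word.lower() for word in list2]

# any() stops at the first matching phrase, so each title appears at most once.
# Comparing lower-cased text makes the match case-insensitive (optional).
occur = [title for title in list1 if any(word in title.lower() for word in keywords)]
print(occur)
```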
