1

I'm trying to filter list1 based on another list, list2, with the following code:

import csv

with open('screen.csv') as f: #A file with a list of all the article titles
    reader = csv.reader(f)
    list1 = list(reader)

print(list1)

list2 = ["Knowledge Management", "modeling language"] #key words that article title should have (at least one of them)
list2 = [str(x) for x in list2]

occur = [i for i in list1  for j in list2 if str(j) in i]

print(occur)

but the output is empty.

My list1 looks like this: (screenshot of list1, image not available)

2
  • 1
    maybe those exact phrases don't exist? Commented Feb 15, 2022 at 21:42
  • Here is one of the titles that I searched for manually to test it: "An Integrated Method for Knowledge Management in Product Configuration Projects", which has "Knowledge Management" in it. Commented Feb 15, 2022 at 21:44

3 Answers

1

list_1 is actually a list of lists, not a list of strings, so you need to flatten it (e.g. with a nested list comprehension, as in the code below) before trying to compare elements:

list_1 = [['foo bar'], ['baz beep bop']]
list_2 = ['foo', 'bub']

flattened_list_1 = [
    element 
    for sublist in list_1 
    for element in sublist
]
occurrences = [
    phrase 
    for phrase in flattened_list_1 if any(
        word in phrase 
        for word in list_2
    )
]
print(occurrences)

# output:
# ['foo bar']
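
To tie this back to the question's data, here is a minimal sketch. It assumes screen.csv holds one title per row with no commas inside the titles. The io.StringIO object stands in for the real file, and the second title is made up for illustration (the first is the one quoted in the question's comments). itertools.chain.from_iterable is used as an alternative way to flatten.

```python
import csv
import io
from itertools import chain

# Stand-in for screen.csv (assumption: one title per row, no commas in titles).
# The first title comes from the question's comments; the second is hypothetical.
sample = io.StringIO(
    "An Integrated Method for Knowledge Management in Product Configuration Projects\n"
    "A Survey of Graph Databases\n"
)

rows = list(csv.reader(sample))            # list of one-item lists, like list1
titles = list(chain.from_iterable(rows))   # flattened to a plain list of strings

keywords = ["Knowledge Management", "modeling language"]

# Keep each title that contains at least one keyword as a substring.
occur = [title for title in titles if any(key in title for key in keywords)]
print(occur)
```

With the real file, replace the StringIO object with open('screen.csv', newline='').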

3 Comments

I tried the first code and it worked perfectly, thanks.
You are not benefiting from using sets here, since the only in operator being used is doing substring matches, not set membership tests. Don't be confused by the in keyword also being used in for x in y loops, that's not a membership test at all.
@Blckknght yes you're right - the "membership check" (substring match) is actually done in word in phrase... I'll edit out the second part as to not confuse others.
1
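import pandas as pd
import numpy as np

# `data` and `another_list` are placeholders for your own values
df = pd.DataFrame(data)
# Keep the rows whose value is exactly equal to an entry of another_list
print(df[df.column_of_list.map(lambda x: np.isin(x, another_list).all())])
# OR, if the column is labelled 0:
print(df[df[0].map(lambda x: np.isin(x, another_list).all())])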

Try with real data:

import numpy as np
import pandas as pd 
data = ["Knowledge Management", "modeling language"]
another_list=["modeling language","natural language"]
df = pd.DataFrame(data) 
a = df[df[0].map(lambda x: np.isin(x, another_list).all())]

print(a)

3 Comments

I tried this:
import pandas as pd
import numpy as np
df = pd.read_csv('screen.csv')
list2 = ["Knowledge Management", "modeling language"]
print(df[df[0].map(lambda x: np.isin(x, list2).all())])
But I get an error: raise KeyError(key) from err KeyError: 0
@Greencolor I edited the answer with real data. Can you check what your CSV looks like?
0

Your list1 is a list of lists, because the csv.reader that you're using to create it always returns a list for each row, even if there's only a single item. (If you're expecting a single name from each row, I'm not sure why you're using csv here; it's only going to be a hindrance.)

Later when you check if str(j) in i as part of your filtering list comprehension, you're testing if the string j is present in the list i. Since the values in list2 are not full titles but key-phrases, you aren't going to find any matches. If you were checking in the inner strings, you'd get substring checks, but when you test list membership it must be an exact match.
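
The difference between the two kinds of in test can be seen in a small sketch. The title is the one quoted in the question's comments, and the variable names are only for illustration.

```python
title = "An Integrated Method for Knowledge Management in Product Configuration Projects"
row = [title]  # what csv.reader yields for a row with a single column

keyword = "Knowledge Management"

in_row = keyword in row      # list membership: needs an element equal to keyword
in_title = keyword in title  # string containment: a substring search
full_in_row = title in row   # membership succeeds only for the complete title

print(in_row, in_title, full_in_row)  # False True True
```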

Probably the best way to fix the problem is to do away with the nested lists in list1. Try creating it with:

with open('screen.csv') as f:
    list1 = [line.strip() for line in f]
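
Putting it together, a minimal sketch of the whole fix might look like the following. It assumes each line of the file is exactly one title with no CSV quoting. The temporary file and the second title are hypothetical stand-ins so the example can run on its own.

```python
import os
import tempfile

# Hypothetical stand-in for screen.csv, written to a temporary directory.
# The first title is from the question's comments; the second is made up.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "screen.csv")
    with open(path, "w") as f:
        f.write(
            "An Integrated Method for Knowledge Management in Product Configuration Projects\n"
            "A Survey of Graph Databases\n"
        )

    # Read the titles as plain strings, one per line.
    with open(path) as f:
        list1 = [line.strip() for line in f]

list2 = ["Knowledge Management", "modeling language"]

# any() keeps a title once, even if it contains several of the keywords.
occur = [title for title in list1 if any(key in title for key in list2)]
print(occur)
```

Using any() instead of a second for clause also avoids listing a title twice when it matches more than one keyword.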
