I have a list of lists as follows.
mylist = [[5274919, ["report", "porcelain", "firing", "technic"]], [5274920, ["implantology", "dentistry"]], [52749, ["method", "recognition", "long", "standing", "root", "perforation", "molar"]], [5274923, ["exogenic", "endogenic", "cause", "tooth", "jaw", "anomaly", "method", "method", "standing"]]]
I also have a list of concepts as follows.
myconcepts = ["method", "standing"]
I want to count how many records in mylist contain each concept in myconcepts, i.e.:
"method" = 2 times in records (i.e. in `52749` and `5274923`)
"standing" = 2 times in records
My current code is as follows.
mycounting = 0
for concept in myconcepts:
    for item in mylist:
        if concept in item[1]:
            mycounting = mycounting + 1
print(mycounting)
However, my actual mylist is very large, with about 5 million records, and myconcepts has about 10000 concepts.
With my current code, it takes nearly 1 minute to get the count for a single concept, which is very slow.
What is the most efficient way of doing this in Python?
For testing purposes I have attached a small portion of my dataset in: https://drive.google.com/file/d/1z6FsBtLyDZClod9hK8nK4syivZToa7ps/view?usp=sharing
I am happy to provide more details if needed.
Use a Counter object (a type of dict), or use the count method on the flattened list. These techniques are already well documented on Stack Overflow and elsewhere online.
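For what it's worth, here is a minimal sketch of the Counter idea applied to the sample data from the question. It assumes a concept should be counted at most once per record (which matches the expected output, where "method" counts as 2 even though it appears twice in record 5274923). The function name count_records_per_concept is made up for illustration. The point of the sketch is that mylist is scanned once in total rather than once per concept.

```python
from collections import Counter

mylist = [
    [5274919, ["report", "porcelain", "firing", "technic"]],
    [5274920, ["implantology", "dentistry"]],
    [52749, ["method", "recognition", "long", "standing", "root", "perforation", "molar"]],
    [5274923, ["exogenic", "endogenic", "cause", "tooth", "jaw", "anomaly", "method", "method", "standing"]],
]
myconcepts = ["method", "standing"]


def count_records_per_concept(records, concepts):
    """Hypothetical helper: number of records that contain each concept."""
    wanted = set(concepts)  # set membership checks are O(1) on average
    counts = Counter()
    for _record_id, words in records:
        # intersection() returns a set, so a word repeated inside one
        # record is only counted once for that record.
        counts.update(wanted.intersection(words))
    # A Counter returns 0 for missing keys, so unseen concepts get 0.
    return {concept: counts[concept] for concept in concepts}


concept_counts = count_records_per_concept(mylist, myconcepts)
print(concept_counts)  # {'method': 2, 'standing': 2}
```

If every occurrence should be counted instead (so "method" would be 3), the intersection step would need to be replaced by updating the counter with each word that is in wanted.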