-3

I have thousands of products with ingredients of each for example:

ProductID  | Ingredients
 00001     |  itemA, itemB, itemC, itemD
 00002     |  itemF, itemD, itemG, itemA, itemI
 00003     |  itemH, itemI, itemD, itemF, itemT,itemB, itemC

........ and so on.

I want to make a unique list of ingredients and make a map that what ingredients are in which product. So For example I want the resulting output in the following way:

{itemA: [00001,00011, 00005,00007]}
{itemB: [00003, 00002, 000056]}
{itemC: [00009, 00087, 00044, 00647, 00031, 00025]}

So the list size will be different for each item. Can somebody help me out in solving this problem? Thanks

4
  • 2
    You probably want one dictionary, not three. Also, can you specify the input a little more? Is that a txt file? Commented Oct 14, 2016 at 18:34
  • What shape is the source data? Commented Oct 14, 2016 at 18:34
  • Yes exactly one dictionary such that each item is a key and number of product IDs are the values. Commented Oct 14, 2016 at 18:35
  • 1
    The linked duplicate, especially the use of defaultdict in the accepted answer really is the way you should go. Commented Oct 14, 2016 at 18:39

1 Answer 1

1

Assuming its a text file, it could be something like this:

from collections import defaultdict

product_ingredients_mapping = defaultdict(list)
file_data = open('products.txt')

for row in file_data.readlines():
    data = row.split('|')
    ingredients = data[1].split(',')
    product_id = data[0].strip()
    for ingredient in ingredients:
       product_ingredients_mapping[ingredient.strip()].append(product_id)
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.