
I have to loop through a list of over 4000 items and check their similarity with a recommendation algorithm in Python.

The script takes a long time to run (10-11 hours) and I wanted to incorporate multi-threading to improve speed, but I don't know exactly how to do it.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from multiprocessing import Pool

    data=pd.read_csv('data.csv',index_col=0, encoding="ISO-8859-1")       

    # Get list of unique items
    itemList=list(set(data["product_ref"].tolist()))

    # Get count of customers
    userCount=len(set(data["customer_id"].tolist()))

    # Create an empty data frame to store item affinity scores for items.
    itemAffinity= pd.DataFrame(columns=('item1', 'item2', 'score'))

    def itemUsers(ind):
        # Customers who bought the item at position ind of itemList
        return data[data.product_ref==itemList[ind]]["customer_id"].tolist()

    rowCount=0
    for ind1 in range(len(itemList)): 
        item1Users = itemUsers(ind1) 
        # My attempt at adding parallelism: loop2 and data_inputs are
        # placeholders, this is the part I don't know how to write
        pool = Pool()
        pool.map(loop2, data_inputs)
        for ind2 in range(ind1+1, len(itemList)): 
            print(ind1, ":", ind2)       
            item2Users = itemUsers(ind2) 
            commonUsers= len(set(item1Users).intersection(set(item2Users))) 
            score=commonUsers / userCount
            itemAffinity.loc[rowCount] = [itemList[ind1],itemList[ind2],score] 
            rowCount +=1
6 Comments
  • What exactly is itemUsers doing? Can you give a small example input and expected output for this? Commented Jun 19, 2018 at 14:29
  • Hold on, this is in a DataFrame? Have you not considered doing this using Pandas tools to try and vectorize the calculation? I'm not clear on any restriction that would force you to drop out of pandas; you should be able to do this in a couple of seconds (at a very conservative estimate); a vectorized sketch follows these comments. Commented Jun 19, 2018 at 14:32
  • I updated my post; it only returns the users who bought the item at index ind Commented Jun 19, 2018 at 14:32
  • @roganjosh Yes, I'm using DataFrames. How do I vectorize the calculation using pandas? Commented Jun 19, 2018 at 14:34
  • Please consider rewriting this as an MCVE for a Pandas solution. At the moment we don't have anything to test on. An efficient python solution in the meantime will be of interest, but you should be using pandas where possible. Commented Jun 19, 2018 at 14:35
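
For reference, here is a minimal sketch of the pandas/NumPy route suggested in the comments. It is untested and assumes data.csv has the same product_ref and customer_id columns as in the question: build a binary customer-by-item matrix, and one matrix product then yields the common-customer count for every item pair at once.

    import numpy as np
    import pandas as pd

    data = pd.read_csv('data.csv', index_col=0, encoding="ISO-8859-1")
    userCount = data["customer_id"].nunique()

    # Binary customer x item matrix: 1 if the customer ever bought the item.
    basket = (pd.crosstab(data["customer_id"], data["product_ref"]) > 0).astype(int)

    # basket.T @ basket counts, for every item pair, the customers who bought
    # both items; dividing by userCount turns the counts into affinity scores.
    scores = basket.T.dot(basket) / userCount

    # Keep each unordered pair once (upper triangle, diagonal excluded) and
    # reshape into the item1/item2/score layout used in the question.
    iu = np.triu_indices(len(scores), k=1)
    itemAffinity = pd.DataFrame({
        "item1": scores.index.values[iu[0]],
        "item2": scores.columns.values[iu[1]],
        "score": scores.values[iu],
    })

This removes the double Python loop, the repeated DataFrame filtering in itemUsers, and the row-by-row itemAffinity.loc[rowCount] = ... assignment, which is most likely where the 10-11 hours are going.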

1 Answer


Incorporating multi-threading will not improve your running time.

Think about it this way: when you multi-thread, you spread the same computational work across multiple threads that still take turns on the CPU (Python's GIL keeps them from running bytecode in parallel), so you gain nothing over doing it all in one process.

Threading can help when, for example, one thread is waiting on user input or I/O and you want to keep computing in the meantime, but that isn't your case.


2 Comments

He's looking for multiprocessing.
@Rohi Yes, exactly, I'm looking for multiprocessing.
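
Since multiprocessing (not threading) is what's being asked for, here is a minimal sketch of pushing the outer loop into a multiprocessing.Pool. It is untested, assumes the same data.csv layout as in the question, and pairScores and usersByItem are hypothetical names, not part of the original code:

    from multiprocessing import Pool

    import pandas as pd

    data = pd.read_csv('data.csv', index_col=0, encoding="ISO-8859-1")
    itemList = sorted(set(data["product_ref"].tolist()))  # sorted so every worker sees the same order
    userCount = data["customer_id"].nunique()

    # Precompute each item's customer set once, instead of filtering the
    # DataFrame inside the inner loop for every pair.
    usersByItem = data.groupby("product_ref")["customer_id"].apply(set).to_dict()

    def pairScores(ind1):
        # One outer-loop step: itemList[ind1] scored against every later item.
        item1, item1Users = itemList[ind1], usersByItem[itemList[ind1]]
        return [(item1, itemList[ind2],
                 len(item1Users & usersByItem[itemList[ind2]]) / userCount)
                for ind2 in range(ind1 + 1, len(itemList))]

    if __name__ == '__main__':
        with Pool() as pool:
            chunks = pool.map(pairScores, range(len(itemList)))
        rows = [row for chunk in chunks for row in chunk]
        itemAffinity = pd.DataFrame(rows, columns=('item1', 'item2', 'score'))

Precomputing the per-item customer sets and building the DataFrame from a flat list of tuples at the end also removes two serial costs that are likely larger than the set intersections themselves (the repeated filtering and the row-by-row .loc assignment), so the Pool only has to parallelise the pairwise scoring.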
