Avoid computing the sub-DataFrame df[df['D'] == i] more than once. The original code computes this len(myunique)**2 times. Instead you can compute this once for each i (that is, len(myunique) times in total), store the results, and then pair them together later. For example,

    groups = [grp for di, grp in df.groupby('D')]
    for itemp, jtemp in IT.product(groups, repeat=2):
        pass

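If the real computation needs to know which value of `D` each sub-DataFrame belongs to, one possible variation is to keep the groups in a dict keyed by that value. The following is only a sketch under assumptions: the small random frame and the `pair_sizes` dict are hypothetical stand-ins for your data and your per-pair calculation.

```python
import itertools as IT

import numpy as np
import pandas as pd

# Hypothetical small frame with the same column layout as in the answer.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 5, size=(20, 4)), columns=list('ABCD'))

# One pass over df: map each distinct value of D to its sub-DataFrame.
groups = {key: grp for key, grp in df.groupby('D')}

# Pair the stored sub-DataFrames by key; no boolean mask is recomputed here.
# Recording the two lengths is a placeholder for the real per-pair work.
pair_sizes = {
    (i, j): (len(groups[i]), len(groups[j]))
    for i, j in IT.product(groups, repeat=2)
}
```

Iterating over `IT.product(groups, repeat=2)` yields pairs of keys, so `groups[i]` and `groups[j]` are plain dict lookups rather than fresh selections from `df`.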
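Here is a benchmark comparing the two approaches:

    import numpy as np
    import pandas as pd
    import itertools as IT

    df = pd.DataFrame(np.random.randint(0, 1000, size=(1000, 4)), columns=list('ABCD'))

    def using_orig():
        # Recomputes the boolean mask for every i and for every (i, j) pair.
        myunique = df['D'].unique()
        for i in myunique:
            itemp = df[df['D'] == i]
            for j in myunique:
                jtemp = df[df['D'] == j]

    def using_groupby():
        # Splits df once, then pairs the stored sub-DataFrames.
        groups = [grp for di, grp in df.groupby('D')]
        for itemp, jtemp in IT.product(groups, repeat=2):
            pass
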

    In [28]: %timeit using_groupby()
    10 loops, best of 3: 63.8 ms per loop

    In [31]: %timeit using_orig()
    1 loop, best of 3: 2min 22s per loop

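Before swapping one approach for the other, it may be worth checking that both produce the same sub-DataFrames. The sketch below does that on a small hypothetical frame; the names `by_mask` and `by_groupby` are illustrative only. One difference to be aware of: `groupby` sorts the group keys by default, whereas `unique()` returns values in order of first appearance, so the pairs are visited in a different order unless you pass `sort=False`.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame with the same column layout as in the answer.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(50, 4)), columns=list('ABCD'))

# Original approach: one boolean mask per distinct value of D.
by_mask = {i: df[df['D'] == i] for i in df['D'].unique()}

# groupby approach: a single split of df.
by_groupby = {key: grp for key, grp in df.groupby('D')}

# Both approaches should yield the same keys and the same rows per key.
same_keys = set(by_mask) == set(by_groupby)
same_frames = all(by_mask[k].equals(by_groupby[k]) for k in by_mask)

# The iteration order differs: first appearance versus sorted keys.
first_seen_order = list(by_mask)
sorted_order = list(by_groupby)
unsorted_groupby_order = [key for key, _ in df.groupby('D', sort=False)]
```

If the order in which pairs are processed matters for your calculation, `sort=False` is the option to look at.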
Regarding the comment:
> I can easily replace itemp and jtemp with a=1 or print "Hello" so ignore that
The answer above addresses how to compute itemp and jtemp more efficiently. If itemp and jtemp are not central to your real calculation, then we would need to better understand what you really want to compute in order to suggest (if possible) a way to compute it faster.