I am working on a preprocessing stage for a data table. My current code works, but I am wondering whether there is a more efficient way.
My data table looks like this:
  object A    object B    features of A    features of B
  aaa         w           1                0
  aaa         q           1                1
  bbb         x           0                0
  ccc         w           1                0
Stored column by column in X, it would be:
[(aaa, aaa, bbb, ccc), (w, q, x, w), (1, 1, 0, 1), (0, 1, 0, 0)]
Now I am writing code to build a table that contains every possible combination of object A and object B (each combination listed once, with no repeated pairs), while A and B each keep their own features. The table would look as follows (rows marked with a star are the added rows):
  object A    object B    features of A    features of B
  aaa         w           1                0
  aaa         q           1                1
* aaa         x           1                0
---------------------------------------------------------
  bbb         x           0                0
* bbb         w           0                0
* bbb         q           0                1
---------------------------------------------------------
  ccc         w           1                0
* ccc         x           1                0
* ccc         q           1                1
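For reference, the expansion described above can be sketched as follows. This is only a minimal sketch under stated assumptions: X holds exactly the four columns of the example, and each object always carries the same feature value, so the features can be looked up per object. The names `expand_pairs`, `feat_a`, `feat_b` and `expanded` are hypothetical and not part of the original code.

```python
from itertools import product

# Example data from the question: four parallel columns
# (object A, object B, feature of A, feature of B).
X = [
    ("aaa", "aaa", "bbb", "ccc"),
    ("w", "q", "x", "w"),
    (1, 1, 0, 1),
    (0, 1, 0, 0),
]

def expand_pairs(X):
    """Return one row per (A, B) combination; the original rows come first."""
    a_col, b_col, fa_col, fb_col = X
    # Assumption: each object always has the same feature value,
    # so a plain dict lookup per object is enough.
    feat_a = dict(zip(a_col, fa_col))
    feat_b = dict(zip(b_col, fb_col))
    rows = list(zip(a_col, b_col, fa_col, fb_col))
    existing = set(zip(a_col, b_col))  # pairs already in the table
    # dict.fromkeys drops duplicates while keeping first-seen order.
    for a, b in product(dict.fromkeys(a_col), dict.fromkeys(b_col)):
        if (a, b) not in existing:
            rows.append((a, b, feat_a[a], feat_b[b]))
    return rows

expanded = expand_pairs(X)
for row in expanded:
    print(row)
```

The result is a list of rows rather than a list of columns; `list(zip(*expanded))` would turn it back into the column layout used by X.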
The whole data set is named X. My code to build the table is as follows, but it runs very slowly:
-----------------------------------------
#This part is still fast
#to make the combination of object A and object B with no repetition
def uprod(*seqs):
    def inner(i):
        if i == n:
            yield tuple(result)
            return
        for elt in sets[i] - seen:
            seen.add(elt)
            result[i] = elt
            for t in inner(i+1):
                yield t
            seen.remove(elt)
    sets = [set(seq) for seq in seqs]
    n = len(sets)
    seen = set()
    result = [None] * n
    for t in inner(0):
        yield t

#add all possibility into a new list named "new_data"
new_data = list(uprod(X[0],X[1]))
X_8v = X[:]
y_8v = y[:]
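When only two sequences are passed, the generator above yields every pair (a, b) of distinct values with a taken from the first sequence and b from the second, skipping pairs where a equals b. A minimal sketch of the same result using `itertools.product` is shown below; the function name `unique_pairs` is hypothetical, and the sketch covers the two-sequence case only.

```python
from itertools import product

def unique_pairs(a_seq, b_seq):
    """All (a, b) pairs of distinct values from two sequences, with a != b."""
    # set() removes repeated objects, as the sets inside uprod do.
    return {(a, b) for a, b in product(set(a_seq), set(b_seq)) if a != b}

pairs = unique_pairs(("aaa", "aaa", "bbb", "ccc"), ("w", "q", "x", "w"))
print(len(pairs))  # 3 distinct A values times 3 distinct B values
```

Returning a set rather than a list also makes later membership tests of the form `(a, b) in pairs` fast.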
-----------------------------------------
#if the current X_8v (same content as X) does not have the match of object A and object B
#in the list "new_data", append a new row to the current X_8v
#Now this part is super slow, I think because I iterate a lot
for i, j in list(enumerate(X_8v[0])):
    for k, w in list(enumerate(X_8v[1])):
        if (X_8v[0][i], X_8v[1][k]) not in new_data:
            #+= rebinds each column to the extended tuple
            #(a bare + would build the tuple and discard it)
            X_8v[0] += (X_8v[0][i],)
            X_8v[1] += (X_8v[1][k],)
            X_8v[2] += (X_8v[2][i],)
            X_8v[3] += (X_8v[3][k],)
            X_8v[4] += (X_8v[4][i],)
            X_8v[5] += (0,)
            X_8v[6] += (0,)
            y_8v.append(0)
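Two things in the loop above are likely to contribute to the slowdown: a membership test against a list scans the list element by element, and every tuple concatenation copies the whole column. A minimal sketch of the same nested loop using a set for lookups and lists for the growing columns is given below. The names `append_missing`, `known_pairs` and `cols` are hypothetical, and only the first two columns are handled to keep the example short.

```python
def append_missing(columns, known_pairs):
    """Append every (A, B) pair absent from known_pairs to the first two columns."""
    known = set(known_pairs)      # set lookups take constant time on average
    col_a = list(columns[0])      # lists grow in place, unlike tuples
    col_b = list(columns[1])
    for a in columns[0]:
        for b in columns[1]:
            if (a, b) not in known:
                col_a.append(a)
                col_b.append(b)
                known.add((a, b))  # avoid appending the same pair twice
    # Convert back to tuples once, after the loop has finished.
    return tuple(col_a), tuple(col_b)

cols = [("aaa", "bbb"), ("w", "x")]
new_a, new_b = append_missing(cols, zip(*cols))
print(new_a, new_b)
```

The remaining feature columns could be extended in the same way inside the `if` branch.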
Is there any possible improvement to the code above?
Many thanks!