Is there another way to speed up the multiple nested for loops in my matrix function?
Here is the function:
import time

import numpy as np
import pandas as pd


def matrix(Xbin, y):
    labels = np.unique(y)
    con_matrix = []
    start = time.time()
    # For every pair of distinct classes, XOR every row of one class
    # with every row of the other class.
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            # Crossover
            for u in Xbin[y == labels[i]]:
                for v in Xbin[y == labels[j]]:
                    con_matrix.append(np.bitwise_xor(u, v))
    end = time.time()
    duration = end - start
    print("total time for nested loop: ", duration)
    constraint_matrix = np.array(con_matrix)
    bin_attr_dim = [i for i in range(1, Xbin.shape[1] + 1)]
    df = pd.DataFrame(constraint_matrix, columns=bin_attr_dim)
    return df
Please note that Xbin is a numpy.ndarray and y holds the group labels (1, 2, 3). The picture below (Figure 1) shows that kind of ndarray, with columns a to h.
My matrix function described above generates its output as a DataFrame corresponding to the picture below (Figure 2, columns a to h): every element of one group is combined with every element of each other group. Figure 2:
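As a small, hypothetical illustration of that cross-group pairing (toy booleans, not the rows shown in Figure 1 or Figure 2), each row of one group is XORed element-wise with each row of the other group:

import numpy as np

# Toy data for illustration only; not the actual rows from the figures.
g1 = np.array([[True, False, True],
               [False, False, True]])   # two rows from group 1
g2 = np.array([[True, True, False]])    # one row from group 2

# Every cross-group pair is XORed element-wise:
print(np.bitwise_xor(g1[0], g2[0]))  # [False  True  True]
print(np.bitwise_xor(g1[1], g2[0]))  # [ True  True  True]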
Here is my code to generate a dataset in binarized format, as shown in Figure 1:
from itertools import combinations

import numpy as np
import pandas as pd


def binarize_dataset(X, y):
    cutpoints = {}
    att = -1
    for row in X.T:
        att += 1
        labels = None  # Previous labels
        u = -9999      # Previous xi
        # Finding transitions
        for v in sorted(np.unique(row)):
            variation = v - u  # Current - Previous
            # Classes where current v appears
            indexes = np.where(row == v)[0]
            # current labels
            __labels = set(y[indexes])
            # Main condition
            if labels is not None and variation > 0:
                # Testing for transition to find the essential cut-points
                if (len(labels) > 1 or len(__labels) > 1) or labels != __labels:
                    # cut-point id
                    cid = len(cutpoints)
                    cutpoints[cid] = (att, u + variation / 2.0)
            labels = __labels
            # previous equals current
            u = v
    # Group the cut-point values by attribute index
    new_dict = {}
    for key, value in cutpoints.items():
        first_element = value[0]
        second_element = value[1]
        # Check if the first_element is already a key in new_dict
        if first_element in new_dict:
            new_dict[first_element].append(second_element)
        else:
            new_dict[first_element] = [second_element]
    # Generate combinations of the second elements within each group
    for key, value in new_dict.items():
        comb = combinations(value, 2)
        # Append the combinations to the value list
        for c in comb:
            new_dict[key].append(c)
    arrays = []
    for attr, attr_cutpoints in new_dict.items():
        for cutpoint in attr_cutpoints:
            row = X.T[attr]
            if isinstance(cutpoint, tuple):
                # Interval feature: cutpoint[0] <= x < cutpoint[1]
                lowerbound = cutpoint[0] <= row.reshape(X.shape[0], 1)
                upperbound = row.reshape(X.shape[0], 1) < cutpoint[1]
                row = np.logical_and(lowerbound, upperbound)
                arrays.append(row)
            else:
                # Threshold feature: x >= cutpoint
                row = row.reshape(X.shape[0], 1) >= cutpoint
                arrays.append(row)
    Xbin = np.concatenate(arrays, axis=1)
    bin_attr_dim = [i for i in range(1, Xbin.shape[1] + 1)]
    df = pd.DataFrame(Xbin, columns=bin_attr_dim)
    # Map each original attribute to the binary columns derived from it
    start = 0
    dict_parent_children = {}
    for key, list_value in new_dict.items():
        dict_parent_children[key] = list(df.columns[start: start + len(list_value)])
        start += len(list_value)
    return Xbin, df, dict_parent_children
When I tested with the iris dataset, which is small, it runs really fast.
from sklearn import datasets

X, y = datasets.load_iris(return_X_y=True)
bin_dataset, data, dict_parent_children = binarize_dataset(X, y)
con_matrix = matrix(bin_dataset, y)
When I tested with a bigger dataset such as breast cancer, it started taking much longer.
X, y = datasets.load_breast_cancer(return_X_y=True)
bin_dataset, data, dict_parent_children = binarize_dataset(X, y)
con_matrix = matrix(bin_dataset, y)
Now imagine testing with a dataset even bigger than breast cancer. How can I speed up my matrix function as much as possible, or is there a faster way to rewrite it?


Avoid for loops at all costs if you're looking for speed. In numpy, the obvious step to speeding up loops is to replace them with whole-array ("vectorized") methods. They still loop, but in compiled code. Sometimes that's easy to do (with experience); sometimes it takes some tricks. Or focus on one loop at a time, or look for some top-down pattern. But as @jared wrote, if the problem/code is too big, most of us will just move on to a more interesting and focused question.
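Following that advice, here is a minimal vectorized sketch of the same computation, assuming Xbin is a 2-D boolean (or integer) array and y holds the class labels. matrix_vectorized is a name introduced here, and the row ordering matches the original nested loops (outer row from the first class, inner row from the second):

from itertools import combinations

import numpy as np
import pandas as pd


def matrix_vectorized(Xbin, y):
    labels = np.unique(y)
    blocks = []
    for a, b in combinations(labels, 2):
        A = Xbin[y == a]  # all rows of the first class
        B = Xbin[y == b]  # all rows of the second class
        # Broadcast XOR every row of A against every row of B in one
        # compiled operation: result shape is (len(A), len(B), n_features).
        block = np.bitwise_xor(A[:, None, :], B[None, :, :])
        # Flatten the two pair axes so rows appear as u0^v0, u0^v1, ...
        blocks.append(block.reshape(-1, Xbin.shape[1]))
    con = np.concatenate(blocks, axis=0)
    return pd.DataFrame(con, columns=list(range(1, Xbin.shape[1] + 1)))

Note that the intermediate block for a class pair holds len(A) * len(B) * n_features elements, so for very large classes you may want to process A in chunks to cap memory; even so, the per-pair broadcasting is typically far faster than the four Python-level loops.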