I have data in the following format:
['FACTOR_1','FACTOR_2','VALUE']
['A' ,'A' ,2.0 ]
['A' ,'B' ,3.0 ]
['A' ,'C' ,2.2 ]
['A' ,'D' ,2.6 ]
['B' ,'A' ,2.6 ]
['B' ,'B' ,1.0 ]
['B' ,'C' ,6.0 ]
['B' ,'D' ,7.7 ]
['C' ,'A' ,2.1 ]
....
['D' ,'D' ,2.6 ]
It is in a pandas data frame, but I've been converting it to a numpy array anyway.
I'd like to convert it into a square matrix of the two factors, with FACTOR_1 along the rows, FACTOR_2 along the columns, and VALUE in each cell.
I've coded it myself, but the way I am currently doing it is very slow and inefficient: I have a nested loop and search for the indices of the factors on every iteration:
import numpy as np

no_of_factors = np.size(np.unique(cov_data['FACTOR_1']))
factors = np.unique(cov_data['FACTOR_1'])
cov_matrix = np.zeros((no_of_factors, no_of_factors))
i = 0
for factor_1 in factors:
    # Rows of cov_data whose FACTOR_1 matches the current row factor
    factor_indices = np.where(cov_data['FACTOR_1'] == factor_1)[0].tolist()
    j = 0
    for factor_2 in factors:
        # Positions within factor_indices whose FACTOR_2 matches the column factor
        factor_2_index = np.where(cov_data['FACTOR_2'][factor_indices] == factor_2)[0].tolist()
        if np.size(factor_2_index) > 1:
            self.log.error("Found duplicate factor")
        elif np.size(factor_2_index) == 0:
            var = 0
        else:
            # Map the position within the subset back to a row of cov_data
            factor_2_index = factor_indices[factor_2_index[0]]
            var = cov_data['VALUE'][factor_2_index]
        cov_matrix[i][j] = var
        j += 1
    i += 1
Annoyingly, the data isn't perfect either: there aren't values for every pair of factors. For example, factor C might only have values for A and B, with D missing, hence the check and the default of 0.
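For reference, here is a rough sketch of the result I'm after, written without the nested loop. It assumes cov_data is a pandas DataFrame with the three columns shown above. The sample rows are made up for illustration (they are not my real data), and wide is just a hypothetical intermediate name. Pairs with no row end up as 0, as in my loop version.

```python
import numpy as np
import pandas as pd

# Made-up sample rows in the same layout as above; some pairs
# (e.g. C-C, C-D, D-A) are deliberately missing.
cov_data = pd.DataFrame(
    [
        ("A", "A", 2.0), ("A", "B", 3.0), ("A", "C", 2.2), ("A", "D", 2.6),
        ("B", "A", 2.6), ("B", "B", 1.0), ("B", "C", 6.0), ("B", "D", 7.7),
        ("C", "A", 2.1), ("C", "B", 1.5),
        ("D", "D", 2.6),
    ],
    columns=["FACTOR_1", "FACTOR_2", "VALUE"],
)

# Same factor list as in the loop version (sorted unique FACTOR_1 values).
factors = np.unique(cov_data["FACTOR_1"])

# Long-to-wide reshape: one row per FACTOR_1, one column per FACTOR_2.
# pivot raises a ValueError if a (FACTOR_1, FACTOR_2) pair occurs twice.
wide = cov_data.pivot(index="FACTOR_1", columns="FACTOR_2", values="VALUE")

# Force a square layout over the full factor list and fill missing pairs with 0.
wide = wide.reindex(index=factors, columns=factors).fillna(0.0)

cov_matrix = wide.to_numpy()
```

Here pivot would raise on duplicate pairs instead of logging them, so if duplicates need to be tolerated, something like pivot_table with an explicit aggregation might be a closer fit.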
The cov_data object isn't clear, though I might be able to create a usable copy. matrix is not a good description of your target, since in numpy, np.matrix is just a subclass of ndarray that must be 2d. I think you are creating a factor or feature matrix, something that's used in a package like scikit-learn. I'd suggest editing tags accordingly.