I have a Python script, using pandas dataframes, that fills a dataframe by converting the elements of another dataframe. I could do it with a simple for loop or itertuples, but I wanted to see if it was possible to vectorize it for maximum speed (my dataframe is very large, ~60000x12000).
Here is an example of what I'm trying to do:
# Sample data
sample_list = [1, 2, 5]
I have a list of values like the one above. Each element of my new matrix is the sum of two elements of this list, selected by the row and column indices, divided by a constant n.
new_matrix[row,col]=(sample_list[row]+sample_list[col])/n
So the expected output for n=2 would be:
1 1.5 3
1.5 2 3.5
3 3.5 5
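To make the formula concrete, here is a minimal sketch that reproduces the table above with plain Python loops. It assumes the matrix is square and indexed by positions in sample_list; the name size is only illustrative, and a list of lists stands in for the real dataframe.

```python
sample_list = [1, 2, 5]
n = 2  # the constant divisor used in the example

size = len(sample_list)
# Start from an "empty" matrix, here a list of lists filled with zeros
new_matrix = [[0.0] * size for _ in range(size)]

# Apply the formula to every (row, col) pair
for row in range(size):
    for col in range(size):
        new_matrix[row][col] = (sample_list[row] + sample_list[col]) / n

for line in new_matrix:
    print(line)
# [1.0, 1.5, 3.0]
# [1.5, 2.0, 3.5]
# [3.0, 3.5, 5.0]
```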
Right now I do this with a for loop, iterating over each element of an empty matrix and setting it to the value given by the formula. Is there any way to vectorize this operation? By vectorizing I mean writing something like new_matrix=2*old_matrix rather than looping over every element:
for row in range(n_rows):
    for col in range(n_cols):
        new_matrix[row,col]=2*old_matrix[row,col]
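One possible vectorized form of the formula is NumPy broadcasting: adding a column vector to a row vector yields every pairwise sum at once. This is a sketch under the assumption that the values fit in a 1-D NumPy array; the name values is illustrative and not from the original script.

```python
import numpy as np

sample_list = [1, 2, 5]
n = 2  # the constant divisor used in the example

values = np.asarray(sample_list, dtype=float)

# values[:, None] has shape (3, 1) and values[None, :] has shape (1, 3);
# broadcasting adds them into a (3, 3) array of all pairwise sums
new_matrix = (values[:, None] + values[None, :]) / n

print(new_matrix)
# [[1.  1.5 3. ]
#  [1.5 2.  3.5]
#  [3.  3.5 5. ]]
```

np.add.outer(values, values) / n should give the same array, and the result can be wrapped with pd.DataFrame(new_matrix) if a dataframe is needed. For a non-square result such as 60000x12000, the row and column vectors would presumably come from two different lists, and the full float64 array would take roughly 5.8 GB, so memory may matter more than loop speed at that size.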
new_matrix=2*old_matrix? What happened? That is often the right way to do it. It would help a lot if you posted a minimal sample input data set and the expected output for us to work with.
new_matrix=2*old_matrix isn't my formula; I meant it as an example of vectorization. The formula I'm trying to vectorize is the second block of code. I'll update my post with the expected output.