How to vectorize python for loop that modifies each element of a dataframe?

Question

I have a Python script, using pandas dataframes, that fills a dataframe by converting the elements of another dataframe. I could do it with a simple for loop or itertuples, but I wanted to see if it was possible to vectorize it for maximum speed (my dataframe is very large, ~60000x12000).

Here is an example of what I'm trying to do:

    #Sample data
    sample_list=[1,2,5]

I have a list of values like the one above. Each element of my new matrix is the sum of two elements from this list, selected by the row and column index, divided by a constant n:

    new_matrix[row,col]=(sample_list[row]+sample_list[col])/n

So the expected output for n=2 would be:

    1    1.5  3
    1.5  2    3.5
    3    3.5  5

Right now I do this with a for loop that iterates over every element of an empty matrix and sets it to the value given by the formula. Is there any way to vectorize this operation (i.e. something like new_matrix=2*old_matrix rather than looping over every row and column and setting new_matrix[row,col]=2*old_matrix[row,col])?
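For reference, the loop described above might look roughly like the following sketch. The question does not show the actual loop, so the use of a NumPy array for the empty matrix and the name `size` are assumptions made for illustration.

```python
import numpy as np

sample_list = [1, 2, 5]
n = 2  # the constant divisor from the example

size = len(sample_list)           # hypothetical helper name
new_matrix = np.empty((size, size))

# Visit every (row, col) pair and apply the formula one element at a time.
for row in range(size):
    for col in range(size):
        new_matrix[row, col] = (sample_list[row] + sample_list[col]) / n
```

This runs the formula once per element in Python, which is the per-element cost that vectorizing would avoid.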

Did you try new_matrix=2*old_matrix? What happened? That is often the right way to do it. It would help a lot if you posted a minimal sample input data set and the expected output for us to work with.
– John Zwinck, Commented Jun 23, 2019 at 8:06
new_matrix=2*old_matrix isn't my formula; I meant it as an example of vectorization. The formula I'm trying to vectorize is the second block of code. I'll update my post with the expected output.
– AldehydeDeva, Commented Jun 23, 2019 at 8:12

John Zwinck · Accepted Answer · 2019-06-23 08:57:46Z

First convert your list to an array:

    import numpy as np

    arr = np.asarray(sample_list)

Then note that your addition needs to broadcast to produce a 2D output. To add a "virtual" dimension to an array, use np.newaxis:

    arr[:,np.newaxis] + arr

That gives you:

    array([[ 2,  3,  6],
           [ 3,  4,  7],
           [ 6,  7, 10]])

Dividing that by n (2 in your example) gives the final result.
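Putting the steps above together, here is a minimal self-contained sketch, assuming n = 2 as in the question. The intermediate name `pair_sums` is introduced only for illustration.

```python
import numpy as np

sample_list = [1, 2, 5]
n = 2  # the constant divisor from the question

arr = np.asarray(sample_list)

# arr[:, np.newaxis] has shape (3, 1) and arr has shape (3,).
# Broadcasting stretches both to (3, 3), so entry [row, col]
# becomes arr[row] + arr[col].
pair_sums = arr[:, np.newaxis] + arr

# Dividing by n applies the formula from the question to every element.
new_matrix = pair_sums / n

print(new_matrix)
# [[1.  1.5 3. ]
#  [1.5 2.  3.5]
#  [3.  3.5 5. ]]
```

No Python-level loop is involved; the pairing of rows and columns comes entirely from the shapes of the two operands.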

Doing it the other way around is more efficient, because the division is applied to the 1D array (N operations) rather than to the 2D result (N×N operations):

    arr = np.asarray(sample_list) / 2
    arr[:,np.newaxis] + arr
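As a quick sanity check, the sketch below compares the two orderings on the sample data. It also shows `np.add.outer`, a different NumPy spelling of the same pairwise sum that is not part of the answer above; the variable names are illustrative assumptions.

```python
import numpy as np

sample_list = [1, 2, 5]
n = 2

arr = np.asarray(sample_list)

# Divide last: one division per element of the 2D result.
divide_last = (arr[:, np.newaxis] + arr) / n

# Divide first: one division per element of the 1D input, then broadcast-add.
scaled = arr / n
divide_first = scaled[:, np.newaxis] + scaled

# The two orderings agree (allclose tolerates floating-point rounding).
same = np.allclose(divide_first, divide_last)

# Alternative spelling: the ufunc's outer method computes scaled[i] + scaled[j].
outer = np.add.outer(scaled, scaled)
```

For general inputs the two orderings can differ by floating-point rounding, which is why the comparison uses `np.allclose` rather than exact equality.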

