What is the most efficient way to loop through a 2D array in Python?

Question

I am new to Python and machine learning, and I can't find a good approach online. I have a large 2D array (distance_matrix.shape = (47, 1328624)). I wrote the code below, but it takes too long to run; the nested for loops are very slow.

import pandas as pd

distance_matrix = [[0.21218192, 0.12845819, 0.54545613, 0.92464129, 0.12051526, 0.0870853 ], [0.2168166 , 0.11174682, 0.58193855, 0.93949729, 0.08060061, 0.11963891], [0.23996999, 0.17554854, 0.60833433, 0.93914766, 0.11631545, 0.2036373]]

iskeleler = pd.DataFrame({
    'lat':[40.992752,41.083202,41.173462],
    'lon':[29.023165,29.066652,29.088163],
    'name':['Kadıköy','AnadoluHisarı','AnadoluKavağı']
}, dtype=str)

for i in range(len(distance_matrix)):
    for j in range(len(distance_matrix[0])):
        if distance_matrix[i][j] < 1:
            iskeleler.loc[i,'Address'] = distance_matrix[i][j]
        
print(iskeleler)

To explain, I am sharing the first 5 rows of my array and showing my dataframe.

[Images: İskeleler dataframe; distance_matrix]

The "İskeleler" dataframe has 47 rows. For each row i, I want to look at all the values in row i of distance_matrix, add up the ones that are less than 1, and store the sum in the 'Address' column of row i in the "İskeleler" dataframe. For example, for the first row in the distance_matrix photo, I want to add the numbers 0.21218192 + 0.12845819 + 0.54545613 + ... and put the result in the 'Address' column of the first row of the İskeleler dataframe.

My intent is to loop through distance_matrix and find the values that are smaller than 1. The code takes too long. How can I do this faster?

Use numpy? You already import it. You also want to give us some code that actually runs. IMHO the use of uninitialized distance_matrix in line 2, 3, and 4 and iskeleler in line 5 and 7 gives an error.
– Thomas Weller, Commented Apr 22, 2021 at 9:43
@ThomasWeller Actually, I shared my code so you can understand it. I pulled both arrays from the internet, so it would be a very long post if I shared the part where I initialize them. My question is really a theoretical one: nesting two for loops takes a lot of time to compute. I can't even see it working because one of my arrays is too big (that's why I shared its shape). How can I do this without two loops? That's my question.
– gamzef, Commented Apr 22, 2021 at 10:00
You want to set iskeleler.loc equal to the last element less than 1 on each line of distance_matrix?
– Mark Setchell, Commented Apr 22, 2021 at 10:20

Mark Setchell · Accepted Answer · 2021-04-22 10:58:19Z

2

I think you mean this:

import numpy as np

# Set up some dummy data in range 0..100
distance = np.random.rand(47,1328624) * 100.0

# Boolean mask of all values < 1
mLessThan1 = distance<1

# Sum elements <1 across rows 
result = np.sum(distance*mLessThan1, axis=1)

That takes 168ms on my Mac.

In [47]: %timeit res = np.sum(distance*mLessThan1, axis=1)
168 ms ± 914 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
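
To connect this back to the question, here is a minimal sketch that applies the same idea to the sample rows from the question and writes the per-row sums into the 'Address' column. It is an illustration rather than the code used for the timing above: the name row_sums is made up for this example, the dtype=str argument from the question is dropped so that lat and lon stay numeric, and np.where is swapped in for the mask multiplication.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (the real array has shape (47, 1328624)).
distance_matrix = np.array([
    [0.21218192, 0.12845819, 0.54545613, 0.92464129, 0.12051526, 0.0870853],
    [0.2168166, 0.11174682, 0.58193855, 0.93949729, 0.08060061, 0.11963891],
    [0.23996999, 0.17554854, 0.60833433, 0.93914766, 0.11631545, 0.2036373],
])

# Assumption: dtype=str from the question is omitted here.
iskeleler = pd.DataFrame({
    'lat': [40.992752, 41.083202, 41.173462],
    'lon': [29.023165, 29.066652, 29.088163],
    'name': ['Kadıköy', 'AnadoluHisarı', 'AnadoluKavağı'],
})

# Keep values below 1, replace everything else with 0, then sum each row.
# row_sums is a hypothetical name used only in this sketch.
row_sums = np.where(distance_matrix < 1, distance_matrix, 0.0).sum(axis=1)

# One sum per matrix row, so the array lines up with the DataFrame rows.
iskeleler['Address'] = row_sums
print(iskeleler)
```

Both forms give the same sums for finite data. The np.where version may be slightly safer if the array can contain inf or NaN, because multiplying those values by a False mask entry yields NaN instead of 0.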

2 Comments

gamzef Over a year ago

Thanks a lot. It works fine. Sorry if it took me too long to tell you. I am not a native speaker of English and I'm just getting used to python.

Mark Setchell Over a year ago

No problems - good luck with your project! Avoid for loops with large Numpy arrays. Come back and ask another question if you get stuck - questions (and answers) are free 😀
