How to use vectorization instead of for loop in pandas

Question

I'm trying to build a machine learning algorithm for my job. The data I'm using for training and testing has 17k rows and 20 columns. I tried adding a new column computed from two other columns, but the for loop I wrote is too slow (it takes 3 seconds to run):

for i in range(0, len(model_olculeri)):
    if (model_olculeri["Bel"][i] != 0) and (model_olculeri["Basen"][i] != 0):
        sum_column = (model_olculeri["Bel"][i]) / (model_olculeri["Basen"][i])
        model_olculeri["Waist to Hip Ratio"][i] = sum_column

I read articles about using pandas and NumPy vectorization instead of for loops on DataFrames, and it seems to be much faster and more efficient. How can I vectorize my for loop? Thanks a lot.

Yes, looping over each row is generally slow, especially when you want to apply an operation (in this case, division) to an entire column.
– Joshua Voskamp, Commented Oct 25, 2021 at 13:48

jezrael · Accepted Answer · 2021-10-25 13:49:42Z

1

Create a boolean mask and use it for filtering:

m = (model_olculeri["Bel"] != 0) & (model_olculeri["Basen"] != 0)
model_olculeri.loc[m,"Waist to Hip Ratio"] = model_olculeri.loc[m, "Bel"] / model_olculeri.loc[m,"Basen"]
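
To make the mask approach concrete, here is a minimal sketch on a tiny made-up DataFrame. Only the column names come from the question; the values and the name `df` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"Bel": [70.0, 0.0, 80.0], "Basen": [100.0, 95.0, 0.0]})

# True only where both measurements are non-zero.
m = (df["Bel"] != 0) & (df["Basen"] != 0)

# Divide only the masked rows. The column did not exist before,
# so rows outside the mask end up as NaN.
df.loc[m, "Waist to Hip Ratio"] = df.loc[m, "Bel"] / df.loc[m, "Basen"]

ratios = df["Waist to Hip Ratio"].tolist()
```

Only the first row passes the mask here, so it is the only one that receives a ratio.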

Alternative:

model_olculeri.loc[m,"Waist to Hip Ratio"] = model_olculeri["Bel"] / model_olculeri["Basen"]
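
The alternative works because pandas aligns the right-hand Series with the selected rows by index label, so values for rows outside the mask are simply not written. A small sketch of that behaviour, using made-up values and the assumed names `df` and `full`:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"Bel": [70.0, 0.0, 80.0], "Basen": [100.0, 95.0, 0.0]})
m = (df["Bel"] != 0) & (df["Basen"] != 0)

# The division runs on every row, including the one with a zero divisor,
# which produces inf in the intermediate result.
full = df["Bel"] / df["Basen"]

# Only rows selected by the mask are written; the other values of `full`
# are discarded during index alignment.
df.loc[m, "Waist to Hip Ratio"] = full

ratios = df["Waist to Hip Ratio"].tolist()
```

The trade-off is a little wasted work on the rows that are filtered out, in exchange for a shorter right-hand side.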

Or set the new values with numpy.where:

import numpy as np
model_olculeri["Waist to Hip Ratio"] = np.where(m, model_olculeri["Bel"] / model_olculeri["Basen"], np.nan)
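
One difference worth keeping in mind: the `.loc` assignment touches only the masked rows, while `np.where` rebuilds the whole column and puts NaN wherever the mask is False. The sketch below assumes the column already holds placeholder values (`-1.0`, a made-up sentinel) to make that visible; the data and the names `via_loc` and `via_where` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a pre-existing ratio column.
df = pd.DataFrame({
    "Bel": [70.0, 0.0, 80.0],
    "Basen": [100.0, 95.0, 0.0],
    "Waist to Hip Ratio": [-1.0, -1.0, -1.0],  # made-up placeholder values
})
m = (df["Bel"] != 0) & (df["Basen"] != 0)

# Masked assignment: rows outside the mask keep their old value.
via_loc = df.copy()
via_loc.loc[m, "Waist to Hip Ratio"] = via_loc["Bel"] / via_loc["Basen"]

# np.where: every row is overwritten, with NaN where the mask is False.
via_where = df.copy()
via_where["Waist to Hip Ratio"] = np.where(
    m, via_where["Bel"] / via_where["Basen"], np.nan
)

loc_values = via_loc["Waist to Hip Ratio"].tolist()
where_values = via_where["Waist to Hip Ratio"].tolist()
```

If the column does not exist yet, both variants give the same result, since unmatched rows are NaN either way.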

user16386186 · Answer · 2021-10-25 20:03:00Z

0

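Chained solution using query and pipe. Note that this returns a new DataFrame containing only the rows that pass the filter, and that the column name contains spaces, so it has to be passed to assign via dictionary unpacking:

model_olculeri.query("Bel != 0 & Basen != 0").pipe(lambda x: x.assign(**{"Waist to Hip Ratio": x.Bel / x.Basen}))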
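
A runnable sketch of the chained form on made-up data (the values and the names `df` and `result` are assumptions). It shows that the chain returns a filtered copy and leaves the original DataFrame untouched:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({"Bel": [70.0, 0.0, 80.0], "Basen": [100.0, 95.0, 0.0]})

result = (
    df.query("Bel != 0 & Basen != 0")  # keep rows where both values are non-zero
      .pipe(
          # The column name has spaces, so it cannot be a plain keyword
          # argument; unpack it from a dict instead.
          lambda x: x.assign(**{"Waist to Hip Ratio": x["Bel"] / x["Basen"]})
      )
)

kept_rows = len(result)
original_has_column = "Waist to Hip Ratio" in df.columns
```

If the rows with zeros need to stay in the data, the mask-based assignment is probably the better fit, because this chain drops them.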
