
I have this loop to calculate a value for rows that share the same datetime in a DataFrame:

    import numpy as np
    import pandas as pd

    # Loop over each unique epoch and compute the mean of the squared sigmas
    for epoch in data_all['EPOCH'].unique():
        data_epoch = data_all.query('EPOCH == @epoch')
        sigma = pd.to_numeric(data_epoch['SIGMA'])
        variance = np.mean(sigma.values ** 2)

But that is very slow. Could you suggest a faster way to do this?

Thank you

1 Answer


This is just groupby:

variances = data_all.groupby('EPOCH')['SIGMA'].var()

Or, if you want to use your formula (the mean of squares):

variances = (data_all['SIGMA']**2).groupby(data_all['EPOCH']).mean()
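Note that var() computes the sample variance (it subtracts the group mean and uses ddof=1 by default), while your loop computes the mean of the squares, so the two generally give different numbers. A minimal sketch on a made-up DataFrame (the column names follow the question; the values are invented) shows both:

    import numpy as np
    import pandas as pd

    # Toy data with two epochs; values are invented for illustration
    data_all = pd.DataFrame({
        'EPOCH': ['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-02'],
        'SIGMA': [0.1, 0.3, 0.2, 0.4],
    })

    # Sample variance per epoch (pandas default, ddof=1)
    print(data_all.groupby('EPOCH')['SIGMA'].var())

    # Mean of squares per epoch, matching the loop in the question
    print((data_all['SIGMA'] ** 2).groupby(data_all['EPOCH']).mean())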

Update: for your follow-up question:

variances = data_all.groupby('EPOCH')['SIGMA'].transform('var')
data_all['GROUP'] = (variances < 1).astype(int)
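
transform('var') returns a Series aligned with data_all's index (each row gets its epoch's variance), so the comparison broadcasts row by row and the new column lines up with the original rows.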

3 Comments

And what if I want to create another column depending on the variance value? For example: if the variance < 1, the value in a new column 'GROUP' will be 1 for those epochs, and 0 otherwise. How can I evaluate that for each epoch without a loop? Thank you
Sorry for asking another question; I should have phrased it better. I meant that the GROUP value can take three possible values, not just one, e.g. in [1, 2, 3]: if var > 0.08, group 1; if var > 0.008, group 2; if var > -0.1, group 3. Thanks very much!
@jatorna use np.select or pd.cut on the variances generated by transform in the update.
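
For instance, a minimal sketch with np.select (conditions are checked in order, first match wins; the thresholds are the ones from the comment above):

    # variances is the per-row Series from transform('var') in the update
    conditions = [variances > 0.08, variances > 0.008, variances > -0.1]
    data_all['GROUP'] = np.select(conditions, [1, 2, 3], default=0)

Since a variance is never negative, the last condition always matches and the default value is only a safeguard.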
