
I have this loop to calculate a value for rows that share the same datetime in a DataFrame:

    import numpy as np
    import pandas as pd

    # Loop over each unique epoch and compute the mean of the squared sigmas
    for epoch in data_all['EPOCH'].unique():
        data_epoch = data_all.query('EPOCH == @epoch')
        sigma = pd.to_numeric(data_epoch['SIGMA'])
        variance = np.mean(sigma.values ** 2)

But that is very slow. Could you suggest a faster way to do this?

Thank you

1 Answer


This is just groupby:

variances = data_all.groupby('EPOCH')['SIGMA'].var()

Or, if you want to use your formula (the mean of squares):

variances = (data_all['SIGMA']**2).groupby(data_all['EPOCH']).mean()
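Note that var() computes the sample variance (it subtracts the group mean and uses ddof=1 by default), while your loop computes the mean of the squares, so the two generally give different numbers. A minimal sketch on a made-up DataFrame (the column names follow the question; the values are invented) shows both:

    import numpy as np
    import pandas as pd

    # Toy data with two epochs; values are invented for illustration
    data_all = pd.DataFrame({
        'EPOCH': ['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-02'],
        'SIGMA': [0.1, 0.3, 0.2, 0.4],
    })

    # Sample variance per epoch (pandas default, ddof=1)
    print(data_all.groupby('EPOCH')['SIGMA'].var())

    # Mean of squares per epoch, matching the loop in the question
    print((data_all['SIGMA'] ** 2).groupby(data_all['EPOCH']).mean())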

Update: for your follow-up question:

variances = data_all.groupby('EPOCH')['SIGMA'].transform('var')
data_all['GROUP'] = (variances < 1).astype(int)
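
transform('var') returns a Series aligned with data_all's index (each row gets its epoch's variance), so the comparison broadcasts row by row and the new column lines up with the original rows.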

3 Comments

And what if I want to create another column depending on the variance value? For example: if the variance < 1, the value in a new column 'GROUP' will be 1 for those epochs, and 0 otherwise. How can I evaluate that for each epoch without a loop? Thank you
Sorry for asking another question; I should have phrased it better. I meant that the GROUP value can take three possible values, not just one, e.g. in [1, 2, 3]: if var > 0.08, group 1; if var > 0.008, group 2; if var > -0.1, group 3. Thanks very much!
@jatorna use np.select or pd.cut on the variances generated by transform in the update.
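
For instance, a minimal sketch with np.select (conditions are checked in order, first match wins; the thresholds are the ones from the comment above):

    # variances is the per-row Series from transform('var') in the update
    conditions = [variances > 0.08, variances > 0.008, variances > -0.1]
    data_all['GROUP'] = np.select(conditions, [1, 2, 3], default=0)

Since a variance is never negative, the last condition always matches and the default value is only a safeguard.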
