
I have a list of, say, 50 dataframes, 'list1'. Each dataframe has columns 'Speed' and 'Value', like this:

Speed   Value
1       12
2       17
3       19
4       21
5       25

I am trying to get the standard deviation of 'Value' for each speed, across all dataframes. The end goal is to get a list or df of the standard deviation for each speed, like this:

Speed   Standard Deviation
1       1.23
2       2.5
3       1.98
4       5.6
5       5.77

I've tried to pull the values into a new dataframe using a for loop, to then use 'statistics.stdev' on, but I can't seem to get it working. Any help is really appreciated!

Update!

pd.concat([d.set_index('Speed')['Value'] for d in df_power], axis=1).std(axis=1)

This worked. However, I forgot to mention that the values for Speed are not always the same between dataframes. Some dataframes are missing a few, and this ends up returning NaN in those instances.

Please include the attempt which didn't work. Is using pandas.concat() not possible? Commented Dec 3, 2019 at 17:18

3 Answers


You can concat and use std:

import pandas as pd
list_dfs = [df1, df2, df3, ...]
pd.concat([d.set_index('Speed') for d in list_dfs], axis=1).std(axis=1)
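
As a minimal sketch of the approach above, assuming three small made-up frames (`df1`, `df2`, `df3` and their numbers are illustrative, not the asker's data). It selects the `Value` column explicitly, so any other columns in the frames do not enter the calculation:

```python
import pandas as pd

# Hypothetical sample data; the third frame is missing Speed 3.
df1 = pd.DataFrame({"Speed": [1, 2, 3], "Value": [12, 17, 19]})
df2 = pd.DataFrame({"Speed": [1, 2, 3], "Value": [14, 15, 23]})
df3 = pd.DataFrame({"Speed": [1, 2], "Value": [16, 19]})
list_dfs = [df1, df2, df3]

# Align the frames on Speed: one column per dataframe, one row per speed.
wide = pd.concat([d.set_index("Speed")["Value"] for d in list_dfs], axis=1)

# Row-wise sample standard deviation (ddof=1); NaN cells are skipped by default.
std_per_speed = wide.std(axis=1)
print(std_per_speed)
```

Note that this alignment relies on 'Speed' being unique within each dataframe.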

5 Comments

OP wrote, he wants the standard deviation for one specific speed across all dataframes
@Omni Each speed, not one specific speed.
@Omni this answer achieves what OP asked for.
@QuangHoang However, I think you get into trouble if 'Speed' is not unique within individual dataframes.
I need the standard deviation for each speed. However, 'Value' is not the only column in the dataframes, so StDev = pd.concat([d.set_index('Speed')['Value'] for d in df_power], axis=1).std(1) is what I ended up using. I should have stated, though, that the values for Speed are not always the same: some dataframes are missing a few, and this ends up returning NaN in those instances. I'll update the question. Thanks @QuangHoang

You'll want to concatenate, groupby speed, and take the standard deviation.

1) Concatenate your dataframes

list1 = [df_1, df_2, ...]
full_df = pd.concat(list1, axis=0) # stack all dataframes

2) Groupby speed and take the standard deviation

std_per_speed_df = full_df.groupby('Speed')[['Value']].std()
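
The two steps can be sketched end to end as follows. The frames `df_1`, `df_2`, `df_3` and their numbers are hypothetical sample data, with one frame deliberately missing a speed to show that stacking and grouping does not need the same speeds in every dataframe:

```python
import pandas as pd

# Hypothetical sample data; df_2 has no row for Speed 3.
df_1 = pd.DataFrame({"Speed": [1, 2, 3], "Value": [10, 20, 30]})
df_2 = pd.DataFrame({"Speed": [1, 2], "Value": [14, 26]})
df_3 = pd.DataFrame({"Speed": [1, 2, 3], "Value": [12, 23, 34]})
list1 = [df_1, df_2, df_3]

# 1) Stack all dataframes into one long frame.
full_df = pd.concat(list1, axis=0, ignore_index=True)

# 2) Group by speed and take the sample standard deviation (ddof=1).
std_per_speed_df = (
    full_df.groupby("Speed")["Value"]
    .std()
    .reset_index(name="Standard Deviation")
)
print(std_per_speed_df)
```

Each group uses only the rows that exist for that speed, so a missing speed in some dataframes simply means fewer observations. A speed that appears in only one dataframe would still give NaN, since the sample standard deviation of a single observation is undefined.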

1 Comment

[df for df in list1] is the same thing as list1, so just do pd.concat(list1).groupby('Speed').Value.std()

If the dataframes are all saved in the same folder, you can use pd.concat + groupby as already suggested, or you can use dask:

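import dask.dataframe as dd

# Read every file in the folder into one lazy dask dataframe
df = dd.read_csv("data/*")
# Per-speed standard deviation; compute() returns a pandas Series
out = df.groupby("Speed")["Value"].std()\
        .compute()\
        .reset_index(name="Standard Deviation")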
