Python: Standard Deviation within a list of dataframes

Question

I have a list of, say, 50 dataframes, 'list1'. Each dataframe has columns 'Speed' and 'Value', like this:

Speed   Value
1       12
2       17
3       19
4       21
5       25

I am trying to get the standard deviation of 'Value' for each speed, across all dataframes. The end goal is to get a list or df of the standard deviation for each speed, like this:

Speed   Standard Deviation
1       1.23
2       2.5
3       1.98
4       5.6
5       5.77

I've tried to pull the values into a new dataframe using a for loop, to then use 'statistics.stdev' on it, but I can't seem to get it working. Any help is really appreciated!

Update!

pd.concat([d.set_index('Speed').values for d in df_power], axis=1).std(1)

This worked. However, I forgot to mention that the values for Speed are not always the same between dataframes. Some dataframes are missing a few, and this ends up returning NaN in those instances.

Please include the attempt which didn't work. Is using pandas.concat() not possible?
– AMC, Commented Dec 3, 2019 at 17:18

Quang Hoang · Accepted Answer · 2019-12-03 17:18:07Z

3

You can concat and use std:

list_dfs = [df1, df2, df3, ...]
pd.concat([d.set_index('Speed') for d in list_dfs], axis=1).std(1)
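A minimal runnable sketch of this approach, under some assumptions: the three small dataframes below are hypothetical sample data, 'Speed' is unique within each dataframe, and only the 'Value' column is selected so that any extra columns are ignored. One dataframe deliberately lacks a speed to show how a missing entry behaves.

```python
import pandas as pd

# Hypothetical sample data standing in for the real dataframes.
df1 = pd.DataFrame({"Speed": [1, 2, 3], "Value": [12, 17, 19]})
df2 = pd.DataFrame({"Speed": [1, 2, 3], "Value": [14, 15, 22]})
df3 = pd.DataFrame({"Speed": [1, 2], "Value": [13, 19]})  # Speed 3 is missing
list_dfs = [df1, df2, df3]

# One column per dataframe, aligned on Speed; missing speeds become NaN.
wide = pd.concat(
    [d.set_index("Speed")["Value"] for d in list_dfs],
    axis=1,
    keys=range(len(list_dfs)),
)

# Row-wise sample standard deviation (ddof=1); NaN cells are skipped by default.
std_per_speed = wide.std(axis=1)
print(std_per_speed)
```

Because std skips NaN by default, a speed that is missing from some dataframes still gets a result. The result is NaN only when fewer than two values are available for that speed.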

answered Dec 3, 2019 at 17:18

Quang Hoang


5 Comments

Omni Over a year ago

OP wrote, he wants the standard deviation for one specific speed across all dataframes

Quang Hoang Over a year ago

@Omni OP asked for each speed, not one specific speed.

piRSquared Over a year ago

@Omni this answer achieves what OP asked for.

piRSquared Over a year ago

@QuangHoang however, I think you get into trouble if 'Speed' is not unique within individual dataframes

Iceberg_Slim Over a year ago

I need standard deviation for each speed, however 'values' is not the only column in the dataframes so StDev = pd.concat([d.set_index('Speed').values for d in df_power], axis=1).std(1) is what I ended up using. Although, I should have stated that the values for Speed are not always the same. Some dataframes miss a few and this ends up returning nan in those instances. I'll update the question. Thanks @QuangHoang

Brandon · Accepted Answer · 2019-12-03 18:11:16Z

3

You'll want to concatenate, groupby speed, and take the standard deviation.

1) Concatenate your dataframes

list1 = [df_1, df_2, ...]
full_df = pd.concat(list1, axis=0) # stack all dataframes

2) Groupby speed and take the standard deviation

std_per_speed_df = full_df.groupby('Speed')[['Value']].std()
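The two steps can be sketched end to end as follows. The sample dataframes are hypothetical, and the column names 'Speed' and 'Value' are taken from the question. This version also names the result column 'Standard Deviation' to match the output the question asks for.

```python
import pandas as pd

# Hypothetical sample data; the third dataframe has no row for Speed 3.
df_1 = pd.DataFrame({"Speed": [1, 2, 3], "Value": [12, 17, 19]})
df_2 = pd.DataFrame({"Speed": [1, 2, 3], "Value": [14, 15, 22]})
df_3 = pd.DataFrame({"Speed": [1, 2], "Value": [13, 19]})
list1 = [df_1, df_2, df_3]

# 1) Stack all dataframes into one long dataframe.
full_df = pd.concat(list1, axis=0, ignore_index=True)

# 2) Group by Speed and take the sample standard deviation of Value.
std_per_speed_df = (
    full_df.groupby("Speed")["Value"]
    .std()
    .reset_index(name="Standard Deviation")
)
print(std_per_speed_df)
```

Stacking in long format does not require every dataframe to contain the same speeds, and it tolerates repeated speeds within one dataframe. A speed with only one observation overall still yields NaN, since the sample standard deviation is undefined for a single value.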

edited Dec 3, 2019 at 18:11

answered Dec 3, 2019 at 17:43

Brandon


1 Comment

piRSquared Over a year ago

[df for df in list1] is the same thing as list1 so just do pd.concat(list1).groupby('Speed').Value.std()

rpanai · Accepted Answer · 2019-12-03 18:20:09Z

3

If the dataframes are all saved in the same folder, you can use pd.concat + groupby as already suggested, or you can use dask:

import dask.dataframe as dd
import pandas as pd

df = dd.read_csv("data/*")
out = df.groupby("Speed")["Value"].std()\
        .compute()\
        .reset_index(name="Standard Deviation")
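For comparison, here is a rough sketch of the pd.concat + groupby route applied to files in a folder, using only pandas and the standard library. The temporary folder, the file names a.csv and b.csv, and their contents are all hypothetical and exist only to make the example self-contained.

```python
import glob
import os
import tempfile

import pandas as pd

# Create a throwaway folder with two small CSV files (hypothetical data).
folder = tempfile.mkdtemp()
pd.DataFrame({"Speed": [1, 2], "Value": [12, 17]}).to_csv(
    os.path.join(folder, "a.csv"), index=False
)
pd.DataFrame({"Speed": [1, 2], "Value": [14, 15]}).to_csv(
    os.path.join(folder, "b.csv"), index=False
)

# Read every CSV in the folder and stack the rows.
paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
full_df = pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)

# Same aggregation as the dask version, computed eagerly in pandas.
out = (
    full_df.groupby("Speed")["Value"]
    .std()
    .reset_index(name="Standard Deviation")
)
print(out)
```

This loads every file into memory at once, so it suits cases where the combined data fits comfortably in RAM.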

answered Dec 3, 2019 at 18:20

rpanai
