I would like to automatically create a tuple (to be passed to a scipy.stats function) from the columns of a pandas dataframe, so that each element of the tuple holds the values from one column of the dataframe. Here is the head of my dataframe:

                     4_3-a-0    5_3-a-4    7_3-a-3
datetime_pac                                      
2015-09-03 22:00:00   -100.4 -96.857143 -55.000000
2015-09-03 22:01:00   -100.5 -91.700000 -55.600000
2015-09-03 22:02:00   -100.4 -90.875000 -55.900000
2015-09-03 22:03:00   -100.4 -94.000000 -55.555556
2015-09-03 22:04:00   -100.5 -99.500000 -55.545455

I can achieve this manually like so:

from scipy import stats
stats.f_oneway(df.iloc[:,0], df.iloc[:,1], df.iloc[:,2])

But I would like to 'automate' it in cases where the number of columns in the dataframe is unknown. The following attempts (and many variations of) would not work:

stats.f_oneway(tuple(x) for x in xtmp.values)
stats.f_oneway((xtmp[x]) for x in xtmp.columns)

Thanks for your help!

  • I found the answer in another post: stats.f_oneway(*df.values) Commented Sep 11, 2015 at 10:19

2 Answers

Just call apply with tuple as the function:

In [3]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3))
df

Out[3]:
          0         1         2
0  0.785562 -0.263813  2.239865
1  1.083918  0.035746  0.429111
2  1.422599 -0.818151  0.765725
3  1.022289  0.098561 -2.393095
4 -0.548451 -0.345796  0.298237

In [4]:
df.apply(tuple, axis=1)

Out[4]:
0     (0.785562108573, -0.263813112223, 2.23986497964)
1     (1.08391788685, 0.0357457180803, 0.429110675053)
2      (1.4225989372, -0.818150896781, 0.765724984713)
3     (1.02228880387, 0.0985610274998, -2.39309469576)
4    (-0.548450748411, -0.345796089243, 0.298237353...
dtype: object

3 Comments

Thanks for the new approach. While this certainly puts columns into a tuple (with axis=0), the stats.f_oneway() throws the error: 'ValueError: setting an array element with a sequence.' This may be because the output is actually a series and not a tuple.
Sorry, but I can't solve coding problems based on the errors alone; you need to show your code. Also, doesn't new_df = df.apply(tuple) for col in new_df: print(stats.f_oneway(col)) do what you want?
In the end, I needed to unpack the variables and pass them to the function. Thanks again for your input!
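
The unpacking mentioned in that last comment can be illustrated with plain Python. This is only a minimal sketch: describe_args is a hypothetical helper (not part of scipy or pandas) standing in for any function that, like stats.f_oneway, expects each sample as a separate positional argument, and the sample values are made up.

```python
def describe_args(*samples):
    """Hypothetical stand-in for a function that takes one sample per argument."""
    # Report how many arguments arrived and the length of each one.
    return len(samples), [len(s) for s in samples]


# Two made-up samples of three values each, collected in a tuple.
columns = ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])

# Passing the tuple itself: the function receives ONE argument holding two lists.
packed = describe_args(columns)

# Unpacking with *: the function receives TWO arguments, one per sample.
unpacked = describe_args(*columns)

print(packed)    # (1, [2])
print(unpacked)  # (2, [3, 3])
```

Building the right container is therefore not enough on its own; the function only sees separate samples once the container is unpacked at the call site.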

What about:

tuple([tuple(df[col]) for col in df])

4 Comments

Thanks for this suggestion. It does create a tuple in the correct format, but when fed to stats.f_oneway() it returns f and p values with the same length as each row of the tuple. The same happens if I build a tuple (tup = df.iloc[:,0], df.iloc[:,1], df.iloc[:,2]) and pass it as stats.f_oneway(tup) instead of calling stats.f_oneway(df.iloc[:,0], df.iloc[:,1], df.iloc[:,2]); the latter gives the correct f and p values.
That's because you need to unpack the tuple into positional arguments: stats.f_oneway(*tuple([tuple(df[col]) for col in df])) or stats.f_oneway(*df.apply(tuple, axis=0))
Or even shorter: stats.f_oneway(*df.T.values)
Thanks very much for that. As it turns out I did the last of your suggestions where I feed in the dataframe by unpacking with '*', which was something I did not understand previously. No need to convert my dataframe to a tuple!
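
For reference, the approach the thread converges on might look like the sketch below. The dataframe contents and the column names "a", "b", "c" are made up for illustration; the point is only that the unpacked calls should agree with the manual three-argument call, whatever the number of columns.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Made-up data: three columns here, but nothing below depends on that number.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(20, 3)), columns=["a", "b", "c"])

# Manual call: one positional argument per column.
manual = stats.f_oneway(df.iloc[:, 0], df.iloc[:, 1], df.iloc[:, 2])

# Automated: df.T.values has one row per original column, and * unpacks those rows.
auto = stats.f_oneway(*df.T.values)

# Equivalent spelling that iterates over the column labels explicitly.
by_label = stats.f_oneway(*(df[col] for col in df.columns))

print(manual.statistic, auto.statistic, by_label.statistic)
```

Note the transpose: iterating over df.values yields rows, so df.T.values is what gives one group per column.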
