Python / Pandas Tables

Question

First: thank you for the great help so far! I have a question about table formatting in IPython.

I currently run this script to print the Augmented Dickey-Fuller (ADF) Test for Stationarity:

print("Stationarity")
print(sm.tsa.stattools.adfuller(df['temperature'], maxlag=None, autolag='BIC', regression='c'))

The output is something like this:

Stationarity
(-6.4532219513246361, 1.5054094590984612e-08, 0, 41, {'5%': -2.9351348158036012, '1%': -3.6009833671885199, '10%': -2.6059629803688282}, 1227.2605520289471*)

*(not sure what this value is Link to Documentation)
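
For reference, here is a minimal sketch of how the six items in that tuple can be labelled. The field names and the `ADFResult` namedtuple are my own hypothetical helper, not part of statsmodels. Their order follows the statsmodels documentation for `adfuller` when `autolag` is set: test statistic, p-value, lags used, number of observations, critical values, and the best value of the chosen information criterion (BIC here), which is the starred value.

```python
from collections import namedtuple

# Hypothetical helper: names chosen for readability, order taken from the
# documented return value of adfuller (autolag set, store=False).
ADFResult = namedtuple(
    "ADFResult",
    ["adf_stat", "p_value", "used_lag", "nobs", "critical_values", "icbest"],
)

# The tuple printed in the question, copied verbatim (without the footnote star).
raw = (-6.4532219513246361, 1.5054094590984612e-08, 0, 41,
       {'5%': -2.9351348158036012, '1%': -3.6009833671885199,
        '10%': -2.6059629803688282},
       1227.2605520289471)

res = ADFResult(*raw)
print(res.nobs)    # number of observations used
print(res.icbest)  # the starred value from the output above
```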

Now, my questions are:

1. How can I automate the calculation for more than one variable? Is it possible to create a list containing the different columns (df['variable1'], df['variable2'], df['variable3'], df['variable4'], ...) and apply the ADF test to each item?
2. How can I put the returned data into a table structure? Something like this:

ADF Test

Variable      nobs  t-test   p-value                 1%       5%       10%
temperature   41    -6.4532  1.5054094590984612e-08  -3.6009  -2.9351  -2.6059
variable 2    ...
variable 3    ...

(By the way: how can I convert "1.5054094590984612e-08" into a regular decimal number?)
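
On this side question: the value is already stored at full float precision, and `e-08` is only scientific notation. A small sketch using plain Python string formatting, with illustrative variable names:

```python
from decimal import Decimal

p_value = 1.5054094590984612e-08

# Fixed-point rendering with 15 digits after the decimal point.
fixed = format(p_value, '.15f')

# Going through Decimal keeps every digit of the float's repr, with no exponent.
exact = format(Decimal(repr(p_value)), 'f')

print(fixed)
print(exact)
```

The first form rounds to a chosen number of places; the second shows all the digits Python prints for the float.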

Thanks for your support!

Can you post raw input data? As for your first question, I think this should work: if col_list is a list of your variables, then df[col_list].apply(sm.tsa.stattools.adfuller, maxlag=None, autolag='BIC', regression='c'). You can assign these to new columns and then transpose if you want the columns as index values.
– EdChum, Commented Apr 30, 2015 at 8:39
Cool - Question #1 seems to be solved now! I implemented the script and it works. Can you help me with the second problem?
– Christopher, Commented Apr 30, 2015 at 12:58
Can you post some raw data or a link to your data? I have no clue what inputs this is expecting.
– EdChum, Commented Apr 30, 2015 at 13:04
As input, you basically only need a list of values (time series). I can't disclose the data here, but imagine it's a list of 1000+ values between 1 and 30.
– Christopher, Commented Apr 30, 2015 at 15:23

EdChum · Accepted Answer · 2015-04-30 17:23:27Z

I knocked up some dummy data. I build a dict keyed by column name to store the ADF test results, and then construct a DataFrame with one row per result:

In [12]:

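import datetime as dt
import numpy as np
import pandas as pd
import statsmodels.tsa.stattools as ts

# Dummy data: two columns of random integers on a daily index
df = pd.DataFrame(index=pd.date_range(start=dt.datetime(2014, 1, 1), end=dt.datetime(2014, 6, 1)))
df['a'] = np.random.randint(0, 30, len(df.index))
df['b'] = np.random.randint(0, 30, len(df.index))

# Run the ADF test on each column and keep the raw result tuple
result = {}
for col in df:
    result[col] = ts.adfuller(df[col], maxlag=None, autolag='BIC', regression='c')
result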
Out[12]:
{'a': (-14.5378299332063,
  5.2041541962613174e-27,
  0,
  151,
  {'1%': -3.4744158894942156,
   '10%': -2.5770812758212358,
   '5%': -2.8808783827710589},
  983.29106640612281),
 'b': (-12.247140023284922,
  9.7254933298555022e-23,
  0,
  151,
  {'1%': -3.4744158894942156,
   '10%': -2.5770812758212358,
   '5%': -2.8808783827710589},
  983.89321857804237)}
In [29]:

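# DataFrame.append was removed in pandas 2.0, so collect one record per
# column first and build the frame in a single call.
records = [{'nobs': v[3], 't-test': v[0], 'p-value': v[1],
            '1%': v[4]['1%'], '5%': v[4]['5%'], '10%': v[4]['10%']}
           for v in result.values()]
# Sort the columns alphabetically to match the output shown below
df_result = pd.DataFrame(records, index=list(result.keys())).sort_index(axis=1)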
df_result.index.name = 'temperature'
df_result
Out[29]:
                   1%       10%        5%  nobs       p-value    t-test
temperature                                                            
a           -3.474416 -2.577081 -2.880878   151  5.204154e-27 -14.53783
b           -3.474416 -2.577081 -2.880878   151  9.725493e-23 -12.24714
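
The output above has its columns in alphabetical order, while the question asked for nobs, t-test, p-value, 1%, 5%, 10% under a "Variable" label. A hedged sketch of that last step, assuming the numbers shown in Out[12] (the frame is rebuilt by hand here so the snippet runs on its own; `wanted` and `ordered` are illustrative names):

```python
import pandas as pd

# Values copied from Out[12]; in practice df_result already exists.
df_result = pd.DataFrame(
    {'1%': [-3.4744158894942156, -3.4744158894942156],
     '10%': [-2.5770812758212358, -2.5770812758212358],
     '5%': [-2.8808783827710589, -2.8808783827710589],
     'nobs': [151, 151],
     'p-value': [5.2041541962613174e-27, 9.7254933298555022e-23],
     't-test': [-14.5378299332063, -12.247140023284922]},
    index=['a', 'b'])

# Select the columns in the requested order and label the index.
wanted = ['nobs', 't-test', 'p-value', '1%', '5%', '10%']
ordered = df_result[wanted].rename_axis('Variable')
print(ordered)
```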


9 Comments

Christopher Over a year ago

It seems to work, except that it rounds all the values: "-3.474416" becomes "-3", "-2.577081" becomes "-2", and so on.

EdChum Over a year ago

That could be a display issue. What happens when you print result like in my code above?

Christopher Over a year ago

Those values are correct (16 digits after the decimal point). I have no idea why the values in the df_result table are rounded.

EdChum Over a year ago

I think this is a display issue. What does df_result['1%'] show, or better, df_result['1%'].iloc[0]?

Christopher Over a year ago

pd.options.display.float_format = '{:,.4f}'.format fixed the problem!
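
A short sketch of what that setting does, assuming a one-row stand-in for df_result (`df_demo` is a hypothetical name). It uses `pd.option_context` so the format applies only inside the `with` block rather than globally. Note that four fixed decimals render the tiny p-values as 0.0000, so the second call gives that one column its own scientific-notation formatter.

```python
import pandas as pd

# Stand-in for df_result, with values taken from the answer's output.
df_demo = pd.DataFrame(
    {'1%': [-3.4744158894942156], 'p-value': [5.2041541962613174e-27]},
    index=['a'])

with pd.option_context('display.float_format', '{:,.4f}'.format):
    # Every float column is rendered with four decimals.
    fixed = df_demo.to_string()
    # Same setting, but the p-value column keeps scientific notation.
    mixed = df_demo.to_string(formatters={'p-value': '{:.3e}'.format})

print(fixed)
print(mixed)
```

Only the rendering changes; the stored values keep their full precision.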
