First, thank you for the great help so far! I have a question about formatting tables in IPython.

I currently run this script to print the Augmented Dickey-Fuller (ADF) Test for Stationarity:

import statsmodels.api as sm

print("Stationarity")
print(sm.tsa.stattools.adfuller(df['temperature'], maxlag=None, autolag='BIC', regression='c'))

The output is something like this:

Stationarity
(-6.4532219513246361, 1.5054094590984612e-08, 0, 41, {'5%': -2.9351348158036012, '1%': -3.6009833671885199, '10%': -2.6059629803688282}, 1227.2605520289471*)

*(I am not sure what this last value is; see the linked documentation.)

Now, my questions are:

  1. How can I automate the calculation for more than one variable? Is it possible to create a list of the different columns (df['variable1'], df['variable2'], df['variable3'], ...) and apply the ADF test to each item?

  2. How can I put the returning data into a table structure? Something like this:

ADF Test

Variable     nobs  t-test   p-value                 1%      5%       10%
temperature  41    -6.4532  1.5054094590984612e-08  -3.600  -2.9351  -2.6059
variable 2   ...
variable 3   ...

(By the way: how can I convert "1.5054094590984612e-08" into a plain decimal number?)

Thanks for your support!
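
On the side question about the p-value: `1.5054094590984612e-08` is already a full-precision float, and the `e-08` is only scientific notation in the printed form. A minimal sketch of rendering it as a plain decimal string with Python's built-in formatting (the variable names are illustrative and not part of the original script):

```python
from decimal import Decimal

p_value = 1.5054094590984612e-08  # value copied from the ADF output in the question

# Fixed-point notation with 12 digits after the decimal point
fixed = format(p_value, ".12f")

# Scientific notation shortened to 4 significant digits
short = format(p_value, ".3e")

# Decimal exposes every digit of the stored binary value, if that is wanted
exact = Decimal(p_value)

print(fixed)
print(short)
print(exact)
```

How many digits to show is a presentation choice; the stored value is the same in all three cases.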

  • Can you post the raw input data? For your first question, I think this should work: if col_list is a list of your variables, then df[col_list].apply(sm.tsa.stattools.adfuller, maxlag=None, autolag='BIC', regression='c'). You can assign the results to new columns and then transpose if you want the columns as index values. Commented Apr 30, 2015 at 8:39
  • Cool, question #1 seems to be solved now! I implemented the script and it works. Can you help me with the second problem? Commented Apr 30, 2015 at 12:58
  • Can you post some raw data or a link to your data? I have no clue what inputs this is expecting. Commented Apr 30, 2015 at 13:04
  • As input, you basically only need a list of values (time series). I can't disclose the data here, but imagine it's a list of 1000+ values between 1 and 30. Commented Apr 30, 2015 at 15:23
  • So just a datetimeindex and random values? Commented Apr 30, 2015 at 15:44

1 Answer

I knocked up some dummy data. I build a dict that stores the ADF test result for each column, and then construct a DataFrame row from each result:

In [12]:

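import datetime as dt

import numpy as np
import pandas as pd
import statsmodels.tsa.stattools as ts

# Dummy data: two columns of random integers on a daily index
df = pd.DataFrame(index=pd.date_range(start=dt.datetime(2014, 1, 1), end=dt.datetime(2014, 6, 1)))
df['a'] = np.random.randint(0, 30, len(df.index))
df['b'] = np.random.randint(0, 30, len(df.index))

# Run the ADF test on every column and keep the raw result tuples
result = {}
for col in df:
    result[col] = ts.adfuller(df[col], maxlag=None, autolag='BIC', regression='c')
result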
Out[12]:
{'a': (-14.5378299332063,
  5.2041541962613174e-27,
  0,
  151,
  {'1%': -3.4744158894942156,
   '10%': -2.5770812758212358,
   '5%': -2.8808783827710589},
  983.29106640612281),
 'b': (-12.247140023284922,
  9.7254933298555022e-23,
  0,
  151,
  {'1%': -3.4744158894942156,
   '10%': -2.5770812758212358,
   '5%': -2.8808783827710589},
  983.89321857804237)}
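
Each tuple above has six positional fields. As far as the statsmodels documentation describes them, they are the test statistic, the p-value, the number of lags used, the number of observations, a dict of critical values, and the best information criterion value found when `autolag` is set (BIC here). A small sketch that gives the fields names; the `ADFResult` type and its field names are illustrative assumptions, not something statsmodels returns:

```python
from collections import namedtuple

# Hypothetical wrapper type; the field order follows the adfuller return tuple
ADFResult = namedtuple(
    "ADFResult",
    ["adf_stat", "p_value", "used_lag", "nobs", "critical_values", "ic_best"],
)

# Raw tuple for column 'a', copied from Out[12]
raw_a = (
    -14.5378299332063,
    5.2041541962613174e-27,
    0,
    151,
    {'1%': -3.4744158894942156, '10%': -2.5770812758212358, '5%': -2.8808783827710589},
    983.29106640612281,
)

res_a = ADFResult(*raw_a)

# The unit-root null is rejected at the 5% level when the statistic
# lies below the 5% critical value
is_stationary_5pct = res_a.adf_stat < res_a.critical_values['5%']
print(res_a.nobs, res_a.p_value, is_stationary_5pct)
```

Naming the fields makes the index lookups in the next cell (`v[0]`, `v[3]`, `v[4]`) easier to follow.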
In [29]:

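df_result = pd.DataFrame()

# Append one single-row DataFrame per tested column, labelled by column name.
# Note: DataFrame.append was removed in pandas 2.0; use pd.concat on newer versions.
for k, v in result.items():
    df_result = df_result.append(pd.DataFrame(
            data={'nobs':v[3], 't-test':v[0], 'p-value':v[1], '1%':v[4]['1%'], '5%':v[4]['5%'], '10%':v[4]['10%']},
            index=[k]))
df_result.index.name = 'temperature'
df_result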
Out[29]:
                   1%       10%        5%  nobs       p-value    t-test
temperature                                                            
a           -3.474416 -2.577081 -2.880878   151  5.204154e-27 -14.53783
b           -3.474416 -2.577081 -2.880878   151  9.725493e-23 -12.24714
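
`DataFrame.append` was removed in pandas 2.0, so the loop above raises an error on recent versions. A sketch of one alternative that builds the same table in a single step from a dict of rows; the helper name `adf_table` and the index name `variable` are hypothetical, and the sample tuples are copied from Out[12]:

```python
import pandas as pd


def adf_table(result):
    """Turn {column name: adfuller result tuple} into one table row per column."""
    rows = {
        col: {
            'nobs': v[3],
            't-test': v[0],
            'p-value': v[1],
            '1%': v[4]['1%'],
            '5%': v[4]['5%'],
            '10%': v[4]['10%'],
        }
        for col, v in result.items()
    }
    table = pd.DataFrame.from_dict(rows, orient='index')
    table.index.name = 'variable'
    return table


# Sample input copied from Out[12]; both columns share the same critical values
crit = {'1%': -3.4744158894942156, '10%': -2.5770812758212358, '5%': -2.8808783827710589}
result = {
    'a': (-14.5378299332063, 5.2041541962613174e-27, 0, 151, crit, 983.29106640612281),
    'b': (-12.247140023284922, 9.7254933298555022e-23, 0, 151, crit, 983.89321857804237),
}

df_result = adf_table(result)
print(df_result)
```

Collecting the rows first and creating the frame once also avoids copying the growing frame on every iteration.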
9 Comments

It seems to work, except that it rounds all the values: "-3.474416" is shown as "-3", "-2.577081" as "-2"...
That could be a display issue. What happens when you print result as in my code above?
Those values are correct (16 digits after the decimal point). I have no idea why the values in the df_result table are rounded.
I think this is a display issue. What does df_result['1%'] show, or better, df_result['1%'].iloc[0]?
pd.options.display.float_format = '{:,.4f}'.format fixed the problem!
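
The option in the last comment changes only how floats are printed; the stored values keep their full precision. One caveat: a fixed four-decimal format would print very small p-values such as 5.2e-27 as 0.0000. A sketch that scopes the option with `pd.option_context` and, as an alternative, formats the p-value column separately through `to_string(formatters=...)`; the two-row frame `demo` is a cut-down stand-in for `df_result`:

```python
import pandas as pd

# Cut-down stand-in for df_result, values copied from the answer's output
demo = pd.DataFrame(
    {
        '1%': [-3.4744158894942156, -3.4744158894942156],
        'p-value': [5.2041541962613174e-27, 9.7254933298555022e-23],
    },
    index=['a', 'b'],
)

# Apply the four-decimal display format only inside this block
with pd.option_context('display.float_format', '{:,.4f}'.format):
    fixed_text = demo.to_string()

# Per-column formatters: keep scientific notation for the p-values
mixed_text = demo.to_string(
    formatters={'1%': '{:.4f}'.format, 'p-value': '{:.3e}'.format}
)

print(fixed_text)
print(mixed_text)
```

Setting `pd.options.display.float_format` globally, as in the comment, has the same effect as the `with` block but stays active for the rest of the session.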