Trying to convert a regression program from Stata to Python

Question

I currently have this do-file in Stata, which is a simple test for significance in a matched-pairs regression. I understand some basic Python, but given my limited knowledge I don't know whether something like this is possible in Python. I am doing this for my uncle, who uses Python at his company. If anyone can point me to some resources or explain how I would do this, please let me know.

*import delimited "data"

drop if missing(v1,v2,v3)

regress v3 v2

test v2

generate pvalue = r(p)

if pvalue > .01 {
display "notsig"
display pvalue
}

if pvalue <= .01 {
display "sig"
display pvalue
}

drop pvalue

The variable pvalue is not needed, as you can condition on r(p) directly. The test is given in the regress output anyway.
– Nick Cox, Commented Jan 26, 2018 at 7:54

Colton T · Accepted Answer · 2018-04-10 14:41:03Z


I would look into pandas (http://pandas.pydata.org/pandas-docs/stable/) and statsmodels (http://www.statsmodels.org/dev/index.html). pandas is good for reading data into DataFrames in Python, and you can then run statistical models on them with statsmodels. I am not well-versed in statsmodels, so you may have to look into the documentation yourself.

Here is an example that follows what you showed in your question:

import pandas as pd
import statsmodels.formula.api as sm

df = pd.read_csv("data.csv", sep=",")
# dropna returns a new DataFrame, so the result has to be assigned back
df = df.dropna(axis=0, how='any')

results = sm.ols(formula="v3~v2", data=df).fit()
t_test = results.t_test('v2=0')

if (t_test.pvalue*2) > 0.01:
  print("notsig")
  print(t_test.pvalue*2)

if (t_test.pvalue*2) <= 0.01:
  print("sig")
  print(t_test.pvalue*2)

I multiplied the p-value by 2 in this example because I believe t_test only gives the one-tailed p-value, but you should check the documentation to make sure.

edited Apr 10, 2018 at 14:41

answered Jan 25, 2018 at 20:48

Colton T


4 Comments

Josef Over a year ago

tvalues and pvalues for testing that a parameter is zero are directly available in the results instance; t_test is more general and provides the same results.
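
A minimal sketch of what this comment describes, assuming a recent statsmodels with the formula API. The data here is synthetic and hypothetical (the asker's file is not available), and the variable names label, p_value, and t_test_p are only illustrative. It reads the t statistic and p-value for v2 straight from the fitted results and compares the p-value with the one returned by t_test:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the asker's data (hypothetical values, seeded).
rng = np.random.default_rng(0)
v2 = rng.normal(size=200)
df = pd.DataFrame({"v2": v2, "v3": 2.0 * v2 + rng.normal(size=200)})

results = smf.ols("v3 ~ v2", data=df).fit()

# Per-coefficient statistics live on the results object, indexed by term name.
t_value = results.tvalues["v2"]
p_value = results.pvalues["v2"]

# The more general t_test call; squeeze because the returned shape can
# differ between statsmodels versions.
t_test_p = float(np.squeeze(results.t_test("v2 = 0").pvalue))

# Same sig/notsig decision as the Stata do-file, at the 0.01 level.
label = "sig" if p_value <= 0.01 else "notsig"
print(label, p_value)
```

With this approach there is no need for a separate test step, which mirrors the earlier remark that Stata's regress output already contains the test.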

Josef Over a year ago

The pvalue is for a two-sided hypothesis (the alternative is "not equal"), so the *2 needs to be removed. (Currently the tests in the model results are always two-sided; only the standalone t-tests for means allow one-sided alternatives.)
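
One way to check this claim yourself, sketched under the assumption that scipy is installed alongside statsmodels. The data is again synthetic and hypothetical. The idea is to recompute the p-value by hand from the t distribution and see whether the reported value matches the two-sided or the one-sided figure:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Synthetic data with a weak effect, so the p-value is not vanishingly small.
rng = np.random.default_rng(1)
v2 = rng.normal(size=50)
df = pd.DataFrame({"v2": v2, "v3": 0.2 * v2 + rng.normal(size=50)})

results = smf.ols("v3 ~ v2", data=df).fit()

t_stat = results.tvalues["v2"]
reported_p = results.pvalues["v2"]

# Tail area beyond |t| in one direction, using the residual degrees of freedom.
one_sided_p = stats.t.sf(abs(t_stat), results.df_resid)
# Two-sided p-value: both tails together.
two_sided_p = 2 * one_sided_p

print("reported :", reported_p)
print("two-sided:", two_sided_p)
print("one-sided:", one_sided_p)
```

If the reported value agrees with the two-sided figure, doubling it again would overstate the p-value, so it should be compared with the 0.01 threshold as is.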

CPBL Over a year ago

statsmodels is misspelled (without an "l")

Colton T Over a year ago

@CPBL Thank you, I edited it to show the correct spelling.
