I have numerous dataframes, each with about 100 columns of chemical compounds and a categorical variable listing the type of material. For example, a smaller version of my datasets would look something like this:

Decane    Octanal    Material
 1         20         Water
 2         1          Glass
 10        5          Glass
 9         4          Water

I am using a linear regression model to regress each chemical on the material type. I want to be able to dynamically name the results dataframe based on which dataset I am using. My code looks like this (where 'feature_cols' are the names of the chemicals):

count=0
dataframe=[]

#loop through the three datasets (In reality I have many more than three)
for dataset in [first, second, third]:
    count+=1

    for feature in feature_cols:

        #define the model and fit it
        mod = smf.ols(formula='Q(feature)'+'~material', data=dataset)
        res = mod.fit()

        #create a dataframe of the pvalues
        #I would like to be able to dynamically name pvalues so that when looping through
        #the chemicals of the first dataframe it is called 'pvalues_first' and so on.

        pvalues=pd.DataFrame(res.pvalues)
    
2 Answers

You can use a dictionary (here with dummy values):

names = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth']
pvalues = {}
for i in range(len(names)):
    pvalues["pvalues_" + names[i]] = i+1

print(pvalues)

Output:

{'pvalues_first': 1, 'pvalues_second': 2, 'pvalues_third': 3, 'pvalues_fourth': 4, 'pvalues_fifth': 5, 'pvalues_sixth': 6}

To access pvalues_third, for example (here assigning it a new value):

pvalues["pvalues_third"] = 20
print(pvalues)

Output:

{'pvalues_first': 1, 'pvalues_second': 2, 'pvalues_third': 20, 'pvalues_fourth': 4, 'pvalues_fifth': 5, 'pvalues_sixth': 6}
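
Applied to the question's setup, the same idea might look like the sketch below. It assumes the dataframes themselves are kept in a dictionary keyed by name, so each one carries its label into the loop. The toy values and the `summarize` helper are hypothetical: `summarize` only stands in for the model fit so the example runs with pandas alone, and in the question's code it would be the `smf.ols(...).fit().pvalues` call.

```python
import pandas as pd

# Toy stand-ins for the question's dataframes (values are made up for illustration).
first = pd.DataFrame({"Decane": [1, 2, 10, 9], "Octanal": [20, 1, 5, 4],
                      "Material": ["Water", "Glass", "Glass", "Water"]})
second = first.assign(Decane=[3, 4, 8, 7])

# Keep the dataframes in a dict so each one is paired with its name.
datasets = {"first": first, "second": second}
feature_cols = ["Decane", "Octanal"]

def summarize(dataset, feature):
    """Hypothetical stand-in for the model fit: returns one Series per feature.
    In the question this would be smf.ols(...).fit().pvalues."""
    return dataset.groupby("Material")[feature].mean()

results = {}
for name, dataset in datasets.items():
    per_feature = {feature: summarize(dataset, feature) for feature in feature_cols}
    # One dataframe per dataset: rows are features, columns come from the Series index.
    results["pvalues_" + name] = pd.DataFrame(per_feature).T

print(list(results))
print(results["pvalues_first"])
```

Each result is then looked up by key, e.g. `results["pvalues_first"]`, instead of by a dynamically created variable name.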

6 Comments

If pvalues is a dictionary, then running this line of code: pvalues["pvalues_" + dataset[count]] = pd.DataFrame(res.pvalues) gives KeyError: 1
Your names variable is a list of strings, but my equivalent dataset variable is a list of dataframes.
In my code above, 'dataset' is what I am looping through to go through my dataframes.
If I run your first code snippet with names being a list of dataframes instead of a list of strings, it gives me an error. Your code works fine when you're looping through strings but not when you're looping through dataframes.
first=pd.read_excel('file_path')
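
The KeyError mentioned in these comments most likely comes from `dataset[count]`: square brackets on a dataframe select a column, so pandas looks for a column labelled 1. A minimal sketch of the problem and of one way around it, assuming a list of names is kept alongside the list of dataframes (the toy data and the placeholder stored in the dictionary are made up for illustration):

```python
import pandas as pd

# Toy stand-ins for the question's dataframes (made-up values).
first = pd.DataFrame({"Decane": [1, 2], "Material": ["Water", "Glass"]})
second = pd.DataFrame({"Decane": [10, 9], "Material": ["Glass", "Water"]})

count = 1
raised = False
try:
    # This asks for a COLUMN labelled 1, which does not exist.
    first[count]
except KeyError:
    raised = True
print("KeyError raised:", raised)

# Pairing each name with its dataframe avoids indexing into the dataframe at all.
names = ["first", "second"]
pvalues = {}
for name, dataset in zip(names, [first, second]):
    # Placeholder value; the real code would store pd.DataFrame(res.pvalues) here.
    pvalues["pvalues_" + name] = dataset.shape
print(list(pvalues))
```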
#pair each name with its dataframe (In reality I have many more than three)
names = ["first", "second", "third"]
datasets = [first, second, third]
results = {}

for name, dataset in zip(names, datasets):
    rows = []
    for feature in feature_cols:
        #define the model and fit it
        mod = smf.ols(formula='Q(feature)'+'~material', data=dataset)
        res = mod.fit()

        #collect the pvalues for this chemical (.iloc selects by position)
        rows.append({'Intercept': res.pvalues.iloc[0], 'cap_type': res.pvalues.iloc[1]})

    #store one dataframe per dataset under a key such as 'pvalues_first';
    #assigning to name_str itself would only overwrite the string
    name_str = "pvalues_" + name
    results[name_str] = pd.DataFrame(rows, index=feature_cols)

2 Comments

Your names variable is a list of strings but my equivalent dataset variable is a list of dataframes
You can change it to a list with: df.columns.tolist()
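
For reference, a small sketch of what `df.columns.tolist()` returns, using the toy table from the question. Note that it yields the column names of one dataframe (useful for building `feature_cols`), not names for the dataframes themselves; the `feature_cols` filter below is an assumption about how the chemical columns would be picked out.

```python
import pandas as pd

# Toy dataframe matching the example table in the question.
df = pd.DataFrame({"Decane": [1, 2, 10, 9], "Octanal": [20, 1, 5, 4],
                   "Material": ["Water", "Glass", "Glass", "Water"]})

columns = df.columns.tolist()
print(columns)  # ['Decane', 'Octanal', 'Material']

# Assumed filter: every column except the categorical one is a chemical.
feature_cols = [col for col in columns if col != "Material"]
print(feature_cols)  # ['Decane', 'Octanal']
```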
