
I'm reading a bunch of SAS files like so:

demography = pd.read_sas("demography.sas7bdat", encoding = 'latin-1')
adverse_event_ds = pd.read_sas("adverse_event_ds.sas7bdat", encoding = 'latin-1')
rpt10344 = pd.read_sas("rpt10344.sas7bdat", encoding = 'latin-1')
vaccine_administration = pd.read_sas("vaccine_administration.sas7bdat", encoding = 'latin-1')
lab_tests_blood_chemistry_ds = pd.read_sas("lab_tests_blood_chemistry_ds.sas7bdat", encoding = 'latin-1')
lab_tests_hematology_ds = pd.read_sas("lab_tests_hematology_ds.sas7bdat", encoding = 'latin-1')
lab_tests_miscellaneous_ds = pd.read_sas("lab_tests_miscellaneous_ds.sas7bdat", encoding = 'latin-1')
vital_signs = pd.read_sas("vital_signs.sas7bdat", encoding = 'latin-1')

I want to be able to replace it with something like this:

datasets = ["demography", "adverse_event_ds", "rpt10344", "vaccine_administration", "lab_tests_blood_chemistry_ds", "lab_tests_hematology_ds", "lab_tests_miscellaneous_ds", "vital_signs"]

for dataset in datasets:
    dataset = pd.read_sas(dataset+".sas7bdat", encoding = 'latin-1')

But when I then do something like:

demography.info()

I get: NameError: name 'demography' is not defined

What's happening under the hood and how can I fix this?

1 Answer


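This loop rebinds the name dataset on every iteration rather than creating new variables (e.g. demography, rpt10344, etc.). Assigning to a name never creates a variable named after the string that name used to hold, so after the loop dataset refers only to the last DataFrame read (vital_signs), and demography was never defined.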
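A minimal sketch of what happens under the hood. It uses a hypothetical stand-in function, load, in place of pd.read_sas so it runs without any SAS files; the loop behaves the same way with real DataFrames.

```python
def load(name):
    # Hypothetical stand-in for pd.read_sas(name + ".sas7bdat", encoding="latin-1")
    return "<DataFrame loaded from " + name + ".sas7bdat>"

datasets = ["demography", "adverse_event_ds", "vital_signs"]

for dataset in datasets:
    # Rebinds the loop variable `dataset`; the string it held is discarded.
    # It never creates a variable whose *name* is that string.
    # Reassigning it here also does not disturb iteration: the next pass
    # simply takes the next item from the list.
    dataset = load(dataset)

print(dataset)                    # only the last result survives
print("demography" in globals())  # no variable called demography was created
```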

I'd use a dictionary of datasets instead:

dsd = {}
for dataset in datasets:
    dsd[dataset] = pd.read_sas(dataset+".sas7bdat", encoding = 'latin-1')

Or, more Pythonic, a dict comprehension:

dsd = {d: pd.read_sas(d + ".sas7bdat", encoding = 'latin-1') for d in datasets}
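Once the frames are in a dictionary you can still reach each one by name, or process all of them in a single loop. The sketch below again uses a hypothetical load stand-in instead of pd.read_sas. The types.SimpleNamespace wrapper at the end is an extra suggestion, not part of the answer above: it gives attribute-style access such as ds.demography without touching globals().

```python
from types import SimpleNamespace

def load(name):
    # Hypothetical stand-in for pd.read_sas(name + ".sas7bdat", encoding="latin-1")
    return {"source": name + ".sas7bdat"}

datasets = ["demography", "adverse_event_ds", "vital_signs"]

# Same dict comprehension as above, keyed by dataset name
dsd = {d: load(d) for d in datasets}

# Look up one dataset by its name
print(dsd["demography"])

# Process every dataset in one loop
for name, data in dsd.items():
    print(name, "->", data["source"])

# Optional: attribute-style access built from the same dict
ds = SimpleNamespace(**dsd)
print(ds.demography is dsd["demography"])  # same object, just a different way to reach it
```

With real DataFrames, dsd["demography"].info() (or ds.demography.info()) takes the place of the demography.info() call from the question.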

I'd strongly advise against creating individually named variables this way, for reasons explained here and here, but if you absolutely must, you can write into the module's namespace through globals():

for d in datasets:
    globals()[d] = pd.read_sas(d + ".sas7bdat", encoding = 'latin-1')
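If you do take the globals() route, it only yields usable variables when each dataset name is a valid Python identifier; a file named, say, 2020-labs.sas7bdat (a hypothetical example) would create a global you could only reach as globals()["2020-labs"]. A minimal sketch of a guard, again with a hypothetical load stand-in for pd.read_sas:

```python
import keyword

def load(name):
    # Hypothetical stand-in for pd.read_sas(name + ".sas7bdat", encoding="latin-1")
    return "<DataFrame loaded from " + name + ".sas7bdat>"

datasets = ["demography", "rpt10344", "vital_signs"]

for d in datasets:
    # Only names that are valid, non-keyword identifiers can be typed as plain variables later
    if not d.isidentifier() or keyword.iskeyword(d):
        raise ValueError(f"{d!r} cannot be used as a variable name")
    # globals() is the module's namespace dict; adding a key creates a module-level name
    globals()[d] = load(d)

print(demography)  # now an ordinary module-level variable
```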

Comments

Thanks! This did help, but how can I have the dataframes assigned directly to the name, rather than having to do something like dsd["demography"]?
It's better practice to have the dataframes in a container like a dictionary: stackoverflow.com/a/6365889/5666087
Thanks for that and the links! This really helped explain why it's a bad idea! Thanks a lot!
No prob! Please consider accepting my answer if it worked for you.
@tbdees - I hit +1 on your answer; does that count as accepting, or am I doing it wrong?