How to call different data files using a for loop in pandas?

Question

I have a list of files with names such as

topaccount_2015_09_individuals
topaccount_2015_12_individuals
...
topaccount_2021_12_individuals

which are subsets of

topaccount_2015_09
topaccount_2015_12
...
topaccount_2021_12

I want to loop over them and do some data manipulation, so I created a list:

known_series = known['Address']
y = ['2015_09', '2015_12', '2016_03', '2016_06', '2016_09', '2016_12', '2017_03', '2017_06', '2017_09', '2017_12',
     '2018_03', '2018_06', '2018_09', '2018_12', '2019_03', '2019_06', '2019_09', '2019_12' , '2020_03', '2020_06', '2020_09', '2020_12', 
     '2021_03', '2021_03', '2021_06', '2021_09', '2021_12']

for q in y:
    topaccount_[q]_individuals = topaccount_[q][~topaccount_[q]['address'].isin(known_series)]
    topaccount_[q]_individuals = topaccount_[q]_individuals.reset_index(drop=True)

But it is giving me an error. What am I doing wrong? (known_series is already defined in the script.)

UPDATE: I followed the suggestion below, but I have one more problem: how do I refer to the master dataframe from which I am extracting the _individuals dataframes?

y = ['2015_09', '2015_12', '2016_03', '2016_06', '2016_09', '2016_12', '2017_03', '2017_06', '2017_09', '2017_12',
     '2018_03', '2018_06', '2018_09', '2018_12', '2019_03', '2019_06', '2019_09', '2019_12' , '2020_03', '2020_06', '2020_09', '2020_12', 
     '2021_03', '2021_03', '2021_06', '2021_09', '2021_12']

file_individuals = []
file = []

for x in y:
    file_individuals.append(f'topaccount_{x}_individuals')
    file.append(f'topaccount_{x}')
    
print(file_individuals)
print(file)
    
for file_individuals in file_individuals:
    file_individuals = **topaccount_[q][~topaccount_[q]**['address'].isin(known_series)]  
    file_individuals = file_individuals[~file_individuals['address'].isin(coinmarketcap_series)]
    file_individuals = file_individuals[~file_individuals['address'].isin(tord_series)]
    file_individuals = file_individuals[~file_individuals['address'].isin(exchanges_series)]
    file_individuals = file_individuals.reset_index(drop=True)

REUPDATE

d = {}
names=[]
for x in y:
     d['ind'] = f"topaccount_{x}_individuals"
     d['top'] = f"topaccount_{x}"
     names.append(d)
    
for n in names:
    n['ind'] = n['top'][~n['top']['address'].isin(known_series)]

and I get the following error:

   n['ind'] = n['top'][~n['top']['address'].isin(known_series)]

TypeError: string indices must be integers

Maybe known_series is already defined, but your example is not reproducible. You can't create variables dynamically like topaccount_[q]_individuals.
– Corralien, Commented Mar 21, 2022 at 9:42
No, you can use a dictionary topaccount_individuals indexed by y.
– Corralien, Commented Mar 21, 2022 at 9:52
Exactly. I am not sure how to address them since their dates have to match (i.e. extracting top_account_2012_12_individuals from top_account_2012_12).
– Olive, Commented Mar 21, 2022 at 10:22
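
The dictionary suggestion in the comments above could look roughly like the following. This is only a minimal sketch: the dictionary name topaccount, the sample rows, and the balance column are made up for illustration, and it assumes the master DataFrames are stored in a dictionary keyed by the date strings from y rather than in separate variables.

```python
import pandas as pd

# Hypothetical stand-ins for the real data. In the question these would be the
# already-loaded master DataFrames, stored under their date key instead of in
# separate variables such as topaccount_2015_09.
topaccount = {
    "2015_09": pd.DataFrame({"address": ["a1", "a2", "a3"], "balance": [10, 20, 30]}),
    "2015_12": pd.DataFrame({"address": ["a2", "a4"], "balance": [25, 40]}),
}
known_series = pd.Series(["a2"], name="Address")  # assumed sample of known addresses

# Store each filtered frame under the same key as its master frame, so that
# topaccount[q] and topaccount_individuals[q] always refer to the same date.
topaccount_individuals = {}
for q, df in topaccount.items():
    mask = ~df["address"].isin(known_series)
    topaccount_individuals[q] = df[mask].reset_index(drop=True)

print(topaccount_individuals["2015_09"])
```

Because both dictionaries share their keys, the dates cannot get out of step, and no nested loop is needed.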

Devang Sanghani · Accepted Answer · 2022-03-21 10:29:05Z

Something like this, then use the name list later:

name = []
for x in y:
    name.append(f'topaccount_{x}_individuals') 
    
print(name)

['topaccount_2015_09_individuals', 'topaccount_2015_12_individuals', 'topaccount_2016_03_individuals', 'topaccount_2016_06_individuals', 'topaccount_2016_09_individuals', 'topaccount_2016_12_individuals', 'topaccount_2017_03_individuals', 'topaccount_2017_06_individuals', 'topaccount_2017_09_individuals', 'topaccount_2017_12_individuals', 'topaccount_2018_03_individuals', 'topaccount_2018_06_individuals', 'topaccount_2018_09_individuals', 'topaccount_2018_12_individuals', 'topaccount_2019_03_individuals', 'topaccount_2019_06_individuals', 'topaccount_2019_09_individuals', 'topaccount_2019_12_individuals', 'topaccount_2020_03_individuals', 'topaccount_2020_06_individuals', 'topaccount_2020_09_individuals', 'topaccount_2020_12_individuals', 'topaccount_2021_03_individuals', 'topaccount_2021_03_individuals', 'topaccount_2021_06_individuals', 'topaccount_2021_09_individuals', 'topaccount_2021_12_individuals']

Alternatively,

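names = []
for x in y:
    # Create a new dict on each iteration; reusing one dict would make
    # every entry in names point to the same (last) pair of names.
    d = {}
    d['ind'] = f"topaccount_{x}_individuals"
    d['top'] = f"topaccount_{x}"
    names.append(d)

for n in names:
    n['ind'] = n['top'].....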
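
Note that the entries in names hold only strings, so an expression like n['top']['address'] indexes a string rather than a DataFrame, which is where the TypeError reported in the question comes from. Below is a minimal sketch of one way to make the name-based approach work. It assumes the master DataFrames are kept in a dictionary keyed by those names; the dictionary name frames and the sample data are hypothetical.

```python
import pandas as pd

y = ["2015_09", "2015_12"]  # shortened list, for illustration only

# Hypothetical master frames, stored under their names instead of as
# separate variables.
frames = {
    "topaccount_2015_09": pd.DataFrame({"address": ["a1", "a2"]}),
    "topaccount_2015_12": pd.DataFrame({"address": ["a2", "a3"]}),
}
known_series = pd.Series(["a2"])  # assumed sample of known addresses

names = []
for x in y:
    # A new dict per iteration, so each entry keeps its own pair of names.
    names.append({"ind": f"topaccount_{x}_individuals", "top": f"topaccount_{x}"})

for n in names:
    top_df = frames[n["top"]]  # look the DataFrame up by its name
    filtered = top_df[~top_df["address"].isin(known_series)]
    frames[n["ind"]] = filtered.reset_index(drop=True)

print(frames["topaccount_2015_09_individuals"])
```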

edited Mar 21, 2022 at 10:29

answered Mar 21, 2022 at 9:51

3 Comments

Olive Over a year ago

The very first line is calling data from the master file, "topaccount_2015_12[~topaccount_2015_12['address'].isin(known_series)]". How can I approach this? Do I put a loop inside a loop?

Devang Sanghani Over a year ago

You can create these file names beforehand and then use them in the code.

Olive Over a year ago

Yes, I understand that, but these are two different sets of files.

y = ['2015_09', '2015_12', '2016_03', '2016_06', '2016_09', '2016_12', '2017_03', '2017_06', '2017_09', '2017_12',
     '2018_03', '2018_06', '2018_09', '2018_12', '2019_03', '2019_06', '2019_09', '2019_12', '2020_03', '2020_06', '2020_09', '2020_12',
     '2021_03', '2021_03', '2021_06', '2021_09', '2021_12']
file_individuals = []
file = []
for x in y:
    file_individuals.append(f'topaccount_{x}_individuals')
    file.append(f'topaccount_{x}')
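
If the two sets really are files on disk, a single loop over the date strings can build both names at once, so no nested loop is required. The sketch below assumes the master files are CSVs named topaccount_<date>.csv in one directory and that each has an address column. The file format, the function name build_individuals, and its parameters are assumptions, not something stated in the thread.

```python
from pathlib import Path

import pandas as pd


def build_individuals(directory, periods, excluded):
    """For each period, read topaccount_<period>.csv, drop the excluded
    addresses, and write topaccount_<period>_individuals.csv next to it.

    Returns a dict mapping each period to its filtered DataFrame.
    """
    directory = Path(directory)
    result = {}
    for q in periods:
        # Both file names are derived from the same q, so they always match.
        master = pd.read_csv(directory / f"topaccount_{q}.csv")
        individuals = master[~master["address"].isin(excluded)].reset_index(drop=True)
        individuals.to_csv(directory / f"topaccount_{q}_individuals.csv", index=False)
        result[q] = individuals
    return result
```

A call such as build_individuals("data", y, known_series) would then process every period in y, where "data" is a placeholder for wherever the files actually live.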
