Pandas: name a DataFrame from a string in the CSV file name

Question

I have several CSV files with a string in their names (e.g. a city name), and I want to read them into DataFrames whose names are derived from that city name.

Example CSV file names: data_paris.csv, data_berlin.csv

How can I read them in a loop to get df_paris and df_berlin?

What I tried so far:

import glob
import os
import re

import pandas as pd

all_files = glob.glob("./*.csv")

for filename in all_files:
    city_name = re.split("[_.]", os.path.basename(filename))[1]  # to extract city name from filename
    dfname = {'df' + str(city_name)}
    print(dfname)
    dfname = pd.read_csv(filename)

I expect to get df_paris and df_berlin, but I only get a single dfname. Why?

A related question: Name a dataframe based on csv file name?

Thank you!

Comment: Instead of df_paris and df_berlin, you should create a dictionary dfs with keys 'paris' and 'berlin', so you can access dfs['paris'] and dfs['berlin']. (Quang Hoang, commented Jul 31, 2020 at 18:12)

Quang Hoang · Accepted Answer · 2020-07-31 18:21:41Z

Score: 1

I would recommend against creating variable names dynamically, like df_paris and df_berlin. Instead, store the DataFrames in a dictionary:

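import glob
import os
import re

import pandas as pd

all_files = glob.glob("./*.csv")

# dictionary of dataframes, keyed by city name
dfs = dict()
for filename in all_files:
    # basename drops the "./" prefix that glob keeps, so "data_paris.csv" splits into ["data", "paris", "csv"]
    city_name = re.split("[_.]", os.path.basename(filename))[1]
    dfs[city_name] = pd.read_csv(filename)  # assign to the dataframe dictionary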
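A self-contained sketch of the same dictionary approach, runnable as-is. It writes two small CSV files with hypothetical contents into a temporary directory, then builds dfs keyed by city name. It swaps the regex for pathlib's Path.stem, and assumes every file follows the data_<city>.csv pattern; load_city_frames is an illustrative helper name, not a pandas function.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_city_frames(folder):
    """Hypothetical helper: read every data_<city>.csv in folder into a dict keyed by city."""
    dfs = {}
    for path in sorted(Path(folder).glob("data_*.csv")):
        # Path.stem drops the directory and the ".csv" suffix: "data_paris"
        city_name = path.stem.split("_", 1)[1]  # -> "paris"
        dfs[city_name] = pd.read_csv(path)
    return dfs


with tempfile.TemporaryDirectory() as folder:
    # hypothetical sample files, only so the sketch runs end to end
    pd.DataFrame({"temp": [21, 23]}).to_csv(Path(folder) / "data_paris.csv", index=False)
    pd.DataFrame({"temp": [18, 19]}).to_csv(Path(folder) / "data_berlin.csv", index=False)
    dfs = load_city_frames(folder)

print(sorted(dfs))                    # ['berlin', 'paris']
print(dfs["paris"]["temp"].tolist())  # [21, 23]
```

Splitting only on the first underscore keeps multi-part names intact: a hypothetical data_new_york.csv would get the key new_york, whereas re.split("[_.]", ...)[1] would return just new.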

answered Jul 31, 2020 at 18:21 by Quang Hoang

Rob Raymond · Answer · 2020-07-31 18:24:40Z

Score: 1

You are mixing concepts. If you want to reference dynamically loaded DataFrames by name, use a dict:

import glob
import os
import re

import pandas as pd

all_files = glob.glob("./*.csv")

dfname = {}

for filename in all_files:
    # basename drops the "./" prefix, so "data_paris.csv" -> "paris"
    city_name = re.split("[_.]", os.path.basename(filename))[1]
    dfname['df' + city_name] = pd.read_csv(filename)
print(list(dfname.keys()))
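Since the comments on this answer mention comprehensions, here is a sketch of the same loop written as a dict comprehension. To stay runnable without files on disk, it assumes hypothetical in-memory CSV text (wrapped in io.StringIO, which pd.read_csv accepts) standing in for ./data_paris.csv and ./data_berlin.csv; city_name() is an illustrative helper, not part of pandas.

```python
import io
import os
import re

import pandas as pd

# hypothetical in-memory stand-ins for ./data_paris.csv and ./data_berlin.csv
fake_files = {
    "./data_paris.csv": "temp\n21\n23\n",
    "./data_berlin.csv": "temp\n18\n19\n",
}


def city_name(filename):
    # "./data_paris.csv" -> basename "data_paris.csv" -> split on "_" and "." -> "paris"
    return re.split("[_.]", os.path.basename(filename))[1]


# one dict entry per file, keyed "df" + city name
dfname = {
    "df" + city_name(fn): pd.read_csv(io.StringIO(text))
    for fn, text in fake_files.items()
}

print(list(dfname.keys()))                 # ['dfparis', 'dfberlin']
print(dfname["dfparis"]["temp"].tolist())  # [21, 23]
```

With real files you would iterate over glob.glob("./*.csv") and pass each filename straight to pd.read_csv instead of the StringIO stand-ins.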

answered Jul 31, 2020 at 18:24 by Rob Raymond; edited Jul 31, 2020 at 18:24 by sushanth

2 Comments

Rob Raymond:

I can't see what you're referring to. How much coding have you done with dicts and comprehensions? You can always reference a loaded DataFrame afterwards as dfname["dfparis"]; the point of the print() was to show this. Do you know that a dict is a dynamic structure of key/value pairs?

physiker:

Yes, I know dicts! I just hadn't used or seen DataFrames stored in a dict before.
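To illustrate the point of this exchange: a value in the dict is an ordinary DataFrame, so anything you could do with a df_paris variable works on dfname["dfparis"], and you can also loop over every city without knowing the names in advance. The small frames below are hypothetical stand-ins for the loaded CSV files.

```python
import pandas as pd

# hypothetical data standing in for the loaded CSV files
dfname = {
    "dfparis": pd.DataFrame({"temp": [21, 23]}),
    "dfberlin": pd.DataFrame({"temp": [18, 19]}),
}

# a plain name can still be bound to one entry; it is the same object, not a copy
df_paris = dfname["dfparis"]
print(df_paris is dfname["dfparis"])  # True

# loop over all cities without hard-coding their names
for key, df in dfname.items():
    print(key, df["temp"].mean())
```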

M00NSH0T · Answer · 2020-07-31 18:24:57Z

Score: 0

The only DataFrame you're creating is dfname, and you overwrite it on every pass through the loop. I guess you could do this using globals(), though honestly I'd probably just create a list or a dict of DataFrames (as others seem to have suggested while I was typing this), or else add a 'city' column to a master DataFrame that I keep appending to. But, sticking to what you're specifically asking, you could do it like so:

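import glob
import os
import pandas as pd

all_files = glob.glob("./*.csv")
for filename in all_files:
    # "./data_paris.csv" -> basename "data_paris.csv"; [5:-4] drops "data_" and ".csv" -> "paris"
    globals()[os.path.basename(filename)[5:-4]] = pd.read_csv(filename)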
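The master-DataFrame alternative mentioned in this answer could look roughly like the sketch below. It assumes hypothetical per-city frames shaped like the dfs dictionary from the first answer, and it swaps repeated appending for a single pd.concat call: DataFrame.append was removed in pandas 2.0, and concatenating once also avoids copying the growing frame on every iteration.

```python
import pandas as pd

# hypothetical per-city frames, shaped like the dfs dict built from the CSV files
dfs = {
    "paris": pd.DataFrame({"temp": [21, 23]}),
    "berlin": pd.DataFrame({"temp": [18, 19]}),
}

# tag each frame with its city, then combine them into one master DataFrame
master = pd.concat(
    [df.assign(city=city) for city, df in dfs.items()],
    ignore_index=True,
)
print(master)

# select one city's rows back out with a boolean mask
paris = master[master["city"] == "paris"]
print(paris["temp"].tolist())  # [21, 23]
```

Keeping the city as a column means per-city work becomes ordinary pandas operations, for example master.groupby("city")["temp"].mean(), instead of juggling separate variables.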

answered Jul 31, 2020 at 18:23 by M00NSH0T; edited Jul 31, 2020 at 18:24 by sushanth
