
I have several CSV files with a string in their names (e.g. a city name) and want to read each one into a dataframe whose name is derived from that city name.

Example CSV names: data_paris.csv, data_berlin.csv

How can I read them in a loop to get df_paris and df_berlin?

What I tried so far:

all_files = glob.glob("./*.csv")

for filename in all_files:
    city_name=re.split("[_.]", filename)[1] #to extract city name from filename
    dfname= {'df' + str(city_name)}
    print(dfname)
    dfname= pd.read_csv(filename)

I expect to get df_paris and df_berlin, but I get just dfname. Why?

A related question: Name a dataframe based on csv file name?

Thank you!

  • Instead of df_paris and df_berlin, you should create a dictionary dfs with keys 'paris' and 'berlin', so you can do dfs['paris'] and dfs['berlin']. Commented Jul 31, 2020 at 18:12
  • Could you write this up as a more detailed answer? Thanks! Commented Jul 31, 2020 at 18:18

3 Answers


I would recommend against automatic dynamic naming like df_paris, df_berlin. Instead, you should do:

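import glob
import os
import re

import pandas as pd

all_files = glob.glob("./*.csv")

# dictionary of dataframes, keyed by city name
dfs = dict()
for filename in all_files:
    # strip the directory first, so "./data_paris.csv" splits into ["data", "paris", "csv"]
    city_name = re.split("[_.]", os.path.basename(filename))[1]

    dfs[city_name] = pd.read_csv(filename)  # assign to the dataframe dictionary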
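As a usage sketch of the dictionary approach: the snippet below writes two tiny made-up CSV files into a temporary directory, loads them into a dict, and looks one up by city. The sample data and the `load_city_frames` helper name are assumptions for illustration, not part of the question.

```python
import glob
import os
import re
import tempfile

import pandas as pd


def load_city_frames(directory):
    """Return {city_name: DataFrame} for every data_<city>.csv in directory."""
    frames = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        # basename drops the directory, so the split yields ["data", "<city>", "csv"]
        city = re.split("[_.]", os.path.basename(path))[1]
        frames[city] = pd.read_csv(path)
    return frames


# Hypothetical sample data so the sketch runs on its own.
tmpdir = tempfile.mkdtemp()
for city, temps in {"paris": [21, 23], "berlin": [18, 19]}.items():
    pd.DataFrame({"temp": temps}).to_csv(
        os.path.join(tmpdir, f"data_{city}.csv"), index=False
    )

dfs = load_city_frames(tmpdir)
print(sorted(dfs))          # the available city names
print(dfs["paris"].head())  # use dfs["paris"] wherever you wanted df_paris

# Looping over all cities needs no knowledge of the names in advance.
for city, df in dfs.items():
    print(city, len(df))
```

The practical gain over separate df_paris / df_berlin variables is that the rest of the code can iterate over `dfs` without knowing which files were present.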

You are mixing up your concepts. If you want to dynamically reference dataframes that have been loaded, use a dict:

import glob
import os
import re

import pandas as pd

all_files = glob.glob("./*.csv")

dfname = {}

for filename in all_files:
    # use the basename so the leading "./" does not shift the split result
    city_name = re.split("[_.]", os.path.basename(filename))[1]
    dfname['df' + city_name] = pd.read_csv(filename)
print(list(dfname.keys()))
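To make the mix-up concrete, here is a minimal sketch of what the question's loop does to the name `dfname`. Plain strings stand in for the dataframes (an assumption that keeps the example free of files), and the city list is made up.

```python
cities = ["paris", "berlin"]  # hypothetical city names

for city_name in cities:
    # Braces around a single value build a set, e.g. {'dfparis'}; no variable is named by it.
    dfname = {"df" + city_name}
    # Rebinding: the set is discarded and dfname now points at the "dataframe" stand-in.
    dfname = "contents of " + city_name

# Only the single name dfname exists, bound to whatever the last iteration assigned.
print(dfname)
print("dfparis" in globals())

# A dict keeps one entry per city because the string is used as a key, not as a variable name.
frames = {}
for city_name in cities:
    frames["df" + city_name] = "contents of " + city_name
print(list(frames))
```

Assigning to `dfname` never creates a variable called `dfparis`; a string only becomes a lookup name when it is used as a dict key.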

2 Comments

I can't see what you refer to... How much coding have you done with dicts and comprehensions? You can always reference a loaded df as dfname["dfparis"]. The point of the print() was to show this. Do you know a dict is a dynamic structure of key/value pairs?
Yes, I know dicts! I just had not used or seen dataframes stored in a dict yet.

The only dataframe you're creating is dfname; you just keep overwriting it on each pass through the loop. I guess you could do this using globals(), though honestly I'd probably just create a list or a dict of dataframes (as it seems others have suggested while I was typing this), or else add a 'city' column and keep appending to a master dataframe. But, keeping with what you're specifically asking, you could probably do it like so:

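import glob
import pandas as pd

all_files = glob.glob("*.csv")  # no "./" prefix, so names look like "data_paris.csv"

for filename in all_files:
    # [5:-4] cuts "data_" and ".csv", creating a global variable named e.g. paris
    globals()[filename[5:-4]] = pd.read_csv(filename)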
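The master-dataframe alternative mentioned above could look roughly like this. It is a sketch under assumptions: the sample files and the `temp` column are invented, and `pd.concat` is used in place of repeated appending (`DataFrame.append` was removed in pandas 2.0).

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample files so the sketch is self-contained.
tmpdir = tempfile.mkdtemp()
for city, temps in {"paris": [21, 23], "berlin": [18, 19]}.items():
    pd.DataFrame({"temp": temps}).to_csv(
        os.path.join(tmpdir, f"data_{city}.csv"), index=False
    )

parts = []
for name in sorted(os.listdir(tmpdir)):
    if not name.endswith(".csv"):
        continue
    city = name[len("data_"):-len(".csv")]  # "data_paris.csv" -> "paris"
    df = pd.read_csv(os.path.join(tmpdir, name))
    df["city"] = city                        # tag every row with its source city
    parts.append(df)

# Concatenate once at the end instead of growing a dataframe inside the loop.
master = pd.concat(parts, ignore_index=True)
print(master)

# Selecting one city is then an ordinary filter.
paris = master[master["city"] == "paris"]
```

With everything in one frame, per-city work can also go through `master.groupby("city")` rather than separate variables.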
