I have more than 20 DataFrames that I want to concatenate, but my current code is clumsy. First I build a dictionary of DataFrames, and from that dictionary I create one variable per DataFrame. Here is my code:

import copy

import pandas as pd

dict_of_df = {}

file_dir = '../VentureXpert/Raw/county_csv/'

for year in range(1996,2020):
    key_name = 'df_'+str(year)
    df = pd.read_csv(file_dir + str(year)+'.csv',header=None)
    df.columns =['st-county' ,'numb of comp','pct of comp','inv sum($mil)','pct of inv','avg per comp','med per comp','year']
    dict_of_df[key_name] = copy.deepcopy(df)

Here is how I create the individual DataFrames:

df_1996 = dict_of_df['df_1996']
df_1997 = dict_of_df['df_1997']
df_1998 = dict_of_df['df_1998']
df_1999 = dict_of_df['df_1999']

df_2000 = dict_of_df['df_2000']
df_2001 = dict_of_df['df_2001']
df_2002 = dict_of_df['df_2002']
df_2003 = dict_of_df['df_2003']
df_2004 = dict_of_df['df_2004']
df_2005 = dict_of_df['df_2005']
df_2006 = dict_of_df['df_2006']
df_2007 = dict_of_df['df_2007']
df_2008 = dict_of_df['df_2008']
df_2009 = dict_of_df['df_2009']
df_2010 = dict_of_df['df_2010']

df_2011 = dict_of_df['df_2011']
df_2012 = dict_of_df['df_2012']
df_2013 = dict_of_df['df_2013']
df_2014 = dict_of_df['df_2014']
df_2015 = dict_of_df['df_2015']
df_2016 = dict_of_df['df_2016']
df_2017 = dict_of_df['df_2017']
df_2018 = dict_of_df['df_2018']
df_2019 = dict_of_df['df_2019']

Then I use pd.concat to merge these DataFrames:

df_final = pd.concat([df_1996,df_1997,df_1998,df_1999,df_2000,df_2001,df_2002,df_2003,df_2004,df_2005,df_2006,df_2007,df_2008,df_2009,df_2010,df_2011,df_2012,df_2013,df_2014,df_2015,df_2016,df_2017,df_2018,df_2019], ignore_index=True)

Is there an easier way to do this?

I would like to use a for loop.

Thanks in advance

  • The problem is in the earlier code: how are df_1996, df_1997, df_1998, df_1999, df_2000, ... generated? Why not generate a list or dict of DataFrames? Commented Oct 5, 2021 at 10:01
  • Thanks for the comment. I have a dict called dict_of_df, so dict_of_df['df_1996'] gives me the df_1996 DataFrame. Commented Oct 5, 2021 at 10:04
  • So is it possible to use df_final = pd.concat(dict_of_df, ignore_index=True)? Commented Oct 5, 2021 at 10:06
  • Yes, I edited my post to include the full code. Thanks. Commented Oct 5, 2021 at 10:12

1 Answer

The problem is the separate, individually named variables df_1996, df_1997, df_1998, df_1999, df_2000, ...; creating variables like this is not recommended in Python.

If you build a dict of DataFrames, dict_of_df, the solution simplifies a lot:

df_final = pd.concat(dict_of_df.values(), ignore_index=True)
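
As a minimal sketch of this approach, here are two small made-up DataFrames standing in for the yearly CSV files (the shortened column list and the values are illustrative assumptions, not the real data):

```python
import pandas as pd

# Toy stand-ins for the yearly files; columns and values are made up.
dict_of_df = {
    "df_1996": pd.DataFrame({"st-county": ["CA-001", "NY-061"], "year": [1996, 1996]}),
    "df_1997": pd.DataFrame({"st-county": ["CA-001"], "year": [1997]}),
}

# Stack every DataFrame stored in the dict, in insertion order.
# ignore_index=True renumbers the rows 0..n-1 in the result.
df_final = pd.concat(dict_of_df.values(), ignore_index=True)
print(df_final)
```

No per-year variables are needed; the dict values are passed to pd.concat directly.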

If you also need a new column built from the dict keys:

df_final = pd.concat(dict_of_df).reset_index(level=1, drop=True).reset_index()
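
A sketch of this variant on toy data (columns and values are made up). The extra rename_axis step and the column name "source" are my own assumptions, added only so the key column gets a readable name; without that step the column should simply be called "index".

```python
import pandas as pd

# Toy stand-ins for the yearly files; columns and values are made up.
dict_of_df = {
    "df_1996": pd.DataFrame({"st-county": ["CA-001", "NY-061"], "year": [1996, 1996]}),
    "df_1997": pd.DataFrame({"st-county": ["CA-001"], "year": [1997]}),
}

df_with_key = (
    pd.concat(dict_of_df)             # index becomes (dict key, original row number)
    .reset_index(level=1, drop=True)  # drop the original row numbers
    .rename_axis("source")            # name the remaining index level (hypothetical name)
    .reset_index()                    # move the dict keys into a regular column
)
print(df_with_key)
```

Each row now records which entry of the dict it came from, which can be handy if the files themselves carry no year column.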

4 Comments

Also, is there an easier way to generate my 20 DataFrames?
@DAEHYUNKIM - I don't understand. Why df_1996 = dict_of_df['df_1996']...?
@DAEHYUNKIM - Why is it necessary to assign them to 20 variables? What is the reason?
@DAEHYUNKIM - If you need to work with them, why not use the dict of DataFrames you have already created?
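
To illustrate the point of these comments, here is one possible sketch in which the for loop fills the dict and the dict is used directly, with no per-year variables. The temporary directory, the two tiny CSV files, and the shortened column list are assumptions made only so the example runs on its own; the real files and columns are the ones in the question.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Shortened, illustrative column list (the question uses eight columns).
columns = ["st-county", "numb of comp", "year"]

with tempfile.TemporaryDirectory() as tmp:
    file_dir = Path(tmp)
    # Write two tiny header-less CSV files so the sketch is self-contained.
    (file_dir / "1996.csv").write_text("CA-001,3,1996\nNY-061,5,1996\n")
    (file_dir / "1997.csv").write_text("CA-001,4,1997\n")

    # One loop reads every file and stores it under a per-year key.
    dict_of_df = {}
    for year in range(1996, 1998):
        dict_of_df[f"df_{year}"] = pd.read_csv(
            file_dir / f"{year}.csv", header=None, names=columns
        )

# Individual years remain available as dict_of_df["df_1996"], and so on.
df_final = pd.concat(dict_of_df.values(), ignore_index=True)
print(df_final)
```

Passing names= to pd.read_csv sets the column labels while reading, and since pd.read_csv returns a fresh DataFrame on every call, the copy.deepcopy step is probably not needed.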
