
I'm relatively new to python, but have been working on a pet project looking at housing markets. Effectively I have a series of files that I've read in and stored as variables with the raw format as follows:

cityrentalprice1BR.columns

Index([u'RegionName', u'State', u'Metro', u'CountyName', u'SizeRank',
   u'2010-02', u'2010-03', u'2010-04', u'2010-05', u'2010-06',
   ...
   u'2017-09', u'2017-10', u'2017-11', u'2017-12', u'2018-01', u'2018-02',
   u'2018-03', u'2018-04', u'2018-05', u'2018-06'],
  dtype='object', length=106)

The "wide" formatting is not particularly helpful for my exercise so I've used the melt function to have a better timeseries by city and the results are great on single dataframe:

cityrentalprice1BR_melt = pd.melt(cityrentalprice1BR, id_vars=['RegionName', 'State', 'Metro', 'CountyName', 'SizeRank'],
                                  value_vars=dates,
                                  value_name='Value',
                                  var_name='Date')


cityrentalprice1BR_melt['Date'] = pd.to_datetime(cityrentalprice1BR_melt['Date'], format='%Y-%m')

My problem arises in that I have multiple files to import with the same formatting and I'd prefer to not manually melt each dataframe. One big caveat is I'd prefer to have the results pushed into new unique dataframes (i.e. in the example above cityrentalprice1BR > cityrentalprice1BR_melt).

I've been working on this for a while and have created a list of dataframes and can apply the melt function to the entire list as follows, however it is missing my key goal of retaining separate dataframes (or column sets per dataframe if a merge function is more appropriate):

Rental = [cityrentalprice1BR, cityrentalprice2BR, cityrentalprice3BR, cityrentalprice4BR,
          cityrentalprice5BR, cityrentalpriceCondo, cityrentalpriceDupTri]
for size in Rental:
    transformrent = pd.melt(size, id_vars=['RegionName', 'State', 'Metro', 'CountyName', 'SizeRank'],
                            value_vars=dates,
                            value_name='Value',
                            var_name='Date')

Any guidance would be much appreciated.

1 Answer


In both pandas and R, I often advise using a single collection of many similar objects rather than flooding your global environment with separate variables. This lets you manage one large object instead of many whose names you may forget as the count grows open-ended. Imagine navigating through 10,000 data frames! Better to navigate 10,000 indices of one list or 10,000 keys of one dictionary.

You lose no DataFrame functionality by storing them in a list, a dictionary, or even a named tuple. Plus, if you ever need to run an operation across all of them, you already have a container to iterate over. Below are several ways to build collections of melted data frames (using a defined function), with demonstrations of DataFrame methods and even bulk operations like pd.concat.

Function

def melt_data(df):
    mdf = pd.melt(df, id_vars=['RegionName', 'State', 'Metro', 'CountyName', 'SizeRank'],
                  value_vars=dates, value_name='Value', var_name='Date')
    return mdf
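To see the function end to end, here is a minimal self-contained sketch using a hypothetical two-city frame (toy data, not the actual Zillow files); it defines the `dates` global the function relies on and also folds in the `to_datetime` step from the question:

```python
import pandas as pd

# Hypothetical toy frame mimicking the wide layout from the question
toy = pd.DataFrame({
    'RegionName': ['New York', 'Boston'],
    'State': ['NY', 'MA'],
    'Metro': ['New York', 'Boston'],
    'CountyName': ['Queens', 'Suffolk'],
    'SizeRank': [1, 2],
    '2018-05': [2500.0, 2100.0],
    '2018-06': [2550.0, 2150.0],
})
dates = ['2018-05', '2018-06']  # melt_data reads this global

def melt_data(df):
    mdf = pd.melt(df, id_vars=['RegionName', 'State', 'Metro', 'CountyName', 'SizeRank'],
                  value_vars=dates, value_name='Value', var_name='Date')
    mdf['Date'] = pd.to_datetime(mdf['Date'], format='%Y-%m')
    return mdf

melted = melt_data(toy)
print(melted.shape)  # (4, 7): 2 cities x 2 months, 5 id columns + Date + Value
```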

List

Rental_list = [cityrentalprice1BR, cityrentalprice2BR, cityrentalprice3BR, cityrentalprice4BR, 
               cityrentalprice5BR, cityrentalpriceCondo, cityrentalpriceDupTri]

# LIST COMPREHENSION
df_list = [melt_data(size) for size in Rental_list]

df_list[0].shape     # shape is an attribute, not a method
df_list[1].head()
df_list[2].tail()
df_list[3].sum()
df_list[4].mean()

# APPEND ALL
master_df = pd.concat(df_list, ignore_index=True)
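If you want the master frame but still need to know which source each row came from, pd.concat's keys= argument labels each frame in a hierarchical index. A minimal sketch with toy stand-ins for two melted frames (the names and values are illustrative):

```python
import pandas as pd

# Toy stand-ins for two entries of df_list
a = pd.DataFrame({'RegionName': ['NY'], 'Value': [2500.0]})
b = pd.DataFrame({'RegionName': ['NY'], 'Value': [1800.0]})

# keys= labels each source frame, so rows stay traceable after stacking
master = pd.concat([a, b], keys=['1BR', '2BR'], names=['Size', None])

# Recover the rows that came from the second frame
print(master.loc['2BR'])
```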

Dictionary

Rental_dict = {'1BR': cityrentalprice1BR, '2BR': cityrentalprice2BR, '3BR': cityrentalprice3BR, 
               '4BR': cityrentalprice4BR, '5BR': cityrentalprice5BR,
               'Condo': cityrentalpriceCondo, 'DupTri': cityrentalpriceDupTri}

# DICTIONARY COMPREHENSION
df_dict = {i+'_melt': melt_data(size) for i, size in Rental_dict.items()}

df_dict['1BR_melt'].shape
df_dict['2BR_melt'].head()
df_dict['3BR_melt'].tail()
df_dict['Condo_melt'].sum()
df_dict['DupTri_melt'].mean()

# APPEND ALL
master_df = pd.concat(df_dict, ignore_index=True)
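Note that if you drop ignore_index=True when concatenating a dictionary, the dict keys automatically become the outer index level, so each original frame can still be sliced back out of the master. A minimal sketch with toy data in place of the real melted frames:

```python
import pandas as pd

# Toy stand-ins for two melted frames, keyed the way df_dict is
df_dict = {
    '1BR_melt': pd.DataFrame({'RegionName': ['NY'], 'Value': [2500.0]}),
    '2BR_melt': pd.DataFrame({'RegionName': ['NY'], 'Value': [1800.0]}),
}

# Without ignore_index, the dict keys become the outer index level
master = pd.concat(df_dict)

# Recover one source frame from the combined result
one_br = master.loc['1BR_melt']
```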

Named tuple

from collections import namedtuple

melt_dfs = namedtuple('melt_dfs', 'OneBR TwoBR ThreeBR FourBR FiveBR Condo DupTri')

df_tuple = melt_dfs(*[melt_data(size) for size in Rental_list])
# df_tuple = melt_dfs(*[melt_data(size) for size in Rental_dict.values()])

df_tuple.OneBR.shape
df_tuple.TwoBR.head()
df_tuple.ThreeBR.tail()
df_tuple.FourBR.sum()
df_tuple.FiveBR.mean()

# APPEND ALL
master_df = pd.concat(df_tuple, ignore_index=True)

Separate Variables :(

Nonetheless, if you really, really need separate objects, simply unpack the list of melted dataframes:

cityrentalprice1BR_melt, cityrentalprice2BR_melt, cityrentalprice3BR_melt, \
cityrentalprice4BR_melt, cityrentalprice5BR_melt, \
cityrentalpriceCondo_melt, cityrentalpriceDupTri_melt = [melt_data(size) for size in Rental_list]

2 Comments

This is fantastic thank you for the time and detail. I've not used dictionaries at all yet, but will go through each example you've laid out to see the best fit.
DICTIONARY COMPREHENSION solution is awesome. Thanks!!
