
I'm relatively new to python, but have been working on a pet project looking at housing markets. Effectively I have a series of files that I've read in and stored as variables with the raw format as follows:

cityrentalprice1BR.columns

Index([u'RegionName', u'State', u'Metro', u'CountyName', u'SizeRank',
   u'2010-02', u'2010-03', u'2010-04', u'2010-05', u'2010-06',
   ...
   u'2017-09', u'2017-10', u'2017-11', u'2017-12', u'2018-01', u'2018-02',
   u'2018-03', u'2018-04', u'2018-05', u'2018-06'],
  dtype='object', length=106)

The "wide" formatting is not particularly helpful for my exercise so I've used the melt function to have a better timeseries by city and the results are great on single dataframe:

cityrentalprice1BR_melt = pd.melt(cityrentalprice1BR, id_vars=['RegionName', 'State', 'Metro', 'CountyName', 'SizeRank'],
                                  value_vars=dates,
                                  value_name='Value',
                                  var_name='Date')


cityrentalprice1BR_melt['Date'] = pd.to_datetime(cityrentalprice1BR_melt['Date'], format='%Y-%m')

My problem arises in that I have multiple files to import with the same formatting and I'd prefer to not manually melt each dataframe. One big caveat is I'd prefer to have the results pushed into new unique dataframes (i.e. in the example above cityrentalprice1BR > cityrentalprice1BR_melt).

I've been working on this for a while and have created a list of dataframes and can apply the melt function to the entire list as follows, however it is missing my key goal of retaining separate dataframes (or column sets per dataframe if a merge function is more appropriate):

Rental = [cityrentalprice1BR, cityrentalprice2BR, cityrentalprice3BR, cityrentalprice4BR,
          cityrentalprice5BR, cityrentalpriceCondo, cityrentalpriceDupTri]
for size in Rental:
    transformrent = pd.melt(size, id_vars=['RegionName', 'State', 'Metro', 'CountyName', 'SizeRank'],
                            value_vars=dates,
                            value_name='Value',
                            var_name='Date')

Any guidance would be much appreciated.

1 Answer


In both pandas and R, I often advise using a single collection of many similar objects rather than flooding your global environment with separate variables. This lets you manage one large object instead of many whose names you may forget as the count grows open-ended. Imagine navigating through 10,000 data frames! Better to navigate 10,000 indices of one list or 10,000 keys of one dictionary.

You lose no DataFrame functionality by storing them in a list, a dictionary, or even a named tuple. Plus, if you ever need to run an operation across all of them, you already have a container to iterate over. Below are several ways to build collections of melted data frames (using a defined function), with demonstrations of DataFrame methods and even bulk operations like pd.concat.

Function

def melt_data(df):
    mdf = pd.melt(df, id_vars=['RegionName', 'State', 'Metro', 'CountyName', 'SizeRank'],
                  value_vars=dates, value_name='Value', var_name='Date')
    return mdf
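To see the function end to end, here is a minimal self-contained sketch using a hypothetical two-city frame (toy data, not the actual Zillow files); it defines the `dates` global the function relies on and also folds in the `to_datetime` step from the question:

```python
import pandas as pd

# Hypothetical toy frame mimicking the wide layout from the question
toy = pd.DataFrame({
    'RegionName': ['New York', 'Boston'],
    'State': ['NY', 'MA'],
    'Metro': ['New York', 'Boston'],
    'CountyName': ['Queens', 'Suffolk'],
    'SizeRank': [1, 2],
    '2018-05': [2500.0, 2100.0],
    '2018-06': [2550.0, 2150.0],
})
dates = ['2018-05', '2018-06']  # melt_data reads this global

def melt_data(df):
    mdf = pd.melt(df, id_vars=['RegionName', 'State', 'Metro', 'CountyName', 'SizeRank'],
                  value_vars=dates, value_name='Value', var_name='Date')
    mdf['Date'] = pd.to_datetime(mdf['Date'], format='%Y-%m')
    return mdf

melted = melt_data(toy)
print(melted.shape)  # (4, 7): 2 cities x 2 months, 5 id columns + Date + Value
```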

List

Rental_list = [cityrentalprice1BR, cityrentalprice2BR, cityrentalprice3BR, cityrentalprice4BR, 
               cityrentalprice5BR, cityrentalpriceCondo, cityrentalpriceDupTri]

# LIST COMPREHENSION
df_list = [melt_data(size) for size in Rental_list]

df_list[0].shape     # shape is an attribute, not a method
df_list[1].head()
df_list[2].tail()
df_list[3].sum()
df_list[4].mean()

# APPEND ALL
master_df = pd.concat(df_list, ignore_index=True)
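If you want the master frame but still need to know which source each row came from, pd.concat's keys= argument labels each frame in a hierarchical index. A minimal sketch with toy stand-ins for two melted frames (the names and values are illustrative):

```python
import pandas as pd

# Toy stand-ins for two entries of df_list
a = pd.DataFrame({'RegionName': ['NY'], 'Value': [2500.0]})
b = pd.DataFrame({'RegionName': ['NY'], 'Value': [1800.0]})

# keys= labels each source frame, so rows stay traceable after stacking
master = pd.concat([a, b], keys=['1BR', '2BR'], names=['Size', None])

# Recover the rows that came from the second frame
print(master.loc['2BR'])
```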

Dictionary

Rental_dict = {'1BR': cityrentalprice1BR, '2BR': cityrentalprice2BR, '3BR': cityrentalprice3BR, 
               '4BR': cityrentalprice4BR, '5BR': cityrentalprice5BR,
               'Condo': cityrentalpriceCondo, 'DupTri': cityrentalpriceDupTri}

# DICTIONARY COMPREHENSION
df_dict = {i+'_melt': melt_data(size) for i, size in Rental_dict.items()}

df_dict['1BR_melt'].shape
df_dict['2BR_melt'].head()
df_dict['3BR_melt'].tail()
df_dict['Condo_melt'].sum()
df_dict['DupTri_melt'].mean()

# APPEND ALL
master_df = pd.concat(df_dict, ignore_index=True)
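Note that if you drop ignore_index=True when concatenating a dictionary, the dict keys automatically become the outer index level, so each original frame can still be sliced back out of the master. A minimal sketch with toy data in place of the real melted frames:

```python
import pandas as pd

# Toy stand-ins for two melted frames, keyed the way df_dict is
df_dict = {
    '1BR_melt': pd.DataFrame({'RegionName': ['NY'], 'Value': [2500.0]}),
    '2BR_melt': pd.DataFrame({'RegionName': ['NY'], 'Value': [1800.0]}),
}

# Without ignore_index, the dict keys become the outer index level
master = pd.concat(df_dict)

# Recover one source frame from the combined result
one_br = master.loc['1BR_melt']
```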

Named tuple

from collections import namedtuple

melt_dfs = namedtuple('melt_dfs', 'OneBR TwoBR ThreeBR FourBR FiveBR Condo DupTri')

df_tuple = melt_dfs(*[melt_data(size) for size in Rental_list])
# df_tuple = melt_dfs(*[melt_data(size) for size in Rental_dict.values()])

df_tuple.OneBR.shape
df_tuple.TwoBR.head()
df_tuple.ThreeBR.tail()
df_tuple.FourBR.sum()
df_tuple.FiveBR.mean()

# APPEND ALL
master_df = pd.concat(df_tuple, ignore_index=True)

Separate Variables :(

Nonetheless, if you really, really need separate objects, simply unpack the list of melted dataframes:

cityrentalprice1BR_melt, cityrentalprice2BR_melt, cityrentalprice3BR_melt, \
cityrentalprice4BR_melt, cityrentalprice5BR_melt, \
cityrentalpriceCondo_melt, cityrentalpriceDupTri_melt = [melt_data(size) for size in Rental_list]

2 Comments

This is fantastic thank you for the time and detail. I've not used dictionaries at all yet, but will go through each example you've laid out to see the best fit.
DICTIONARY COMPREHENSION solution is awesome. Thanks!!
