Map pandas dataframe on multiple keys as columns or multiIndex

Question

Setup: two pandas dataframes; data from df2 needs to be added to df1, as explained below:

df1 and df2 are multiIndexed with the same four levels
df1 contains more rows than df2
df1 has three copies (in rows) of a value per unique combination of three out of the four levels of the index; that is, each row differs only with respect to the 4th level
df2 only partially aligns with df1 on the other 3 levels (df2 contains extraneous rows)
df2 contains only one column

I want to add values from the one column of df2 to all three copies of the rows in df1 where the three corresponding levels match.

Having learned that 'merging with more than one level overlap on a multiIndex is not implemented' in pandas, I propose to map the values, but have not found a way to map on (multiple) index levels, or multiple columns, if reset index levels to columns:

df1 = pd.DataFrame(np.array([['Dec', 'NY', 'Ren', 'Q1', 10],
   ['Dec', 'NY', 'Ren', 'Q2', 12],
   ['Dec', 'NY', 'Ren', 'Q3', 14],
   ['Dec', 'FL', 'Mia', 'Q1', 6],
   ['Dec', 'FL', 'Mia', 'Q2', 8],
   ['Dec', 'FL', 'Mia', 'Q3', 17],
   ['Apr', 'CA', 'SC', 'Q1', 1],
   ['Apr', 'CA', 'SC', 'Q2', 2],
   ['Apr', 'CA', 'SC', 'Q3', 3]]), columns=['Date', 'State', 'County', 'Quarter', 'x'])

df1.set_index(['Date', 'State', 'County', 'Quarter'], inplace=True)

df2 = pd.DataFrame(np.array([['Dec', 'NY', 'Ren', 0.4],
   ['Dec', 'FL', 'Mia', 0.3]]), columns=['Date', 'State', 'County', 'y'])

df2.set_index(['Date', 'State', 'County', 'y'], inplace=True)

df_combined = df1['Date', 'State', 'County'].map(df2)

Perhaps you could post an example? Sometimes, just the output of df1.head(20) or something like that. — juanpa.arrivillaga
– juanpa.arrivillaga, Commented Aug 26, 2016 at 20:34

khammel · Accepted Answer · 2016-08-27 17:26:18Z

You can temporarily change df1 to change the index to do the join:

df_combined = df1.reset_index(3).join(df2,how='left')

>>> df_combined
           level_3   x    y
Apr CA SC       Q1   1  NaN
       SC       Q2   2  NaN
       SC       Q3   3  NaN
Dec FL Mia      Q1   6  0.3
       Mia      Q2   8  0.3
       Mia      Q3  17  0.3
    NY Ren      Q1  10  0.4
       Ren      Q2  12  0.4
       Ren      Q3  14  0.4

df_combined.set_index('level_3',append=True, inplace=True)
df_combined.index.rename(None,3,inplace=True)

>>> df_combined
                x    y
Apr CA SC  Q1   1  NaN
           Q2   2  NaN
           Q3   3  NaN
Dec FL Mia Q1   6  0.3
           Q2   8  0.3
           Q3  17  0.3
    NY Ren Q1  10  0.4
           Q2  12  0.4
           Q3  14  0.4

The reset_index method is used to temporarily turn the index that isn't in df2 into a column so that you can do a normal join. Then turn the column back into an index when you're done.

Collectives™ on Stack Overflow

Map pandas dataframe on multiple keys as columns or multiIndex

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related