Setup: two pandas dataframes; data from df2 needs to be added to df1, as explained below:
- df1 and df2 are multiIndexed with the same four levels
- df1 contains more rows than df2
- df1 has three copies (in rows) of a value per unique combination of three out of the four levels of the index; that is, each row differs only with respect to the 4th level
- df2 only partially aligns with df1 on the other 3 levels (df2 contains extraneous rows)
- df2 contains only one column
I want to add values from the one column of df2 to all three copies of the rows in df1 where the three corresponding levels match.
Having learned that 'merging with more than one level overlap on a multiIndex is not implemented' in pandas, I propose to map the values, but have not found a way to map on (multiple) index levels, or multiple columns, if reset index levels to columns:
df1 = pd.DataFrame(np.array([['Dec', 'NY', 'Ren', 'Q1', 10],
['Dec', 'NY', 'Ren', 'Q2', 12],
['Dec', 'NY', 'Ren', 'Q3', 14],
['Dec', 'FL', 'Mia', 'Q1', 6],
['Dec', 'FL', 'Mia', 'Q2', 8],
['Dec', 'FL', 'Mia', 'Q3', 17],
['Apr', 'CA', 'SC', 'Q1', 1],
['Apr', 'CA', 'SC', 'Q2', 2],
['Apr', 'CA', 'SC', 'Q3', 3]]), columns=['Date', 'State', 'County', 'Quarter', 'x'])
df1.set_index(['Date', 'State', 'County', 'Quarter'], inplace=True)
df2 = pd.DataFrame(np.array([['Dec', 'NY', 'Ren', 0.4],
['Dec', 'FL', 'Mia', 0.3]]), columns=['Date', 'State', 'County', 'y'])
df2.set_index(['Date', 'State', 'County', 'y'], inplace=True)
df_combined = df1['Date', 'State', 'County'].map(df2)