Pandas Dataframe Multiindex Merge

Question

Here is a hypothetical scenario with two MultiIndex DataFrames in pandas whose index levels hold the same values but have different names. Trying to merge them results in an error. Do I have to call reset_index() on either DataFrame to make this work?

import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
# Same tuples in both indexes, but the levels are named differently
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])

s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])

Attempted merges:

s1.merge(s2, how='left', left_index=True, right_index=True)

Editor's note: The error for this one was most likely ValueError: cannot join with no overlapping index names (tested with pandas 2.2.3).
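A minimal sketch reproducing that first error, assuming a recent pandas such as the 2.2.3 mentioned above. The np.random.seed(0) call is an addition for reproducibility and is not part of the original question:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: seed added only so the data is reproducible
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])

# Joining two MultiIndexes needs at least one shared level name;
# ['first', 'second'] and ['third', 'fourth'] have none in common.
error = None
try:
    s1.merge(s2, how='left', left_index=True, right_index=True)
except ValueError as exc:
    error = str(exc)

print(error)
```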

s1.merge(s2, how='left', left_on=['first', 'second'], right_on=['third', 'fourth'])

Editor's note: It's not clear what error occurred here. If you know, please add it.
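For comparison, the reset_index() route the question asks about does work. A minimal sketch, again assuming seeded data, that merges on ordinary columns and then restores s1's MultiIndex:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: seed added only for reproducibility
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])

merged = (
    s1.reset_index()                       # index levels become columns
    .merge(s2.reset_index(), how='left',
           left_on=['first', 'second'], right_on=['third', 'fourth'])
    .drop(columns=['third', 'fourth'])     # redundant copies of the join keys
    .set_index(['first', 'second'])        # put s1's MultiIndex back
)
print(merged)
```

The answers below show ways to get the same result without the round trip through reset_index().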

This is one of the things that frustrates many new pandas users: there are so many different ways to do the same thing. I like that, because depending on the dataset, or why you are doing it in the first place, you can take the route that is easy to code and understand, or optimize for quicker run times.
– Scott Boston, Commented Oct 12, 2018 at 19:27
What's the error? The second one doesn't error for me, though the resulting index is a default RangeIndex, which is probably not what you want. I'm using Pandas 2.2.3. For the first one I get ValueError: cannot join with no overlapping index names.
– wjandrea, Commented Jul 16 at 17:31
Please seed the RNG so that we have reproducible data, e.g. np.random.seed(0). I might take the initiative and do this myself, and for the answers too. For reference see How to make good reproducible pandas examples. On that note, please also add your expected output.
– wjandrea, Commented Jul 16 at 17:32
Please write a more descriptive title. That might look like "How can I merge dataframes with multiindexes with different names?" See How to Ask for tips on how to write a good title.
– wjandrea, Commented Jul 16 at 17:36
FWIW, with df1.merge(df2, how='left', left_index=True, right_index=True), you might as well just use df1.join(df2)
– wjandrea, Commented Jul 16 at 17:51

ALollz · Accepted Answer · 2018-10-12 19:01:13Z

25

It seems you need a combination of the two: the index on one side and the level names on the other.

s1.merge(s2, left_index=True, right_on=['third', 'fourth'])
# or the other way around: s1.merge(s2, right_index=True, left_on=['first', 'second'])

Output:

               s1        s2
bar one  0.765385 -0.365508
    two  1.462860  0.751862
baz one  0.304163  0.761663
    two -0.816658 -1.810634
foo one  1.891434  1.450081
    two  0.571294  1.116862
qux one  1.056516 -0.052927
    two -0.574916 -1.197596
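A self-contained version of this mixed approach, assuming seeded data. The level names carried by the result's index may depend on the pandas version, so the sketch only looks at the s1 and s2 columns:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: seed added only for reproducibility
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])

# s1's index levels are matched, in order, to s2's 'third' and 'fourth' levels
result = s1.merge(s2, left_index=True, right_on=['third', 'fourth'])
print(result[['s1', 's2']])
```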

answered Oct 12, 2018 at 19:01

ALollz

1 Comment

Tobias Feil Over a year ago

Why not right_on=s2.index.names?
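Pulling the names from s2.index.names should behave the same as spelling them out, and it avoids hard-coding them. A sketch wrapping that in a small helper; merge_on_index_values is a hypothetical name, and it assumes both frames have the same number of index levels:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: seed added only for reproducibility
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])


def merge_on_index_values(left, right, how='inner'):
    """Hypothetical helper: match left's index levels, by position,
    against right's index levels, whatever the levels are named."""
    # list() turns the FrozenList of level names into a plain list of labels
    return left.merge(right, how=how, left_index=True,
                      right_on=list(right.index.names))


result = merge_on_index_values(s1, s2)
print(result[['s1', 's2']])
```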

rafaelc · Accepted Answer · 2018-10-12 19:11:37Z

9

Besides using the index names as pointed out by @ALollz, you can simply use loc, which aligns on the index values automatically:

s1.loc[:, 's2'] = s2   # or, more explicitly, s2['s2']; note this modifies s1 in place

                s1           s2
first   second      
bar     one     -0.111384   -2.341803
        two     -1.226569    1.308240
baz     one      1.880835    0.697946
        two     -0.008979   -0.247896
foo     one      0.103864   -1.039990
        two      0.836931    0.000811
qux     one     -0.859005   -1.199615
        two     -0.321341   -1.098691

A more general form, which assigns every column of s2 at once:

s1.loc[:, s2.columns] = s2
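A minimal sketch showing that the loc assignment aligns on index values rather than row position: s2 is reversed first (purely for the demonstration), yet each value still lands on the row with the matching key. It works on a copy because the assignment modifies the target frame in place:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: seed added only for reproducibility
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])

s2_reversed = s2.iloc[::-1]            # same rows, opposite order

out = s1.copy()                        # leave the original s1 untouched
out.loc[:, 's2'] = s2_reversed['s2']   # aligned by index values, not by position
print(out)
```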

edited Oct 12, 2018 at 19:11

answered Oct 12, 2018 at 19:06

rafaelc

BENY · Accepted Answer · 2018-10-12 19:24:28Z

8

You can also combine them with combine_first:

s1.combine_first(s2)
Out[19]: 
                    s1        s2
first second                    
bar   one     0.039203  0.795963
      two     0.454782 -0.222806
baz   one     3.101120 -0.645474
      two    -1.174929 -0.875561
foo   one    -0.887226  1.078218
      two     1.507546 -1.078564
qux   one     0.028048  0.042462
      two     0.826544 -0.375351

# or the other way around: s2.combine_first(s1)
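A seeded, self-contained version of the combine_first approach (the seed is an assumption for reproducibility). Each frame contributes the column the other lacks, and every row keeps the values that belong to its key:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: seed added only for reproducibility
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])

# Values from s1 take priority; s2 fills in whatever s1 lacks (here, column 's2')
combined = s1.combine_first(s2)
print(combined)
```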

answered Oct 12, 2018 at 19:24

BENY

piRSquared · Accepted Answer · 2018-10-12 19:21

7

`rename_axis`

You can rename the index levels of one DataFrame to match the other and let join do the alignment:

s1.join(s2.rename_axis(s1.index.names))

                    s1        s2
first second                    
bar   one    -0.696420 -1.040463
      two     0.640891  1.483262
baz   one     1.598837  0.097424
      two     0.003994 -0.948419
foo   one    -0.717401  1.190019
      two    -1.201237 -0.000738
qux   one     0.559684 -0.505640
      two     1.979700  0.186013
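A self-contained sketch of the rename_axis + join approach with seeded data (the seed is an assumption for reproducibility). After renaming, both indexes share the level names first and second, so join can align them:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: seed added only for reproducibility
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])

# rename_axis with a list-like sets the index level names: third/fourth -> first/second
s2_renamed = s2.rename_axis(s1.index.names)
joined = s1.join(s2_renamed)   # left join on the now-matching level names
print(joined)
```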

`concat`

pd.concat([s1, s2], axis=1)

                    s1        s2
first second                    
bar   one    -0.696420 -1.040463
      two     0.640891  1.483262
baz   one     1.598837  0.097424
      two     0.003994 -0.948419
foo   one    -0.717401  1.190019
      two    -1.201237 -0.000738
qux   one     0.559684 -0.505640
      two     1.979700  0.186013
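The concat version as a seeded, self-contained sketch (the seed is an assumption for reproducibility). Concatenating along axis=1 places the two frames side by side, matching rows by index values:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: seed added only for reproducibility
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])

# axis=1 concatenates columns; rows are matched on index values
side_by_side = pd.concat([s1, s2], axis=1)
print(side_by_side)
```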

edited Jul 16 at 17:54

wjandrea

answered Oct 12, 2018 at 19:21

piRSquared