25

Here is a hypothetical scenario with multiindex dataframes in pandas. Trying to merge them will result in an error. Do I have to do reset_index() on either dataframe to make this work?

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])

s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])

Attempted merges:

s1.merge(s2, how='left', left_index=True, right_index=True)
  • Editor's note: The error for this one was most likely ValueError: cannot join with no overlapping index names. Tested with Pandas 2.2.3
s1.merge(s2, how='left', left_on=['first', 'second'], right_on=['third', 'fourth'])
  • Editor's note: It's not clear what error occurred here. If you know, please add it.
5
  • 1
    This is one of the things that frustrates many new pandas users/coders, there are so many different ways to do the same thing. I like that, because depending on the dataset or why are you doing it in the first place, you can go the easy to code and understand route or you can optimize for quicker run times route. Commented Oct 12, 2018 at 19:27
  • What's the error? The second one doesn't error for me, though the resulting index is a default RangeIndex, which is probably not what you want. I'm using Pandas 2.2.3. For the first one I get ValueError: cannot join with no overlapping index names. Commented Jul 16 at 17:31
  • Please seed the RNG so that we have reproducible data, e.g. np.random.seed(0). I might take the initiative and do this myself, and for the answers too. For reference see How to make good reproducible pandas examples. On that note, please also add your expected output. Commented Jul 16 at 17:32
  • Please write a more descriptive title. That might look like "How can I merge dataframes with multiindexes with different names?" See How to Ask for tips on how to write a good title. Commented Jul 16 at 17:36
  • FWIW, with df1.merge(df2, how='left', left_index=True, right_index=True), you might as well just use df1.join(df2) Commented Jul 16 at 17:51

4 Answers 4

25

Seems like you need to use a combination of them.

s1.merge(s2, left_index=True, right_on=['third', 'fourth'])
#s1.merge(s2, right_index=True, left_on=['first', 'second'])

Output:

               s1        s2
bar one  0.765385 -0.365508
    two  1.462860  0.751862
baz one  0.304163  0.761663
    two -0.816658 -1.810634
foo one  1.891434  1.450081
    two  0.571294  1.116862
qux one  1.056516 -0.052927
    two -0.574916 -1.197596
Sign up to request clarification or add additional context in comments.

1 Comment

Why not right_on=s2.index.names?
9

Other than using the indexes names as pointed by @ALollz, you can simply use loc, which will match indexes automatically

s1.loc[:, 's2'] = s2   # Or explicitly, s2['s2']

                s1           s2
first   second      
bar     one     -0.111384   -2.341803
        two     -1.226569    1.308240
baz     one      1.880835    0.697946
        two     -0.008979   -0.247896
foo     one      0.103864   -1.039990
        two      0.836931    0.000811
qux     one     -0.859005   -1.199615
        two     -0.321341   -1.098691

A general formula would be

s1.loc[:, s2.columns] = s2

Comments

8

Assign it by combine_first

s1.combine_first(s2)
Out[19]: 
                    s1        s2
first second                    
bar   one     0.039203  0.795963
      two     0.454782 -0.222806
baz   one     3.101120 -0.645474
      two    -1.174929 -0.875561
foo   one    -0.887226  1.078218
      two     1.507546 -1.078564
qux   one     0.028048  0.042462
      two     0.826544 -0.375351

# s2.combine_first(s1)

Comments

7

rename_axis

You can rename the index levels of one and let join do its thing

s1.join(s2.rename_axis(s1.index.names))

                    s1        s2
first second                    
bar   one    -0.696420 -1.040463
      two     0.640891  1.483262
baz   one     1.598837  0.097424
      two     0.003994 -0.948419
foo   one    -0.717401  1.190019
      two    -1.201237 -0.000738
qux   one     0.559684 -0.505640
      two     1.979700  0.186013

concat

pd.concat([s1, s2], axis=1)

                    s1        s2
first second                    
bar   one    -0.696420 -1.040463
      two     0.640891  1.483262
baz   one     1.598837  0.097424
      two     0.003994 -0.948419
foo   one    -0.717401  1.190019
      two    -1.201237 -0.000738
qux   one     0.559684 -0.505640
      two     1.979700  0.186013

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.