11

I am merging two dataframes using merge(..., how='left') since I want to retain only entries that match up with the "left" dataframe. The problem is that the merge operation seems to drop the index of my leftmost dataframe, shown here:

import pandas
df1 = pandas.DataFrame([{"id": 1,
                         "name": "bob"},
                        {"id": 10,
                         "name": "sally"}])
df1 = df1.set_index("id")
df2 = pandas.DataFrame([{"name": "bob",
                         "age": 10},
                        {"name": "sally",
                         "age": 11}])

print "df1 premerge: "
print df1
df1 = df1.merge(df2, on=["name"],
                how="left")
print "merged: "
print df1
# This is not "id"
print df1.index
# And there's no "id" field
assert ("id" in df1.columns) == False

Before the merge, df1 was indexed by id. After the merge operation, there's just the default numeric index for the merged dataframe and the id field was dropped. How can I do this kind of merge operation but retain the index of the leftmost dataframe?

To clarify: I want all the columns of df2 to be added to every entry in df1 that has the matching id value. If an entry in df2 has an id value not in df1, then that shouldn't be merged in (hence the how='left').

edit: I could as a hack do: df1.reset_index() but merging and then set the index again, but I prefer not to if possible, it seems like merge shouldn't have to drop the index. thanks.

8
  • 1
    You could try indexing df1 and df2 by name instead of id, and then use join instead of merge like this: df1.join(df2) which will preserve the index. Commented Mar 27, 2013 at 14:45
  • It's not possible since name is not unique in df1 Commented Mar 27, 2013 at 14:50
  • Why don't you set_index after the merge? Commented Mar 27, 2013 at 15:00
  • 4
    I would but it drops the column that I want to be the index after the merge, namely id Commented Mar 27, 2013 at 15:02
  • 2
    I found this question, which appears to be the same thing. The answer there is basically the hack you talked about. Commented Mar 27, 2013 at 15:05

1 Answer 1

5

You've already pointed out doing a reset_index before the merge and a set_index afterwards, which works. The only way I know of to preserve indices across a merge is for the merge to involve an index on at least one of the data frames being merged. So here, you could do:

In [403]: df2 = df2.set_index('name')

In [404]: df1.merge(df2, left_on='name', right_index=True)
Out[404]: 
     name  age
id            
1     bob   10
10  sally   11

to merge df2's index, which we've taken from its 'name' column, against the 'name' column on df1.

This makes some sense, because otherwise the index of the resulting dataframe is ambiguous as it could come from either dataframe.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.