I am merging two dataframes using merge(..., how='left') since I want to retain only entries that match up with the "left" dataframe. The problem is that the merge operation seems to drop the index of my leftmost dataframe, shown here:
import pandas
df1 = pandas.DataFrame([{"id": 1,
"name": "bob"},
{"id": 10,
"name": "sally"}])
df1 = df1.set_index("id")
df2 = pandas.DataFrame([{"name": "bob",
"age": 10},
{"name": "sally",
"age": 11}])
print "df1 premerge: "
print df1
df1 = df1.merge(df2, on=["name"],
how="left")
print "merged: "
print df1
# This is not "id"
print df1.index
# And there's no "id" field
assert ("id" in df1.columns) == False
Before the merge, df1 was indexed by id. After the merge operation, there's just the default numeric index for the merged dataframe and the id field was dropped. How can I do this kind of merge operation but retain the index of the leftmost dataframe?
To clarify: I want all the columns of df2 to be added to every entry in df1 that has the matching id value. If an entry in df2 has an id value not in df1, then that shouldn't be merged in (hence the how='left').
edit: I could as a hack do: df1.reset_index() but merging and then set the index again, but I prefer not to if possible, it seems like merge shouldn't have to drop the index. thanks.
df1anddf2by name instead of id, and then usejoininstead ofmergelike this:df1.join(df2)which will preserve the index.nameis not unique indf1id