I have two Excel sheets (Master and Input) that share the same index column but have a different number of columns (see below). I want to merge the Input DF into the Master DF when new rows have been added (see IDs 103-105) or when an existing item has been updated in the Input DF (see ID 102). Other columns can be ignored.
Dataframe 1 (Master):
Dataframe 2 (Input):
Goal (updated cells marked in yellow):
I am using the following script:
import pandas as pd

inputDf = pd.read_excel(inputFileName).set_index("ID")
masterDf = pd.read_excel(masterFileName).set_index("ID")
# Update existing rows
masterDf.update(inputDf)
# find out which ids are new
ids_of_new_rows = set(inputDf.index) - set(masterDf.index)
# get new rows that should be added to master
rows_to_add = masterDf.loc[ids_of_new_rows, inputDf.columns & masterDf.columns]
I am able to update the Master DF and get ids_of_new_rows. Output:
{'CR103', 'CR104', 'CR105'}
However, when trying to get rows_to_add, I always receive the following error:
KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['CR103', 'CR104', 'CR105'], dtype='object', name='ID')] are in the [index]"
Any ideas?
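For reference, one likely cause, sketched under assumptions: the new IDs are looked up in masterDf, which by definition does not contain them yet, so `.loc` raises the KeyError. Selecting those rows from inputDf instead should avoid it. The sketch below uses small made-up frames in place of the Excel sheets (the column names `Status`, `Owner` and `Extra` and all values are hypothetical), keeps the new IDs as an Index rather than a set because recent pandas versions do not accept a set inside `.loc`, and uses `Index.intersection` in place of the `&` operator.

```python
import pandas as pd

# Hypothetical stand-ins for the two Excel sheets (columns and values are made up).
masterDf = pd.DataFrame(
    {"ID": ["CR101", "CR102"], "Status": ["Open", "Open"], "Owner": ["A", "B"]}
).set_index("ID")
inputDf = pd.DataFrame(
    {
        "ID": ["CR102", "CR103", "CR104", "CR105"],
        "Status": ["Closed", "Open", "Open", "Open"],
        "Extra": [1, 2, 3, 4],
    }
).set_index("ID")

# Update existing rows; only matching index labels and shared columns are touched.
masterDf.update(inputDf)

# IDs present in the input but not in the master, kept as an Index instead of a set.
new_ids = inputDf.index.difference(masterDf.index)

# Columns that both frames share.
shared_cols = inputDf.columns.intersection(masterDf.columns)

# Take the new rows from inputDf, because masterDf does not contain these IDs yet.
rows_to_add = inputDf.loc[new_ids, shared_cols]

# Append the new rows; master columns missing from the input are filled with NaN.
masterDf = pd.concat([masterDf, rows_to_add])
```

With the real files, the two hypothetical frames would be replaced by the `pd.read_excel(...).set_index("ID")` calls from the script above.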


