I have two data frames that look like the ones below, and I want to merge them on the country column.

df1:

+-------------------+-------------------+--------------+----------+----------+
| Country/Region    | ObservationDate   |  Confirmed   |  Deaths  | Recovered|
+-------------------+-------------------+--------------+----------+----------+
|  Mainland China   |  2020-01-22       |    547       |   17     |  28      | 
|  Indonesia        |  2020-01-22       |    0         |   0      |  0       |
|  Japan            |  2020-01-22       |    2         |   0      |  0       |     
|  Thailand         |  2020-01-22       |    2         |   0      |  0       |    
|  Mainland China   |  2020-01-23       |    639       |   18     |  30      |  
+-------------------+-------------------+--------------+----------+----------+

df2:

+-----------------+-------------------+--------------------+
| Country         |  Region           |  Tropic/Nontropic  |
+-----------------+-------------------+--------------------+
|  Mainland China |  Asia & Pacific   | nontropic          |
|  Indonesia      |  Asia & Pacific   | tropic             |
|  Japan          |  Asia & Pacific   | nontropic          |
|  Thailand       |  Asia & Pacific   | tropic             | 
+-----------------+-------------------+--------------------+

The output I want may look like this:

df_new:

+-------------------+-------------------+--------------+----------+----------+-------------------+--------------------+
| Country/Region    | ObservationDate   |  Confirmed   |  Deaths  | Recovered|  Region           |  Tropic/Nontropic  |
+-------------------+-------------------+--------------+----------+----------+-------------------+--------------------+
|  Mainland China   |  2020-01-22       |    547       |   17     |  28      |  Asia & Pacific   | nontropic          | 
|  Indonesia        |  2020-01-22       |    0         |   0      |  0       |  Asia & Pacific   | tropic             |
|  Japan            |  2020-01-22       |    2         |   0      |  0       |  Asia & Pacific   | nontropic          |     
|  Thailand         |  2020-01-22       |    2         |   0      |  0       |  Asia & Pacific   | tropic             |    
|  Mainland China   |  2020-01-23       |    639       |   18     |  30      |  Asia & Pacific   | nontropic          |  
+-------------------+-------------------+--------------+----------+----------+-------------------+--------------------+

I have tried:

pd.merge(df_new, df_cat, on=['Country/Region', 'Country'], how='left')

But it raised an error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-86-eff12d512209> in <module>
----> 1 pd.merge(df_new, df_cat, on=['Country/Region', 'Country'], how='left')

~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     84         copy=copy,
     85         indicator=indicator,
---> 86         validate=validate,
     87     )
     88     return op.get_result()

~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
    625             self.right_join_keys,
    626             self.join_names,
--> 627         ) = self._get_merge_keys()
    628 
    629         # validate the merge keys dtypes. We may need to coerce

~\anaconda3\lib\site-packages\pandas\core\reshape\merge.py in _get_merge_keys(self)
    981                     if not is_rkey(rk):
    982                         if rk is not None:
--> 983                             right_keys.append(right._get_label_or_level_values(rk))
    984                         else:
    985                             # work-around for merge_asof(right_index=True)

~\anaconda3\lib\site-packages\pandas\core\generic.py in _get_label_or_level_values(self, key, axis)
   1690             values = self.axes[axis].get_level_values(key)._values
   1691         else:
-> 1692             raise KeyError(key)
   1693 
   1694         # Check for duplicates

KeyError: 'Country/Region'

How can I achieve the result on df_new?

1 Answer

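The problem is that the key columns have different names in the two data frames. Passing on=['Country/Region', 'Country'] asks pandas to join on both columns, so each of them would have to exist in both data frames. Instead, name the key column of each data frame separately with left_on and right_on.

The error message KeyError: 'Country/Region' means pandas looked for a column named 'Country/Region' in a data frame that does not have one. The traceback shows the lookup failing on the right-hand data frame.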

Try:

pd.merge(left=df1, right=df2, left_on='Country/Region', right_on='Country', how='left')

See the pandas documentation for merge for details.
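For reference, here is a minimal sketch of the whole thing, assuming df1 and df2 hold exactly the sample rows from the question (the frames are rebuilt by hand here, so column dtypes may differ from your real data). With left_on/right_on the result keeps both key columns, so the redundant 'Country' column is dropped at the end to match the desired output.

```python
import pandas as pd

# Sample data rebuilt by hand from the tables in the question.
df1 = pd.DataFrame({
    "Country/Region": ["Mainland China", "Indonesia", "Japan",
                       "Thailand", "Mainland China"],
    "ObservationDate": ["2020-01-22", "2020-01-22", "2020-01-22",
                        "2020-01-22", "2020-01-23"],
    "Confirmed": [547, 0, 2, 2, 639],
    "Deaths": [17, 0, 0, 0, 18],
    "Recovered": [28, 0, 0, 0, 30],
})

df2 = pd.DataFrame({
    "Country": ["Mainland China", "Indonesia", "Japan", "Thailand"],
    "Region": ["Asia & Pacific"] * 4,
    "Tropic/Nontropic": ["nontropic", "tropic", "nontropic", "tropic"],
})

# Left join: every row of df1 is kept, and the matching df2 row is attached.
merged = pd.merge(
    left=df1,
    right=df2,
    left_on="Country/Region",
    right_on="Country",
    how="left",
)

# Both key columns survive the merge; drop the duplicate from df2.
df_new = merged.drop(columns="Country")

print(df_new)
```

Countries in df1 with no match in df2 would get NaN in the Region and Tropic/Nontropic columns, since how='left' keeps unmatched left rows.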
