1

I have two dataframes that look similar and I want to divide one column from df1 by a column from df2.

Some sample data is below:

dict1 = {'category': {0: 0.0, 1: 1.0, 2: 0.0, 3: 0.0, 4: 1.0},
 'Id': {0: 24108, 1: 24307, 2: 24307, 3: 24411, 4: 24411},
 'count': {0: 3, 1: 2, 2: 33, 3: 98, 4: 33}}

df1 = pd.DataFrame(dict1)

dict2 = {'Id': {0: 24108, 1: 24307, 2: 24411},
 'count': {0: 3, 1: 35, 2: 131}}

df2 = pd.DataFrame(dict2)

I am trying to create a new column in the first dataframe (df1) called weights by dividing df1['count'] by df2['count']. Except for the column category and count in both dfs, the values are identical in the other columns.

I have the following piece of code, but I cannot seem to understand where the error is:

df1['weights'] = (df1['count']
                      .div(df1.merge(df2, on = 'Id', how = 'left')
                           ['count'].to_numpy())
                      )

I get the following error when I run the code:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

/opt/conda/lib/python3.8/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

/opt/conda/lib/python3.8/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'count'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_354/1318629977.py in <module>
      1 complete['weights'] = (complete['count']
----> 2                       .div(complete.merge(totals, on = 'companyId', how = 'left')['count'].to_numpy())
      3                       )

Any ideas why this is happening?

1
  • 1
    There's count_x and count_y after your merge not count. Commented Jul 27, 2022 at 9:54

1 Answer 1

2

Since you end up with count_x and count_y after your merge, you need to specify which one you want:

df1['weights'] = (df1['count'].div(df1.merge(df2, on = 'Id', how = 'left')['count_y'].to_numpy()))
Sign up to request clarification or add additional context in comments.

2 Comments

The best way would be to merge without creating these extra columns since they are identical.
@NelsonGon count_x and count_y are not identical in terms of values.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.