4

This is a simple thing but I don't think it's been covered on SO or in the Pandas documentation.

The data

Two DataFrames containing data about cities. The DataFrame UK contains a variable for UK city names. It has been grouped so there are no duplicate city names. This is a truncated example of this city variable, (not the full DataFrame):

city
Hamilton
Edinburgh
Bury
...

The DataFrame US contains a corresponding variable for US city names. It does have duplicate city names, but these are not actually duplicate values as there are numerous cities with the same name. Again, a truncated example:

city
Hamilton
Hamilton
Edinburgh
Edinburgh
Edinburgh
Bury
Bury 
...

(Note: The variables aren't the same length in my data.)

The aim

Create a new variable in the UK DataFrame which is an integer count for each time a city name in the US city variable matches a city name in the UK city variable.

Below shows an example of the output I'm aiming for. Say for one UK city called Hamilton there are 2 cities called Hamilton in the US. For one UK city called Edinburgh there are 3 matches in the US. And 2 matches for Bury.

city       count
Hamilton       2
Edinburgh      3
Bury           2

Where I'm stuck

If I do this...

uk['count'] = uk['city'].isin(us['city']).astype(int)

the new variable is a binary 1 or 0 indicating that there is a match, which is halfway there. However I'm struggling with the Pandas syntax to return the count of matches. I've tried appending value_counts and variations of unique but these didn't work. Maybe I'm approaching this the wrong way.

1 Answer 1

11

Use Series.value_counts to count the number of occurrences for each city in US['city'], and then use Series.map to apply those counts to corresponding values in UK['city']:

In [40]: US['city'].value_counts()
Out[40]: 
Edinburgh    3
Bury         2
Hamilton     2
Name: city, dtype: int64

In [41]: UK['count'] = UK['city'].map(US['city'].value_counts())

In [42]: UK
Out[42]: 
        city  count
0   Hamilton      2
1  Edinburgh      3
2       Bury      2

If there is a city in UK['city'] which is not in US['city'] then the above will assign a NaN value to the count column for that city.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.