4

I am working with a dataframe in Pandas and I need a solution to automatically modify one of the columns that has duplicate values. It is a column type 'object' and I would need to modify the name of the duplicate values. The dataframe is the following:

      City           Year       Restaurants
0   New York         2001       20
1      Paris         2000       40
2   New York         1999       41
3   Los Angeles      2004       35
4     Madrid         2001       22
5   New York         1998       33
6   Barcelona        2001       15

As you can see, New York is repeated 3 times. I would like to create a new dataframe in which this value would be automatically modified and the result would be the following:

      City           Year       Restaurants
0   New York 2001    2001       20
1      Paris         2000       40
2   New York 1999    1999       41
3   Los Angeles      2004       35
4     Madrid         2001       22
5   New York 1998    1998       33
6   Barcelona        2001       15

I would also be happy with "New York 1", "New York 2" and "New York 3". Any option would be good.

2 Answers 2

3

Use np.where, to modify column City if duplicated

df['City']=np.where(df['City'].duplicated(keep=False), df['City']+' '+df['Year'].astype(str),df['City'])
Sign up to request clarification or add additional context in comments.

Comments

2

A different approach without the use of numpy would be with groupby.cumcount() which will give you your alternative New York 1, New York 2 but for all values.

df['City'] = df['City'] + ' ' + df.groupby('City').cumcount().add(1).astype(str)

            City  Year  Restaurants
0     New York 1  2001           20
1        Paris 1  2000           40
2     New York 2  1999           41
3  Los Angeles 1  2004           35
4       Madrid 1  2001           22
5     New York 3  1998           33
6    Barcelona 1  2001           15

To have an increment only in the duplicate cases you can use loc:

df.loc[df[df.City.duplicated(keep=False)].index, 'City'] = df['City'] + ' ' + df.groupby('City').cumcount().add(1).astype(str)

          City  Year  Restaurants
0   New York 1  2001           20
1        Paris  2000           40
2   New York 2  1999           41
3  Los Angeles  2004           35
4       Madrid  2001           22
5   New York 3  1998           33
6    Barcelona  2001           15

1 Comment

Fantastic, thanks a lot!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.