How do you replace duplicate values with multiple unique strings in Pandas?

Question

import pandas as pd
import numpy as np
data = {'Name':['Tom', 'Tom', 'Jack', 'Terry'], 'Age':[20, 21, 19, 18]} 
df = pd.DataFrame(data)

Lets say I have a dataframe that looks like this. I am trying to figure out how to check the Name column for the value 'Tom' and if I find it the first time I replace it with the value 'FirstTom' and the second time it appears I replace it with the value 'SecondTom'. How do you accomplish this? I've used the replace method before but only for replacing all Toms with a single value. I don't want to add a 1 on the end of the value, but completely change the string to something else.

Edit:

If the df looked more like this below, how would we check for Tom in the first column and the second column and then replace the first instance with FirstTom and the second instance with SecondTom

data = {'Name':['Tom', 'Jerry', 'Jack', 'Terry'], 'OtherName':[Tom, John, Bob,Steve]}

anky · Accepted Answer · 2020-01-28 15:10:15Z

12

Just adding in to the existing solutions , you can use inflect to create dynamic dictionary

import inflect
p = inflect.engine()

df['Name'] += df.groupby('Name').cumcount().add(1).map(p.ordinal).radd('_')
print(df)

        Name  Age
0    Tom_1st   20
1    Tom_2nd   21
2   Jack_1st   19
3  Terry_1st   18

answered Jan 28, 2020 at 15:10

anky

75.3k11 gold badges46 silver badges76 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Shubham Sharma Over a year ago

nice answer, while searching on google just found out this :)

BENY · Accepted Answer · 2020-01-28 15:29:53Z

10

We can do cumcount

df.Name=df.Name+df.groupby('Name').cumcount().astype(str)
df
     Name  Age
0    Tom0   20
1    Tom1   21
2   Jack0   19
3  Terry0   18

Update

suf = lambda n: "%d%s"%(n,{1:"st",2:"nd",3:"rd"}.get(n if n<20 else n%10,"th"))
g=df.groupby('Name')


df.Name=df.Name.radd(g.cumcount().add(1).map(suf).mask(g.Name.transform('count')==1,''))
df
     Name  Age
0  1stTom   20
1  2ndTom   21
2    Jack   19
3   Terry   18

Update 2 for column

suf = lambda n: "%d%s"%(n,{1:"st",2:"nd",3:"rd"}.get(n if n<20 else n%10,"th"))

g=s.groupby([s.index.get_level_values(0),s])
s=s.radd(g.cumcount().add(1).map(suf).mask(g.transform('count')==1,''))
s=s.unstack()
     Name OtherName
0  1stTom    2ndTom
1   Jerry      John
2    Jack       Bob
3   Terry     Steve

edited Jan 28, 2020 at 15:29

answered Jan 28, 2020 at 14:56

BENY

324k22 gold badges176 silver badges250 bronze badges

7 Comments

jezrael Over a year ago

OP need I don't want to add a 1 on the end of the value

Logan0015 Over a year ago

This is great thank you. Now what if there is a second column of names and instead of checking the values vertically it checks for the same name horizontally?

BENY Over a year ago

@Logan0015L you can do df.groupby(['Name1','Name2']).cumcount()

BENY Over a year ago

@jezrael In my understanding, if we can not build the string 1st to....nth , I think it is better keep the number in the name

Logan0015 Over a year ago

Could this be grouped by the row instead of the column?

|

jezrael · Accepted Answer · 2020-01-28 16:27:24Z

EDIT: For count duplicated per rows use:

df = pd.DataFrame(data = {'Name':['Tom', 'Jerry', 'Jack', 'Terry'], 
                          'OtherName':['Tom', 'John', 'Bob','Steve'],
                          'Age':[20, 21, 19, 18]})

print (df)
    Name OtherName  Age
0    Tom       Tom   20
1  Jerry      John   21
2   Jack       Bob   19
3  Terry     Steve   18

import inflect
p = inflect.engine()

#map by function for dynamic counter
f = lambda i: p.number_to_words(p.ordinal(i))
#columns filled by names
cols = ['Name','OtherName']
#reshaped to MultiIndex Series
s = df[cols].stack()
#counter per groups
count = s.groupby([s.index.get_level_values(0),s]).cumcount().add(1)
#mask for filter duplicates
mask = s.reset_index().duplicated(['level_0',0], keep=False).values
#filter only duplicates and map, reshape back and add to original data
df[cols] = count[mask].map(f).unstack().add(df[cols], fill_value='')
print (df)
       Name  OtherName  Age
0  firstTom  secondTom   20
1     Jerry       John   21
2      Jack        Bob   19
3     Terry      Steve   18

Use GroupBy.cumcount with Series.map, but only for duplicated values by Series.duplicated:

data = {'Name':['Tom', 'Tom', 'Jack', 'Terry'], 'Age':[20, 21, 19, 18]} 
df = pd.DataFrame(data)

nth = {
0: "First",
1: "Second",
2: "Third",
3: "Fourth"
}

mask = df.Name.duplicated(keep=False)
df.loc[mask, 'Name'] = df[mask].groupby('Name').cumcount().map(nth) + df.loc[mask, 'Name']
print (df)
        Name  Age
0   FirstTom   20
1  SecondTom   21
2       Jack   19
3      Terry   18

Dynamic dictionary should be like:

import inflect
p = inflect.engine()

mask = df.Name.duplicated(keep=False)
f = lambda i: p.number_to_words(p.ordinal(i))
df.loc[mask, 'Name'] = df[mask].groupby('Name').cumcount().add(1).map(f) + df.loc[mask, 'Name']
print (df)

        Name  Age
0   firstTom   20
1  secondTom   21
2       Jack   19
3      Terry   18

this is very slick use of map and cumcount, nice one. maybe add a step to show the number of possible cumulative counts and build up a dictionary dynamically?

piRSquared · Accepted Answer · 2020-01-28 15:18:31Z

6

`transform`

nth = ['First', 'Second', 'Third', 'Fourth']

def prefix(d):
    n = len(d)
    if n > 1:
        return d.radd([nth[i] for i in range(n)])
    else:
        return d

df.assign(Name=df.groupby('Name').Name.transform(prefix))

          Name  Age
0     FirstTom   20
1    SecondTom   21
2         Jack   19
3        Terry   18
4   FirstSteve   17
5  SecondSteve   16
6   ThirdSteve   15

answered Jan 28, 2020 at 15:18

piRSquared

296k68 gold badges509 silver badges654 bronze badges

Collectives™ on Stack Overflow

How do you replace duplicate values with multiple unique strings in Pandas?

4 Answers 4

1 Comment

7 Comments

1 Comment

`transform`

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

1 Comment

7 Comments

1 Comment

transform

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related

`transform`