I have a DataFrame df:

          Name Reagent
0  Experiment1   water
1  Experiment1     oil
2  Experiment1   water
3  Experiment1    milk
4  Experiment1   water
5  Experiment1     tea
6  Experiment1   water
7  Experiment1  coffee
8  Experiment2   water
9  Experiment2  coffee

I want to replace duplicate reagents within the same experiment with a differentiator of some sort, such as a numeric suffix. In the example, only water is duplicated within a given experiment.

For example:

          Name Reagent
0  Experiment1  water1
1  Experiment1     oil
2  Experiment1  water2
3  Experiment1    milk
4  Experiment1  water3
5  Experiment1     tea
6  Experiment1  water4
7  Experiment1  coffee
8  Experiment2   water
9  Experiment2  coffee

Thanks for any help

1 Answer

Solution: append the GroupBy.cumcount counter to every value, replacing 0 with an empty string so that the first occurrence in each group is left unchanged:

df['Reagent'] += df.groupby(['Name','Reagent']).cumcount().astype(str).replace('0','')
print(df)
          Name Reagent
0  Experiment1   water
1  Experiment1     oil
2  Experiment1  water1
3  Experiment1    milk
4  Experiment1  water2
5  Experiment1     tea
6  Experiment1  water3
7  Experiment1  coffee
8  Experiment2   water
9  Experiment2  coffee
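
To show what the one-liner is doing, here is a minimal sketch that splits it into steps. The DataFrame is rebuilt from the data in the question, and the intermediate names counter, suffix and result are only illustrative:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Experiment1"] * 8 + ["Experiment2"] * 2,
    "Reagent": ["water", "oil", "water", "milk", "water",
                "tea", "water", "coffee", "water", "coffee"],
})

# Step 1: number the rows inside each (Name, Reagent) group, starting at 0.
counter = df.groupby(["Name", "Reagent"]).cumcount()

# Step 2: turn the numbers into strings and blank out the value "0",
# so the first occurrence in each group gets no suffix.
# replace() matches whole values here, so a counter such as "10" is untouched.
suffix = counter.astype(str).replace("0", "")

# Step 3: element-wise string concatenation, aligned on the index.
result = df["Reagent"] + suffix
print(result.tolist())
```

The counter is 0 for every value that occurs once in its experiment, so only the repeated water rows in Experiment1 end up with a suffix.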

If you need to number every duplicated value, including the first occurrence, select the rows flagged by DataFrame.duplicated on both columns (with keep=False) and add 1 to the counter:

mask = df.duplicated(['Name','Reagent'], keep=False)
df.loc[mask, 'Reagent'] += df[mask].groupby(['Name','Reagent']).cumcount().add(1).astype(str)
print(df)
          Name Reagent
0  Experiment1  water1
1  Experiment1     oil
2  Experiment1  water2
3  Experiment1    milk
4  Experiment1  water3
5  Experiment1     tea
6  Experiment1  water4
7  Experiment1  coffee
8  Experiment2   water
9  Experiment2  coffee
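
The same idea can be wrapped in a small function if it is needed more than once. This is only a sketch: number_duplicates and its parameters group_col, value_col and sep are hypothetical names, not part of pandas, and the sample data is copied from the question. The sep argument is an assumed convenience for putting a separator such as a hyphen before the number.

```python
import pandas as pd


def number_duplicates(df, group_col, value_col, sep=""):
    """Hypothetical helper: suffix duplicated values with 1, 2, 3, ...

    Only values that occur more than once within the same group are
    changed. A copy is returned and the input is left untouched.
    """
    out = df.copy()
    keys = [group_col, value_col]
    # keep=False flags every member of a duplicated group, not just the repeats.
    mask = out.duplicated(keys, keep=False)
    # Counter starts at 0, so add 1 to number the first occurrence as 1.
    counter = out[mask].groupby(keys).cumcount().add(1).astype(str)
    # Assignment aligns on the index, so only the masked rows change.
    out.loc[mask, value_col] += sep + counter
    return out


df = pd.DataFrame({
    "Name": ["Experiment1"] * 8 + ["Experiment2"] * 2,
    "Reagent": ["water", "oil", "water", "milk", "water",
                "tea", "water", "coffee", "water", "coffee"],
})

plain = number_duplicates(df, "Name", "Reagent")
hyphenated = number_duplicates(df, "Name", "Reagent", sep="-")
print(hyphenated["Reagent"].tolist())
```

Returning a copy rather than modifying df in place is a design choice made for this sketch, so the original column stays available for comparison.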
4 Comments

Oh wow, that was quick. Could you please give a brief description of what the line is doing? How would I put a hyphen in front of the number?
@ukemi - Thank you.
@user11305439 - Sorry, I didn't see the full comment. Use df['Reagent'] += '-' + df.groupby(['Name','Reagent']).cumcount().astype(str).replace('0','')
@user11305439 - Or df.loc[mask, 'Reagent'] += '-' + df[mask].groupby(['Name','Reagent']).cumcount().add(1).astype(str)
