
I have a dataframe like this:

name      food
mike     pizza  
mike    cookie  
mike    banana  
mary     apple  
mary      pear  
jane  broccoli

I want to add a sequential integer column that is unique to name, like this:

id  name      food
 1  mike     pizza  
 1  mike    cookie  
 1  mike    banana  
 2  mary     apple  
 2  mary      pear  
 3  jane  broccoli

Is there an elegant pandas one- (or two-) liner to do that? I'm new to pandas and suspect there's a way to do it using some combination of groupby and lambda, but I'm not making any progress.

3 Comments

  • df["name"].astype("category").cat.codes Commented Sep 3, 2019 at 20:27
  • Unique to each name, or unique to each consecutive grouping of names (which the posted answers seem to accomplish)? With a sorted DataFrame these may be the same, but in general they are not. Commented Sep 3, 2019 at 20:33
  • df.groupby('name', sort=False).ngroup()+1 is likely what you want. It's unique per name, and the counter is based on the order of first occurrence in the DataFrame, not on any lexicographical sorting. Commented Sep 3, 2019 at 20:36
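To make the difference between the two comment suggestions concrete, here is a small sketch that rebuilds the question's frame (the variable names codes and first_seen are only illustrative). As far as I know, category codes start at 0 and follow the sorted order of the categories, while ngroup with sort=False numbers the groups in order of first appearance.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "name": ["mike", "mike", "mike", "mary", "mary", "jane"],
    "food": ["pizza", "cookie", "banana", "apple", "pear", "broccoli"],
})

# Category codes are 0-based and follow the sorted categories
# (jane, mary, mike), not the order of rows in the frame.
codes = df["name"].astype("category").cat.codes

# ngroup(sort=False) numbers each name by first appearance;
# adding 1 makes the ids start at 1, as in the question.
first_seen = df.groupby("name", sort=False).ngroup() + 1

print(codes.tolist())       # [2, 2, 2, 1, 1, 0]
print(first_seen.tolist())  # [1, 1, 1, 2, 2, 3]
```

Only the second variant reproduces the numbering asked for in the question (mike=1, mary=2, jane=3).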

3 Answers


You can use pd.factorize:

df['Id'] = pd.factorize(df['name'])[0] + 1

Output:

   name      food  Id
0  mike     pizza   1
1  mike    cookie   1
2  mike    banana   1
3  mary     apple   2
4  mary      pear   2
5  jane  broccoli   3

If you want Id as the index rather than a column, use set_index (it returns a new DataFrame):

df.set_index('Id')

Output:

    name      food
Id                
1   mike     pizza
1   mike    cookie
1   mike    banana
2   mary     apple
2   mary      pear
3   jane  broccoli

1 Comment

All of the other answers are great, and frankly I didn't realize how little I knew - everyone's solutions result in proper dataframe indexes -- which I didn't know were a thing; I just wanted the column of values, which this and fuglede's solutions accomplish in one line. This one appears to be sort-insensitive, which is a bonus.

You could use:

df['id'] = (df.name != df.name.shift(1)).cumsum()
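Note that this counts runs of consecutive equal names, so it only matches a per-name id when the names are grouped together. A minimal sketch on a made-up, unsorted series (the data and the names run_id and name_id are illustrative, not from the question), compared with pd.factorize:

```python
import pandas as pd

# Hypothetical unsorted names: "mike" and "mary" each appear in two separate runs.
names = pd.Series(["mike", "mary", "mike", "jane", "mary"], name="name")

# shift/cumsum starts a new id whenever the value differs from the previous row,
# so every run gets its own id even if the name was seen before.
run_id = (names != names.shift(1)).cumsum()

# factorize assigns one code per distinct value, in order of first appearance.
name_id = pd.factorize(names)[0] + 1

print(run_id.tolist())   # [1, 2, 3, 4, 5]
print(name_id.tolist())  # [1, 2, 1, 3, 2]
```

With the sorted example from the question both approaches give the same column.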

3 Comments

What if the name values aren't sorted?
The provided example would suggest that they are. @Superduper?
Yes, in my case they are sorted, so this answer works, but for general usage factorize seems better. And just a note for future visitors: both this solution and the accepted solution work, but in my case they resulted in a SettingWithCopyWarning (Python 3.7 kernel in JupyterLab 0.35.3).

Try this:

df.set_index((~df.name.duplicated()).cumsum())

Output:

      name      food
name                
1     mike     pizza
1     mike    cookie
1     mike    banana
2     mary     apple
2     mary      pear
3     jane  broccoli
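One caveat, sketched here on made-up data rather than the question's frame: the cumulative sum only increases on the first occurrence of each name, so a name that reappears later picks up the id of whichever name was first seen most recently. It therefore also assumes the names are grouped together.

```python
import pandas as pd

# Hypothetical unsorted names: "mike" reappears after "mary".
names = pd.Series(["mike", "mary", "mike"], name="name")

# duplicated() flags repeats -> [False, False, True];
# negating and summing counts first occurrences seen so far.
ids = (~names.duplicated()).cumsum()

print(ids.tolist())  # [1, 2, 2]  (the second "mike" gets mary's id)
```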
