Add repeating index to a pandas dataframe

Question

I have a dataframe like this:

name      food
mike     pizza  
mike    cookie  
mike    banana  
mary     apple  
mary      pear  
jane  broccoli

I want to add a sequential integer column that is unique to name, like this:

id  name      food
 1  mike     pizza  
 1  mike    cookie  
 1  mike    banana  
 2  mary     apple  
 2  mary      pear  
 3  jane  broccoli

Is there an elegant pandas one- (or two-) liner to do that? I'm new to pandas and suspect there's a way to do it using some combination of groupby and lambda, but I'm not making any progress.

Unique to each name, or unique to each consecutive grouping of names (which the posted answers seem to accomplish)? With a sorted DataFrame these may be the same, but in general they are not.
– ALollz, Commented Sep 3, 2019 at 20:33
df.groupby('name', sort=False).ngroup() + 1 is likely what you want. It's unique per name, and the counter is based on the order of first occurrence in the DataFrame, not on any lexicographical sorting.
– ALollz, Commented Sep 3, 2019 at 20:36
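
A minimal sketch of the ngroup suggestion from the comment above. The rows are deliberately shuffled (an assumption for illustration; the question's data is sorted) to show that each name keeps one id, numbered by first appearance:

```python
import pandas as pd

# Hypothetical, deliberately unsorted variant of the question's data.
df = pd.DataFrame({
    "name": ["mike", "mary", "mike", "jane", "mary"],
    "food": ["pizza", "apple", "cookie", "broccoli", "pear"],
})

# sort=False numbers the groups in order of first appearance;
# ngroup() is 0-based, so add 1 to start the ids at 1.
df["id"] = df.groupby("name", sort=False).ngroup() + 1

# With the default sort=True the groups are numbered alphabetically instead.
df["id_sorted"] = df.groupby("name").ngroup() + 1

print(df)
```

Here mike gets 1 on both of his rows, mary gets 2, and jane gets 3, regardless of where the rows sit in the frame.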

Scott Boston · Accepted Answer · 2019-09-03 21:08:11Z

1

You can use pd.factorize:

df['Id'] = pd.factorize(df['name'])[0] + 1

Output:

   name      food  Id
0  mike     pizza   1
1  mike    cookie   1
2  mike    banana   1
3  mary     apple   2
4  mary      pear   2
5  jane  broccoli   3

Then set Id as the index with set_index:

df.set_index('Id')

Output:

    name      food
Id                
1   mike     pizza
1   mike    cookie
1   mike    banana
2   mary     apple
2   mary      pear
3   jane  broccoli


1 Comment

Superduper Over a year ago

All of the other answers are great, and frankly I didn't realize how little I knew - everyone's solutions result in proper dataframe indexes -- which I didn't know were a thing; I just wanted the column of values, which this and fuglede's solutions accomplish in one line. This one appears to be sort-insensitive, which is a bonus.

fuglede · Answer · 2019-09-03 20:30:38Z

1

You could use:

df['id'] = (df.name != df.name.shift(1)).cumsum()


3 Comments

Erfan Over a year ago

What if the name values aren't sorted?

fuglede Over a year ago

The provided example would suggest that they are. @Superduper?

Superduper Over a year ago

Yes, in my case they are sorted, so this answer works, but for general usage the factorize approach seems better. And just a note for future visitors: both this solution and the accepted solution work, but in my case they resulted in a SettingWithCopyWarning (Python 3.7 kernel in JupyterLab 0.35.3).
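
To illustrate the two points raised in these comments, here is a small sketch under some assumptions: the frame `raw` and its filter are hypothetical, and the names are intentionally not sorted. Taking an explicit `.copy()` of the filtered frame before adding columns is one common way to avoid SettingWithCopyWarning; whether that was the cause in the commenter's case is not known.

```python
import pandas as pd

# Hypothetical source frame; the last row is filtered out below.
raw = pd.DataFrame({
    "name": ["mike", "mike", "mary", "mike", "jane", "bob"],
    "food": ["pizza", "cookie", "apple", "banana", "broccoli", None],
})

# Assigning a new column to a filtered slice can trigger
# SettingWithCopyWarning; an explicit copy makes df independent of raw.
df = raw[raw["food"].notna()].copy()

# shift/cumsum starts a new id whenever the name differs from the row
# above, so it numbers consecutive runs rather than distinct names.
df["run_id"] = (df["name"] != df["name"].shift()).cumsum()

# factorize assigns one code per distinct name, in order of first appearance.
df["id"] = pd.factorize(df["name"])[0] + 1

print(df)
```

On this unsorted input the third mike row gets run_id 3 but id 1, which is why the shift-based answer relies on the names being grouped together.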

Billy Bonaros · Answer · 2019-09-03 20:32:46Z

0

try this:

df.set_index((~df.name.duplicated()).cumsum())

Output:

      name      food
name                
1     mike     pizza
1     mike    cookie
1     mike    banana
2     mary     apple
2     mary      pear
3     jane  broccoli
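
As a hedged aside, the following sketch (using hypothetical, unsorted data) shows how the duplicated-based counter behaves when a name reappears later in the frame. The counter only increases at the first occurrence of each name, so a repeated name inherits whatever id was current at that point:

```python
import pandas as pd

# Hypothetical unsorted data: mike reappears after mary.
df = pd.DataFrame({
    "name": ["mike", "mike", "mary", "mike", "jane"],
    "food": ["pizza", "cookie", "apple", "banana", "broccoli"],
})

# True only at the first occurrence of each name; cumsum turns that
# into a running count of distinct names seen so far.
ids = (~df["name"].duplicated()).cumsum()

print(ids.tolist())
```

The fourth row (mike) ends up with 2, the id first given to mary, so this approach also assumes that equal names are adjacent.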

