Python create sequence of numbers and append by group

Question

I want to append a sequence of numbers by group ID.

I have a column that looks like this:

ID [123, 124...]

I need to append a sequence of numbers (1:485) to each unique ID so that the final data frame looks like this:

ID     [123, 123, 123, 123, 124, 124, 124, 124...]
Number [  1,   2,   3,   4,   1,   2,   3,   4...]

Does anyone have simple guidance?

The problem is that I'm not counting the groups; I'm assigning the sequence to them. The sequence runs from 1 to 485, and I want to copy each row (read: ID) and assign each copy a number in that sequence, so there should be 485 copies of each ID row. cumcount doesn't address that.
– nj95, Commented Jul 23, 2021 at 13:47

Mustafa Aydın · Accepted Answer · 2021-07-23 13:53:54Z

2

You can repeat the index 485 times and pass it to loc to expand the frame. cumcount is zero-based within each ID, so adding 1 gives the 1...485 counts per ID:

new_df = df.loc[df.index.repeat(485)].reset_index(drop=True)
new_df["Number"] = new_df.groupby("ID").cumcount().add(1)

to get

>>> new_df

      ID  Number
0    123       1
1    123       2
2    123       3
3    123       4
4    123       5
..   ...     ...
965  124     481
966  124     482
967  124     483
968  124     484
969  124     485

[970 rows x 2 columns]
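
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same repeat-and-cumcount idea. The sample frame, the extra column x1, and the small n = 4 (instead of 485) are assumptions made for readability; they are not from the original post. It also suggests that any other columns ride along with the repeated rows.

```python
import pandas as pd

# Hypothetical sample data: "x1" stands in for any extra columns.
df = pd.DataFrame({"ID": [123, 124], "x1": ["a", "b"]})
n = 4  # the question uses 485; kept small here for readability

# Repeat every index label n times; .loc with the repeated labels duplicates the rows.
new_df = df.loc[df.index.repeat(n)].reset_index(drop=True)

# cumcount is 0-based within each ID, so add 1 to get 1..n.
new_df["Number"] = new_df.groupby("ID").cumcount().add(1)

print(new_df)
```

This relies on each ID appearing on exactly one row of the original frame and on the original index being unique.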

answered Jul 23, 2021 at 13:53

Mustafa Aydın


1 Comment

Henry Ecker Over a year ago

I always like the repeat solution, +1. I tend to err towards reindex in case the same ID appears more than once. But this is definitely more performant when it can be used. :)
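
To illustrate the duplicate-ID caveat raised in this comment, here is a rough sketch. The sample data and n = 3 are made up, and the np.tile variant is one possible workaround swapped in for illustration, not something proposed in the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical data where ID 123 appears on two rows.
df = pd.DataFrame({"ID": [123, 123, 124], "x1": ["a", "b", "c"]})
n = 3

expanded = df.loc[df.index.repeat(n)].reset_index(drop=True)

# Per-ID cumcount keeps counting across both 123 rows,
# so ID 123 gets 1..6 rather than 1..3 twice.
by_id = expanded.groupby("ID").cumcount().add(1)

# Tiling the sequence once per original row restarts it for every row.
expanded["Number"] = np.tile(np.arange(1, n + 1), len(df))

print(by_id.tolist())
print(expanded)
```

Whether restarting the sequence per row or per ID is the right behaviour depends on what the duplicated IDs mean in the data.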

Henry Ecker · Accepted Answer · 2021-07-23 14:06:33Z

1

We can build a MultiIndex with MultiIndex.from_product from the unique IDs and the new range of values (created with np.arange).

Then enumerate the IDs already present in the DataFrame with GroupBy.cumcount.

Then set_index, reindex and reset_index. Passing method='ffill' to reindex forward-fills the other columns into the newly created rows:

import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [123, 124], 'x1': ['a', 'b']})

u_id = df['ID'].unique()
midx = pd.MultiIndex.from_product([u_id, np.arange(1, 486)],
                                  names=['ID', 'Number'])
df = (
    df.assign(Number=df.groupby('ID').cumcount() + 1)
        .set_index(['ID', 'Number'])
        .reindex(midx, method='ffill')
        .reset_index()
)
print(df)

df:

      ID  Number x1
0    123       1  a
1    123       2  a
2    123       3  a
3    123       4  a
4    123       5  a
..   ...     ... ..
965  124     481  b
966  124     482  b
967  124     483  b
968  124     484  b
969  124     485  b

[970 rows x 3 columns]
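
As a possible alternative that also keeps the other columns, a cross join pairs every row with every number in the sequence. This is a sketch of a different technique from the reindex approach above, not part of the original answer; it assumes pandas 1.2 or later (where how="cross" is available), and the names numbers, out, and n = 4 are placeholders.

```python
import pandas as pd

# Same hypothetical sample frame as above.
df = pd.DataFrame({"ID": [123, 124], "x1": ["a", "b"]})
n = 4  # the question uses 485

# One-column frame holding the sequence 1..n.
numbers = pd.DataFrame({"Number": range(1, n + 1)})

# how="cross" pairs every row of df with every row of numbers,
# so each original row (and all of its columns) appears n times.
out = df.merge(numbers, how="cross")

print(out)
```

Unlike the forward-fill version, this does not depend on the IDs being sorted, and rows with a repeated ID each get their own full 1..n sequence.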

edited Jul 23, 2021 at 14:06

answered Jul 23, 2021 at 13:50

Henry Ecker♦


2 Comments

nj95 Over a year ago

This worked but erased my other variables (x1, x2, x3, x4). What is the solution for retaining those as well? Thanks a lot for pushing this in the right direction.

Henry Ecker Over a year ago

Yes. We can use a combination of groupby cumcount and reindex. Try the edit.
