
I want to append a sequence of numbers by group ID.

I have a column that looks like this:

ID [123, 124, ...]

I need to append a sequence of numbers (1:485) to each unique ID so that the final data frame looks like this:

ID     [123, 123, 123, 123, 124, 124, 124, 124, ...]
Number [1,   2,   3,   4,   1,   2,   3,   4,   ...]

Does anyone have simple guidance?

  • Is ID part of a pandas DataFrame or just a plain list? Commented Jul 23, 2021 at 13:30
  • Part of the data frame Commented Jul 23, 2021 at 13:32
  • The problem is that I'm not counting the groups; I'm assigning the sequence to them. The sequence runs from 1 to 485, and I want to copy each row (read: ID) and assign each copy a number from that sequence. Therefore, there should be 485 copies of each ID row. cumcount doesn't address this. Commented Jul 23, 2021 at 13:47
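The requirement in that comment (485 copies of every ID row, each tagged with one number from 1 to 485) is a Cartesian product of the rows with the sequence. A minimal sketch of that idea using a cross merge (how="cross", available in pandas 1.2+), which is a different technique from the ones in the answers; the DataFrame and its x1 column are hypothetical sample data:

```python
import pandas as pd

# Hypothetical input: one row per ID plus an extra column to show it is kept
df = pd.DataFrame({"ID": [123, 124], "x1": ["a", "b"]})

# The sequence to attach to every row (1..485 in the question)
numbers = pd.DataFrame({"Number": range(1, 486)})

# how="cross" pairs every row of df with every Number, left rows first
out = df.merge(numbers, how="cross")
print(out.head())
print(len(out))  # 2 IDs x 485 numbers = 970 rows
```

Because the merge keeps all columns of df, any other variables on each ID row are carried along automatically.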

2 Answers


You can repeat the index 485 times and pass it to loc, which expands the frame so that every row appears 485 times. cumcount then gives the 1...485 counts per ID:

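# Duplicate every row 485 times, then renumber the index 0..n-1
new_df = df.loc[df.index.repeat(485)].reset_index(drop=True)
# Number the copies 1..485 within each ID
new_df["Number"] = new_df.groupby("ID").cumcount().add(1)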

to get

>>> new_df

      ID  Number
0    123       1
1    123       2
2    123       3
3    123       4
4    123       5
..   ...     ...
965  124     481
966  124     482
967  124     483
968  124     484
969  124     485

[970 rows x 2 columns]
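To see the mechanics at a smaller scale, here is the same repeat and cumcount approach with 4 copies instead of 485, which reproduces the target layout from the question. The two-row DataFrame and the variable n are assumptions for illustration:

```python
import pandas as pd

# Hypothetical input; n = 4 stands in for 485 to keep the output short
df = pd.DataFrame({"ID": [123, 124]})
n = 4

# Repeat each row label n times and select with it to duplicate the rows
new_df = df.loc[df.index.repeat(n)].reset_index(drop=True)

# Number the copies 1..n within each ID
new_df["Number"] = new_df.groupby("ID").cumcount().add(1)
print(new_df)
```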

1 Comment

I always like the repeat solution, +1. I tend to err towards reindex in case there are multiple rows with the same ID, but this is definitely more performant when it can be used. :)
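The concern about repeated IDs can be checked directly: if an ID appears on more than one row, groupby("ID").cumcount() keeps counting across all copies of that ID (1..6 for ID 123 below, rather than 1..3 twice). A minimal sketch of one workaround, assuming you want an independent 1..n run per original row. It swaps in positional selection with iloc and np.tile instead of cumcount, which works because each row's copies are consecutive. The sample data and n are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical input in which ID 123 appears on two rows
df = pd.DataFrame({"ID": [123, 123, 124], "x1": ["a", "b", "c"]})
n = 3  # stands in for 485

# Repeat row positions so the result does not depend on the index labels
positions = np.repeat(np.arange(len(df)), n)
new_df = df.iloc[positions].reset_index(drop=True)

# Per-ID numbering runs on across the duplicated ID: 1..6 for ID 123
per_id = new_df.groupby("ID").cumcount().add(1)

# Each original row's copies are consecutive, so tiling 1..n numbers them per row
new_df["Number"] = np.tile(np.arange(1, n + 1), len(df))
print(new_df)
```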

We can create a MultiIndex.from_product with the unique IDs and the new range of values (created with np.arange).

Then enumerate the IDs already present in the DataFrame with Groupby.cumcount.

Then set_index, reindex and reset_index:

import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [123, 124], 'x1': ['a', 'b']})

u_id = df['ID'].unique()
midx = pd.MultiIndex.from_product([u_id, np.arange(1, 486)],
                                  names=['ID', 'Number'])
df = (
    df.assign(Number=df.groupby('ID').cumcount() + 1)
        .set_index(['ID', 'Number'])
        .reindex(midx, method='ffill')
        .reset_index()
)
print(df)

df:

      ID  Number x1
0    123       1  a
1    123       2  a
2    123       3  a
3    123       4  a
4    123       5  a
..   ...     ... ..
965  124     481  b
966  124     482  b
967  124     483  b
968  124     484  b
969  124     485  b

[970 rows x 3 columns]
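Here is the same pipeline at a smaller scale (n = 3 instead of 485, with hypothetical two-row data) to show what each step contributes. The existing rows get Number 1 from cumcount, so they land on (ID, 1). Every other (ID, Number) pair produced by MultiIndex.from_product has no matching row, so method="ffill" fills it from the nearest preceding entry in the index, which is that ID's original row:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; n = 3 stands in for 485
df = pd.DataFrame({"ID": [123, 124], "x1": ["a", "b"]})
n = 3

# Every (ID, Number) pair wanted in the result: 2 IDs x 3 numbers
midx = pd.MultiIndex.from_product([df["ID"].unique(), np.arange(1, n + 1)],
                                  names=["ID", "Number"])

# Existing rows are numbered 1 within their ID, so they sit at (ID, 1)
indexed = df.assign(Number=df.groupby("ID").cumcount() + 1).set_index(["ID", "Number"])

# Missing pairs take the values of the nearest preceding existing row
out = indexed.reindex(midx, method="ffill").reset_index()
print(out)
```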

2 Comments

This worked, but it erased my other variables (x1, x2, x3, x4). How can I retain those as well? Thanks a lot for pushing this in the right direction.
Yes. We can use a combination of groupby cumcount and reindex. Try the edit.
