Create a pandas dataframe column of variable sized lists

Question

I need to create a dataframe that contains all possible start times for a scheduler for some machines. My initial dataframe (msDF) contains three simple columns:

MachID - the ID of each machine
Start - the starting datetime that the machine is available for scheduling
slots - the number of slots available starting from that time

msDF is copied from a master dataframe, but for illustration, it may look like this:

import pandas as pd

msDF = pd.DataFrame({'MachID': [1, 2, 3, 4, 5],
                     'Start': ["02/04/2021 9:00", "06/04/2021 12:30", "09/04/2021 10:00",
                               "12/04/2021 11:00", "15/04/2021 08:00"],
                     'slots': [2, 3, 4, 3, 1]})

	MachID	Start	slots
0	1	02/04/2021 9:00	2
1	2	06/04/2021 12:30	3
2	3	09/04/2021 10:00	4
3	4	12/04/2021 11:00	3
4	5	15/04/2021 08:00	1

I need to explode this dataframe so that each row is repeated "slots" times, with a SlotIndex column numbering the copies from 0. The desired output (first two machines shown) is:

	MachID	Start	slots	SlotIndex
0	1	02/04/2021 9:00	2	0
0	1	02/04/2021 9:00	2	1
1	2	06/04/2021 12:30	3	0
1	2	06/04/2021 12:30	3	1
1	2	06/04/2021 12:30	3	2

My approach is problematic: I fill the SlotIndex column with variable-length lists and then explode it, but filling the column raises warnings.

To do this, I use:

msDF['SlotIndex'] = None
for x in msDF.index:
    msDF.SlotIndex.loc[x] = list(range(msDF.loc[x,'slots']))

It works, but with a warning: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

I later explode msDF to get the result I want:

msDF = msDF.explode('SlotIndex')

How can this be improved?
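For reference, a likely source of the warning is the chained assignment `msDF.SlotIndex.loc[x] = ...`, which first selects a Series and then writes into it. A minimal sketch of the same list-and-explode idea without that pattern, assuming msDF is an independent frame (the name `exploded` is introduced here only for illustration):

```python
import pandas as pd

msDF = pd.DataFrame({
    "MachID": [1, 2, 3, 4, 5],
    "Start": ["02/04/2021 9:00", "06/04/2021 12:30", "09/04/2021 10:00",
              "12/04/2021 11:00", "15/04/2021 08:00"],
    "slots": [2, 3, 4, 3, 1],
})

# Build the whole list column in a single assignment on the frame itself,
# instead of writing one cell at a time through msDF.SlotIndex.loc[x].
msDF["SlotIndex"] = msDF["slots"].apply(lambda n: list(range(n)))

# One output row per list element; the original index labels are repeated.
exploded = msDF.explode("SlotIndex")
print(exploded)
```

If msDF is itself a slice of the master dataframe, taking an explicit `.copy()` when it is created may also be needed to silence the warning.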

soumith · Accepted Answer · 2021-03-14 07:50:33Z

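Use repeat:

df = df.loc[df.index.repeat(df.slots)]

The index labels are repeated along with the rows, so you can group on the index to number the slots:

df['slot_id'] = 1
df['slot_id'] = df.groupby(df.index)['slot_id'].transform('cumsum')

Note that the cumulative sum numbers the slots from 1; subtract 1 if you need a zero-based index.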
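Putting the two steps together, a minimal end-to-end sketch on the question's sample data might look like the following. It swaps the cumulative sum for `groupby(...).cumcount()`, which counts from 0 and so matches the SlotIndex column in the question; the variable name `out` is an assumption made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "MachID": [1, 2, 3, 4, 5],
    "Start": ["02/04/2021 9:00", "06/04/2021 12:30", "09/04/2021 10:00",
              "12/04/2021 11:00", "15/04/2021 08:00"],
    "slots": [2, 3, 4, 3, 1],
})

# Repeat each row `slots` times; the original index labels are repeated too.
# .copy() makes the result an independent frame before adding a column.
out = df.loc[df.index.repeat(df["slots"])].copy()

# Number the rows within each repeated index label, starting at 0.
# to_numpy() assigns by position, sidestepping alignment on duplicate labels.
out["SlotIndex"] = out.groupby(level=0).cumcount().to_numpy()

print(out)
```

Calling `out.reset_index(drop=True)` afterwards gives a unique index if the repeated labels are not wanted.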

edited Mar 14, 2021 at 7:50

answered Mar 14, 2021 at 7:16

4 Comments

mark fitzpatrick Over a year ago

That's perfect. Thank you @soumith. It was really fast too.

soumith Over a year ago

I have updated the answer. This is one way to go about it.

mark fitzpatrick Over a year ago

That revision gives exactly the result I was aiming for, and I never would have come up with it myself. I took your first answer based on repeat and then created a new index that I modulo'd against slots to create the slot_id, but the groupby method is cleaner and delivers the outcome I wanted. Thanks again - I have been stuck on this for a while. My original solution worked in testing and failed in operation for reasons I cannot understand. Yours is working through-and-through.

soumith Over a year ago

Glad it helped!
