1

I need to create a dataframe that contains all possible start times for a scheduler for some machines. My initial dataframe (msDF) contains three simple columns:

  • MachID - the ID of each machine
  • Start - the starting datetime that the machine is available for scheduling
  • slots - the number of slots available starting from that time

msDF is copied from a master dataframe, but for illustration, it may look like this:

msDF = pd.DataFrame({ 'MachID': [1,2,3,4,5],
                      'Start': ["02/04/2021 9:00","06/04/2021 12:30","09/04/2021 10:00", \
                                "12/04/2021 11:00","15/04/2021 08:00"],
                      'slots': [2, 3, 4, 3, 1]})
MachID Start slots
0 1 02/04/2021 9:00 2
1 2 06/04/2021 12:30 3
2 3 09/04/2021 10:00 4
3 1 12/04/2021 11:00 3
4 1 15/04/2021 08:00 1

I need to explode this dataframe so that each row is duplicated "slots" times with a slotIndex. The desired output is:

MachID Start slots SlotIndex
0 1 02/04/2021 9:00 2 0
0 1 02/04/2021 9:00 2 1
1 2 06/04/2021 12:30 3 0
1 2 06/04/2021 12:30 3 1
1 2 06/04/2021 12:30 3 2

My approach is problematic. I am creating variable length lists into the SlotIndex and exploding them, but this creates warnings.

To do this, I use:

msDF['SlotIndex'] = None
for x in msDF.index:
    msDF.SlotIndex.loc[x] = list(range(msDF.loc[x,'slots']))

It works but with warnings : SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

I later explode msDF to get the result I want:

msDF = msDF.explode('SlotIndex')

How can this be improved?

1 Answer 1

1

Use repeat.

df.loc[df.index.repeat(df.slots)]

The index will be repeated. So you can use that to set the slot id.

df['slot_id'] = 1
df['slot_id'] = df.groupby(df.index)['slot_id'].transform('cumsum')
Sign up to request clarification or add additional context in comments.

4 Comments

That's perfect. Thank you @soumith. It was really fast too.
I have updated the answer. This is one way to go about it.
That revision gives exactly the result I was aiming for and I never would have come up with it myself. I took your first answer based on repeat and then create a new index that I modulo'd against slots to create the slot_id, but the groupby method is cleaner and deliver the outcome I wanted. Thanks again - I have been stuck on this for a while. My original solution worked in testing and failed in operation for reasons I cannot understand. Yours is working through-and-through.
Glad it helped!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.