I have a dataframe of offer data that I want to transform into portion data. The Quantity column gives the number of portions in each offer. If an offer has three portions, two of which were sold, its row should appear three times in the new dataframe, with Sold equal to 1 on two rows and 0 on one row.

The actual dataframe contains more columns, which should stay intact in the transformed dataframe; the Date column serves as an example.

My input:

    Offer name     Quantity   Sold   Date
  0 A              3          2      2022-05 
  1 B              2          1      2022-01 
  2 C              1          1      2022-04 
  3 D              1          1      2022-04 
  4 E              1          1      2022-05
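For anyone who wants to reproduce this, the input above can be built roughly like this. It is a minimal sketch: the column names are copied from the table, and storing Date as plain strings is an assumption, since the real dtype is not shown.

```python
import pandas as pd

# Sample data copied from the input table above.
# Assumption: Date is kept as a plain string; the real column may be a datetime or period.
df = pd.DataFrame({
    "Offer name": ["A", "B", "C", "D", "E"],
    "Quantity": [3, 2, 1, 1, 1],
    "Sold": [2, 1, 1, 1, 1],
    "Date": ["2022-05", "2022-01", "2022-04", "2022-04", "2022-05"],
})
print(df)
```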

The desired output:

    Offer name     Quantity   Sold   Date
  0 A              1          1      2022-05 
  1 A              1          1      2022-05 
  2 A              1          0      2022-05 
  3 B              1          1      2022-01 
  4 B              1          0      2022-01
  and so on...

Thank you!!

  • What if Sold is greater than Quantity? Commented May 30, 2022 at 17:32
  • It's impossible in this dataframe. Commented May 31, 2022 at 7:07

2 Answers

Solution

s = df.reindex(df.index.repeat(df['Quantity']))
s['Quantity'] = 1
s['Sold'] = s.groupby(level=0).cumcount().lt(s['Sold']).astype(int)

Explained

Repeat each index label of the dataframe exactly Quantity times, so that reindex returns one copy of the row per portion:

>>> s = df.reindex(df.index.repeat(df['Quantity']))
>>> s

  Offer name  Quantity  Sold     Date
0          A         3     2  2022-05
0          A         3     2  2022-05
0          A         3     2  2022-05
1          B         2     1  2022-01
1          B         2     1  2022-01
2          C         1     1  2022-04
3          D         1     1  2022-04
4          E         1     1  2022-05

Group the repeated dataframe by its index and use cumcount to number the rows within each group, starting from 0:

>>> s.groupby(level=0).cumcount()
0    0
0    1
0    2
1    0
1    1
2    0
3    0
4    0
dtype: int64

Flag the rows where this counter is less than Sold, cast the boolean result to int, and assign it to the Sold column. With Quantity also set to 1, as in the solution above, the result is:

>>> s['Sold'] = s.groupby(level=0).cumcount().lt(s['Sold']).astype(int)
>>> s

  Offer name  Quantity  Sold     Date
0          A         1     1  2022-05
0          A         1     1  2022-05
0          A         1     0  2022-05
1          B         1     1  2022-01
1          B         1     0  2022-01
2          C         1     1  2022-04
3          D         1     1  2022-04
4          E         1     1  2022-05
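Putting the steps together, here is a self-contained sketch. The function name offers_to_portions, the guard against Sold exceeding Quantity, and the final reset_index are additions for illustration, not part of the solution above. It assumes the index of df is unique, because reindex raises an error on a dataframe whose own index contains duplicates.

```python
import pandas as pd

def offers_to_portions(df):
    """Expand each offer row into one row per portion (hypothetical helper)."""
    # Optional guard (an assumption, not in the original answer): without it,
    # an offer with Sold > Quantity would silently get Sold = 1 on every row.
    if (df["Sold"] > df["Quantity"]).any():
        raise ValueError("Sold exceeds Quantity for at least one offer")
    # Repeat every row Quantity times; each copy keeps its original index label.
    s = df.reindex(df.index.repeat(df["Quantity"]))
    # Position of each copy within its original row: 0, 1, 2, ...
    counter = s.groupby(level=0).cumcount()
    # The first Sold copies are flagged 1, the remaining ones 0.
    s["Sold"] = counter.lt(s["Sold"]).astype(int)
    s["Quantity"] = 1
    # Assumption: a fresh 0..n-1 index is wanted, as in the desired output.
    return s.reset_index(drop=True)

# Sample data from the question; Date as plain strings is an assumption.
df = pd.DataFrame({
    "Offer name": ["A", "B", "C", "D", "E"],
    "Quantity": [3, 2, 1, 1, 1],
    "Sold": [2, 1, 1, 1, 1],
    "Date": ["2022-05", "2022-01", "2022-04", "2022-04", "2022-05"],
})
portions = offers_to_portions(df)
print(portions)
```

Any other columns in df are carried along unchanged, since reindex copies whole rows.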

You can also use apply on each row to build per-portion lists, then explode them:

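def repeat(row):
    # One copy of the offer name per portion
    offer = [row['Offer name']] * row['Quantity']
    # 1 for each sold portion, 0 for each unsold one
    sold = [1] * row['Sold'] + [0] * (row['Quantity']-row['Sold'])
    return offer, sold, 1

df[['Offer name', 'Sold', 'Quantity']] = df.apply(repeat, axis=1, result_type='expand')
# Exploding several columns at once requires pandas 1.3 or later
df = df.explode(['Offer name', 'Sold'])
print(df)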

  Offer name  Quantity Sold     Date
0          A         1    1  2022-05
0          A         1    1  2022-05
0          A         1    0  2022-05
1          B         1    1  2022-01
1          B         1    0  2022-01
2          C         1    1  2022-04
3          D         1    1  2022-04
4          E         1    1  2022-05
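One thing to be aware of with this approach: after explode, the Sold column has object dtype rather than an integer dtype, and the index still repeats the original row labels. A possible cleanup, sketched under the assumption that integer Sold values and a fresh 0..n-1 index are wanted (the variable name out is only illustrative):

```python
import pandas as pd

# Sample data from the question; Date as plain strings is an assumption.
df = pd.DataFrame({
    "Offer name": ["A", "B", "C", "D", "E"],
    "Quantity": [3, 2, 1, 1, 1],
    "Sold": [2, 1, 1, 1, 1],
    "Date": ["2022-05", "2022-01", "2022-04", "2022-04", "2022-05"],
})

def repeat(row):
    offer = [row["Offer name"]] * row["Quantity"]
    sold = [1] * row["Sold"] + [0] * (row["Quantity"] - row["Sold"])
    return offer, sold, 1

df[["Offer name", "Sold", "Quantity"]] = df.apply(repeat, axis=1, result_type="expand")
out = df.explode(["Offer name", "Sold"])

# Cast the exploded values back to integers and renumber the rows.
out = out.astype({"Sold": int}).reset_index(drop=True)
print(out.dtypes)
print(out)
```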

1 Comment

Thanks! This was also a clear way to solve the problem.
