2

My dataframe :

df
Object      quantity
A             3
B             4
C             10
D             11

My desired result:

df
Object      quantity
A             3
B             4
C             4
C             4
C             2
D             4
D             4
D             3

My Goal here is to split the value stored in the column2 "quantity" so that it should be 4 or less than 4.

Which method I can use to solve this problem? any suggestion would be appreciated.

0

3 Answers 3

1

Something like this could work. For each group where quantity is greater than 4, apply a function that does the row splitting and stores into a temp dataframe, then combine everything together to get your desired output:

df = pd.DataFrame({'idx': ['A', 'B', 'C', 'D'],
                   'quantity': [3, 4, 10, 11]})

def split_quant(df):
    quantities = ([4]*(df['quantity'].iat[0] // 4)) + [df['quantity'].iat[0] % 4]

    temp = pd.DataFrame({'idx': df['idx'].iat[0],
                         'quantity': quantities
                         }, index=range(len(quantities)))
    temp = temp[temp['quantity']!=0]

    return temp

df_split = df[df['quantity'] > 4].groupby('idx').apply(split_quant)

output = df[df['quantity'] <= 4].append(df_split).reset_index(drop=True)

writer = pd.ExcelWriter('output.xlsx')
output.to_excel(writer, 'Sheet1', index=False)
writer.save()

The above will give you the following output dataframe:

  idx  quantity
0   A         3
1   B         4
2   C         4
3   C         4
4   C         2
5   D         4
6   D         4
7   D         3

EDIT:

I took the liberty of running some timing tests of the various methods. Using Pandas' groupby and apply saves a lot of time and avoids the nested loops over the input data (although I'm sure theres an even faster way that can avoid the apply as well...)

Mine:

5.49 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

@Iqbal Basyar:

22.8 ms ± 1.47 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

@sobek

17.7 ms ± 922 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Sign up to request clarification or add additional context in comments.

4 Comments

Hi, Thanks. Your method worked well. Only problem I have is when the number is divisible by 4. ex. 12,16,20etc. I'm getting an extra item with value "0". Is there any possible way to eliminate that while saving it in excel?
within the function you can eliminate 0 with something like temp = temp[temp['quantity']!=0]. That will just reassign temp so that it doesnt contain any rows that are 0
saving to excel is pretty straightforward with pandas to_excel method. Docs/example here: pandas.pydata.org/pandas-docs/stable/generated/…
just updated the answer to handle 0 and to save excel output
1

Unfortunately, Pandas did not support this feature. So you have to create a new dataframe based on your old dataframe.

For each item in old dataframe, calculate

old_quantity = n * 4 + rest_quantitity

So in the new dataframe, you will add n item(s) with quantity of 4 , plus one with quantity of rest_quantity (if rest_quantity is not zero)

df = df = pd.DataFrame({'item': ["A","B","C"], 'qty': [3, 8,11]})
new_df = pd.DataFrame({'Item': [], 'qty': []})

for idx, item in df.iterrows():    
  if item['qty'] > 4 :
      n = item['qty'] // 4
      r = item['qty'] % 4 
      for _ in range(n):
          new_df.loc[len(new_df)] = [item['item'], 4]
      if r > 0 :
          new_df.loc[len(new_df)] = [item['item'], r]
  else :
      new_df.loc[len(new_df)] = [item['item'], item['qty']]

df

    item qty
0   A   3
1   B   8
2   C   11

new_df

   Item qty
0   A   3.0
1   B   4.0
2   B   4.0
3   C   4.0
4   C   4.0
5   C   3.0

Comments

1

This works but as far as pandas is concerned, it's not pretty nor is it fast:

df = pd.DataFrame({'idx': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
                   'quantity': [1., 2., 3., 4., 5., 6., 7.]})

df['factor'] = df.quantity // 4.
df['modulo'] = df.quantity % 4.

res = pd.DataFrame({'idx': [], 'quantity': []})

for idx, row in df.iterrows():
    for idxx in range(int(row.factor)):
        res = res.append({'idx': row.idx, 'quantity': 4.},
                         ignore_index=True)
    if row.modulo > 0:
        res = res.append({'idx': row.idx, 'quantity': row.modulo},
                         ignore_index=True)

In [24]: df
Out[24]: 
  idx  quantity
0   A       1.0
1   B       2.0
2   C       3.0
3   D       4.0
4   E       5.0
5   F       6.0
6   G       7.0

In [22]: res
Out[22]: 
  idx  quantity
0   A       1.0
1   B       2.0
2   C       3.0
3   D       4.0
4   E       4.0
5   E       1.0
6   F       4.0
7   F       2.0
8   G       4.0
9   G       3.0

6 Comments

Hi, I need to split the values of E,F&G and it should be 4 or below 4. that means i have to repeat the items. in this case E=4 and E=1 so that i have two E and both the values are 4 and below 4
@user10309160 Hmm ok, i really did NOT get that from your initial question, so you want to split each number > 4 into summands of size less than 4.
Performance is not an issue. And Thanks, all the suggestions mentioned here works. But I have a small problem.I am working this in an excel i used writer = pd.ExcelWriter("Rule.xlsx",engine="openpyxl") excel=openpyxl.load_workbook("Rule.xlsx") writer.book=excel writer.save() writer.close() but its not overwriting in my excel sheet. Can you tell me why?
@user10309160 Is there any reason you can't do df.to_excel(writer, 'sheet_name'); writer.save() instead of the second line?
I figured it out. Thanks for the help.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.