My dataframe:

 df:
 order             quantity
  A                   1
  B                   1
  C                   2
  D                   3
  E                   3
  F                   4

My goal is to split this DataFrame into groups based on the quantity value. My desired result:

 df:
group        order             quantity
  1             A                   1
                B                   1
                C                   2
  2             D                   3
                E                   1
  3             E                   2
                F                   2
  4             F                   2

The grouping is based on quantity: each group holds at most 4 units, which is the maximum value of quantity. Groups 1, 2 and 3 each total 4 (for example, A+B+C = 1+1+2 = 4). Group 4 has nothing left to add to it, so it is formed from the leftover (here, 2). Note that E is split across groups 2 and 3, and F across groups 3 and 4.

That way I can later select a group by its name or number.

Note: my actual order values (column "order") are strings such as "PMC11-AA1L1PAVWJJ+Z1".

Is this possible in Python? If so, kindly suggest a method so I can practice and learn.

2 Answers

2

Your data:

import pandas as pd
df = pd.DataFrame({'order': ['A', 'B', 'C', 'D', 'E', 'F'], 'quantity': [1, 1, 2, 3, 3, 4]})

Solution:

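import numpy as np

# Expand to one row per unit: each order is repeated `quantity` times.
df = pd.DataFrame(np.repeat(df['order'].to_numpy(), df['quantity'].to_numpy()),
                  columns=['order'])
df['quantity'] = 1
# Every 4 consecutive rows form one group (// keeps the result an int on Python 3).
df['group'] = np.arange(len(df)) // 4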

Output:

   order  quantity  group
0      A         1      0
1      B         1      0
2      C         1      0
3      C         1      0
4      D         1      1
5      D         1      1
6      D         1      1
7      E         1      1
8      E         1      2
9      E         1      2
10     F         1      2
11     F         1      2
12     F         1      3
13     F         1      3

Then groupby and sum.

df.groupby(['group', 'order']).sum()

Output:

             quantity
group order          
0     A             1
      B             1
      C             2
1     D             3
      E             1
2     E             2
      F             2
3     F             2

You can use reset_index() after that, if you want.
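
As a minimal sketch of that last step, the snippet below rebuilds the expanded frame from the output above, applies `reset_index()` after the groupby, and then selects one group by its number. The variable names `expanded`, `result` and `group_1` are illustrative only.

```python
import pandas as pd

# One row per unit of quantity, already labelled with a group (as in the output above).
expanded = pd.DataFrame({
    'order': list('ABCCDDDEEEFFFF'),
    'quantity': 1,
    'group': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
})

# Sum per (group, order), then turn the index levels back into ordinary columns.
result = expanded.groupby(['group', 'order']).sum().reset_index()

# Select a single group by its number with a boolean mask.
group_1 = result[result['group'] == 1]
print(group_1)
```

With `group` as an ordinary column, a plain comparison is enough to pull out any one group.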

I hope it helps.

Should I explain the solution? Does it work for you?

14 Comments

Thanks, but I am getting an error: TypeError: 'float' object cannot be interpreted as an integer, raised by df['group'] = sorted(range(0, len(df)/3, 1) * 4)[0:len(df)]. Can you tell me why?
@user10309160 Try changing this line to: df['group'] = sorted(range(0, int(len(df)/3), 1) * 4)[0:len(df)]
Hi, now it shows ValueError: int() base must be >= 2 and <= 36, or 0. I don't know whether I made a mistake. All my values in df['quantity'] are less than or equal to 4.
@user10309160 It's really strange, I don't get any errors. Try just df['group'] = sorted(range(0, len(df)) * 4)[0:len(df)]
Again a TypeError: unsupported operand type(s) for *: 'range' and 'int'
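
The first and last errors in this exchange are consistent with Python 2 code being run on Python 3, although the thread never states the interpreter version, so treat that as an assumption. On Python 3, `/` returns a float and `range` is a lazy object that cannot be multiplied by an integer. A small sketch that produces the same group labels without either construct (`n_rows` and `group_size` are illustrative names):

```python
n_rows = 14      # length of the expanded frame in the answer above
group_size = 4   # units per group

# Integer division maps rows 0-3 to group 0, rows 4-7 to group 1, and so on.
labels = [i // group_size for i in range(n_rows)]
print(labels)
```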
2

@AnnaIliukovich-Strakovskaia's solution is awesome. I rewrote it using pure pandas.

import pandas as pd

# Generate the input dataframe from @AnnaIliukovich-Strakovskaia's answer
df = pd.DataFrame({'order': ['A', 'B', 'C', 'D', 'E', 'F'], 'quantity': [1, 1, 2, 3, 3, 4]})
# Expand the dataframe: one row per unit of quantity
df_out = df.order.repeat(df.quantity).reset_index(drop=True).to_frame()
# Create groupings of four records
df_out['grp'] = df_out.index // 4
# Group by 'grp' and 'order', then count
df_out.groupby(['grp', 'order'])['order'].count().to_frame(name='quantity')

Output:

           quantity
grp order          
0   A             1
    B             1
    C             2
1   D             3
    E             1
2   E             2
    F             2
3   F             2
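
The same steps can be wrapped in a function so the group size is a parameter and the groups are numbered from 1, as in the question's desired result. This is only a sketch: `split_into_groups` and `capacity` are hypothetical names, not part of pandas.

```python
import pandas as pd

def split_into_groups(df, capacity=4):
    """Hypothetical helper: split orders into groups of `capacity` units each."""
    # One row per unit of quantity.
    units = df['order'].repeat(df['quantity']).reset_index(drop=True).to_frame()
    # Consecutive blocks of `capacity` rows share a group number, starting at 1.
    units['group'] = units.index // capacity + 1
    # Count the units of each order inside each group, keeping the input order.
    return (units.groupby(['group', 'order'], sort=False)
                 .size()
                 .reset_index(name='quantity'))

df = pd.DataFrame({'order': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'quantity': [1, 1, 2, 3, 3, 4]})
result = split_into_groups(df)
print(result)
```

Passing `sort=False` keeps the orders in their original sequence, which may matter when the real order values are long strings rather than single letters.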

3 Comments

Thanks, both are very good methods. I know there is no problem on your side, but there is something wrong in my data, because I extracted it from a huge DataFrame. When using your code I received TypeError: Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'
I have a question: if my column "order" has strings like "PTP31B-AA4M1PGBWWJ" instead of A, B, would the above code work?
@user10309160 Yes, it would.
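
The TypeError quoted in the comments mentions a cast to int64, which is the type `repeat` needs for its counts. One plausible cause, and this is an assumption since the actual data is not shown, is that the quantity column is stored with object dtype (for example as strings). A sketch with made-up sample rows that converts the column first and also uses long string order values:

```python
import pandas as pd

# Assumed sample data: quantities stored as strings (object dtype).
df = pd.DataFrame({'order': ['PTP31B-AA4M1PGBWWJ', 'PMC11-AA1L1PAVWJJ+Z1'],
                   'quantity': ['3', '2']})

# Convert to integers; pd.to_numeric raises by default on values it cannot parse.
df['quantity'] = pd.to_numeric(df['quantity']).astype('int64')

units = df['order'].repeat(df['quantity']).reset_index(drop=True).to_frame()
units['grp'] = units.index // 4
result = (units.groupby(['grp', 'order'], sort=False)
               .size()
               .reset_index(name='quantity'))
print(result)
```

Checking `df.dtypes` before expanding is a quick way to confirm whether this is the issue.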
