2

So the data frame is

Product    Price  Weight  Range   Count
   A        40      20      1-3     20
   A        40      20      4-7     23
   B        20      73      1-3     54
   B        20      73      4-7     43
   B        20      73      8-15    34
   B        20      73      >=16    12
   C        10      20      4-7     22

So basically there is a product with price and weight and the range here specifies the no of days the product was sold continuously and the count specifies the count of products sold in that range

Expected Output

Product    Price  Weight  Range   Count
   A        40      20      1-3     20
                            4-7     23
   B        20      73      1-3     54
                            4-7     43
                            8-15    34
   B        20      73      >=16    12
   C        10      20      4-7     22

or

   Product  Price  Weight  1-3   4-7   8-15  >=16
   A        40      20     20     23   NaN    NaN
   B        20      73     54     43   34     1
   C        10      20      0     22   NaN    NaN                     
1
  • Did you try my solution at all? It's almost the same as the answer you just accepted. Commented Jun 6, 2018 at 6:18

3 Answers 3

3

Fulfilling the second output makes more sense than the first. Use set_index, followed by unstack.

(df.set_index(['Product', 'Price', 'Weight', 'Range'])
  .Count
  .unstack(fill_value=0)
  .reset_index()
)

Range Product  Price  Weight  1-3  4-7  8-15  >=16
0           A     40      20   20   23     0     0
1           B     20      73   54   43    34    12
2           C     10     100    0   22     0     0
Sign up to request clarification or add additional context in comments.

Comments

2

In my opinion first solution is not recommended if need processes DataFrame later.

Second solution is much better and if duplicates in real data is necessary aggregate values, e.g. by sum:

#convert catagoricals to strings
df['Range'] = df['Range'].astype(str)

df = (df.groupby(['Product', 'Price', 'Weight', 'Range'])['Count']
        .sum()
        .unstack(fill_value=0)
        .reset_index())
print (df)
Range Product  Price  Weight  1-3  4-7  8-15  >=16
0           A     40      20   20   23     0     0
1           B     20      73   54   43    34    12
2           C     10      20    0   22     0     0

5 Comments

Something tells me this OP is biased. Two nearly identical solutions, one posted an hour earlier. I bet set index is faster than groupby too, but it seems they don't care. Oh well, you are a lucky man. Enjoy
its was giving an error:cannot insert an item into a CategoricalIndex that is not already an existing category
i just saw this code:df['Range'] = df['Range'].astype(str) and worked perfectly
@san Thanks for not telling me about it. I wouldn't know there was an error if you didn't tell me, because I can't read minds.
sry mate won't happen again no offense
1

try this,

mask=df.duplicated(subset=['Product'])
df.loc[mask,['Product','Price','Weight']]=''

Output:

  Product Price Weight Range  Count
0       A    40     20   1-3     20
1                        4-7     23
2       B    20     73   1-3     54
3                        4-7     43
4                       8-15     34
5                       >=16     12
6       C    10    100   4-7     22

.

pd.pivot_table(df,index=['Product','Price','Weight'],columns='Range',values='Count').reset_index()

Output:

Range Product  Price  Weight   1-3   4-7  8-15  >=16
0           A     40      20  20.0  23.0   NaN   NaN
1           B     20      73  54.0  43.0  34.0  12.0
2           C     10     100   NaN  22.0   NaN   NaN

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.