Python pandas groupby values based on multiple column values

Question

I have sequential campaign data in a pandas DataFrame.

import pandas as pd

# sample data
user_id = [9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,4705,4705,4705,4705,4705,223,223,223,223,223,223,223,223]
transaction_Value= [50,125,0,100,0,1000,473,0,47,110,0,44,93,0,49,92,0,242,0,75,0,47,122,0,50,100,200,0,35,85,0,50]
Campaign = ['M1','M1','Used','M1','Used','W1','Used','Used','W2','W2','Used','W2','W2','Used','W2','W2','Used','O1','Used','W3','Used','W2','S1','Lost','M1','M1','M1','Used','W2','S2','Lost','S2',]
transaction_c= [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,1,2,3,4,5,1,2,3,4,5,6,7,8]
 
df = pd.DataFrame(list(zip(user_id,transaction_Value,Campaign,transaction_c)), columns =['user_id','transaction_Value', 'Campaign','transaction_c'])

So far I have used the following code to reshape the data:

df2 = (df.set_index(['user_id',df.groupby('user_id').cumcount()])[('transaction_Value')]
         .unstack(fill_value='')
         .reset_index())

This transposes transaction_Value into one row per user, with one column per transaction position:

| user_id | 0  | 1   | 2   | 3   | 4  | 5    | 6   | 7  | 8  | 9   | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17  | 18 |
|---------|----|-----|-----|-----|----|------|-----|----|----|-----|----|----|----|----|----|----|----|-----|----|
| 9       | 50 | 125 | 0   | 100 | 0  | 1000 | 473 | 0  | 47 | 110 | 0  | 44 | 93 | 0  | 49 | 92 | 0  | 242 | 0  |
| 223     | 50 | 100 | 200 | 0   | 35 | 85   | 0   | 50 |    |     |    |    |    |    |    |    |    |     |    |
| 4705    | 75 | 0   | 47  | 122 | 0  |      |     |    |    |     |    |    |    |    |    |    |    |     |    |

How do I write code so that a new row starts every time the Campaign value is Used or Lost?

I could do the same for the Campaign values and then stack the two DataFrames together.

Ideal output

| user_id | Type        | 1    | 2    | 3    | 4    |
|---------|-------------|------|------|------|------|
| 9       | Campaign    | M1   | M1   | Used |      |
| 9       | Campaign    | M1   | Used |      |      |
| 9       | Campaign    | W1   | Used |      |      |
| 9       | Campaign    | Used |      |      |      |
| 9       | Campaign    | W2   | W2   | Used |      |
| 9       | Campaign    | W2   | W2   | Used |      |
| 9       | Campaign    | W2   | W2   | Used |      |
| 9       | Campaign    | O1   | Used |      |      |
| 223     | Campaign    | M1   | M1   | M1   | Used |
| 223     | Campaign    | W2   | S2   | Lost |      |
| 223     | Campaign    | S2   |      |      |      |
| 9       | Transaction | 50   | 125  | 0    |      |
| 9       | Transaction | 100  | 0    |      |      |
| 9       | Transaction | 1000 | 473  |      |      |
| 9       | Transaction | 0    |      |      |      |
| 9       | Transaction | 47   | 110  | 0    |      |
| 9       | Transaction | 44   | 93   | 0    |      |
| 9       | Transaction | 49   | 92   | 0    |      |
| 9       | Transaction | 242  | 0    |      |      |
| 223     | Transaction | 50   | 100  | 200  | 0    |
| 223     | Transaction | 35   | 85   | 0    |      |
| 223     | Transaction | 50   |      |      |      |

I appreciate any help in resolving this. Thanks :)

@JoeFerndz If the transposed column is Campaign, then Type is Campaign; if it is transaction_Value, then Type is Transaction.
– Aniruddha Das, Commented Mar 16, 2021 at 7:15
Which transpose is Campaign? Your original dataset does not have Campaign as a value. Are you referring to the column with Campaign and Transaction?
– Joe Ferndz, Commented Mar 16, 2021 at 7:17

jezrael · Accepted Answer · 2021-03-16 08:57:12Z

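Flag the rows where Campaign is Used or Lost with Series.isin, reverse the order with iloc[::-1], and apply Series.cumsum so that each run of rows ending in Used or Lost gets its own group key; pd.factorize then renumbers the keys in order of appearance. Pass the key to set_index together with a per-group cumcount, reshape with unstack, and use DataFrame.stack to move Campaign and transaction_Value into rows. Finally, sort by the third index level, drop the second level, and convert the MultiIndex to columns: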

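# group key: every row up to and including a 'Used'/'Lost' row shares one value
g = df['Campaign'].isin(['Used','Lost']).iloc[::-1].cumsum().iloc[::-1]
# renumber the keys 0, 1, 2, ... in order of first appearance
g = pd.factorize(g)[0]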

df2 = (df.set_index(['user_id',g, df.groupby(['user_id', g]).cumcount()])[['Campaign','transaction_Value']]
          .unstack(fill_value='')
          .stack(0)
          .sort_index(level=[2])
          .rename_axis(['user_id','Campaign','Type'])
          .reset_index(level=1, drop=True)
          .reset_index())

print (df2)
    user_id               Type     0     1     2     3
0         9           Campaign    M1    M1  Used      
1         9           Campaign    M1  Used            
2         9           Campaign    W1  Used            
3         9           Campaign  Used                  
4         9           Campaign    W2    W2  Used      
5         9           Campaign    W2    W2  Used      
6         9           Campaign    W2    W2  Used      
7         9           Campaign    O1  Used            
8       223           Campaign    M1    M1    M1  Used
9       223           Campaign    W2    S2  Lost      
10      223           Campaign    S2                  
11     4705           Campaign    W3  Used            
12     4705           Campaign    W2    S1  Lost      
13        9  transaction_Value    50   125     0      
14        9  transaction_Value   100     0            
15        9  transaction_Value  1000   473            
16        9  transaction_Value     0                  
17        9  transaction_Value    47   110     0      
18        9  transaction_Value    44    93     0      
19        9  transaction_Value    49    92     0      
20        9  transaction_Value   242     0            
21      223  transaction_Value    50   100   200     0
22      223  transaction_Value    35    85     0      
23      223  transaction_Value    50                  
24     4705  transaction_Value    75     0            
25     4705  transaction_Value    47   122     0
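
To make the indexing step easier to follow, here is a minimal sketch on a toy frame. The frame `toy` and the names `is_end`, `seg`, `pos` and `wide` are made up for illustration and are not part of the answer. It shows how the group key and the per-group position together decide which row and column each value lands in after unstack.

```python
import pandas as pd

# Toy data (illustrative only): two users, each segment ends with 'Used' or 'Lost'
toy = pd.DataFrame({
    'user_id':  [1, 1, 1, 1, 2, 2],
    'Campaign': ['M1', 'Used', 'W1', 'Used', 'S1', 'Lost'],
})

# True on the rows that close a segment
is_end = toy['Campaign'].isin(['Used', 'Lost'])

# Segment key built as in the answer: reversed cumsum, then factorize
seg = pd.factorize(is_end.iloc[::-1].cumsum().iloc[::-1])[0]

# Position of each row inside its (user, segment) group
pos = toy.groupby(['user_id', seg]).cumcount()

# One output row per (user, segment), one column per position
wide = (toy.set_index(['user_id', seg, pos])['Campaign']
           .unstack(fill_value=''))
print(wide)
```

Each (user_id, segment) pair becomes one row, and the position within the segment becomes the column label, which is why the columns in the answer's output start at 0.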

edited Mar 16, 2021 at 8:57

answered Mar 16, 2021 at 7:22


2 Comments

Aniruddha Das Over a year ago

How do I keep the transaction_c column in the data? Also, for user_id 223 the first Campaign was M1, not S2.

jezrael Over a year ago

@AniruddhaDas - You are right, I added pd.factorize() to correct the ordering of the groups in g.
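
The ordering issue raised in this exchange can be seen on a small example. The sketch below uses a made-up Series, and the names `raw`, `fixed` and `forward` are illustrative. The reversed cumulative sum gives earlier segments larger keys, so a sorted index would list them last; pd.factorize relabels the keys in order of appearance. The `forward` variant is an alternative that is not from the answer: it shifts the flags down one row and takes a plain cumulative sum.

```python
import pandas as pd

# Toy data (illustrative only): three segments, the last one still open
campaign = pd.Series(['M1', 'M1', 'Used', 'W2', 'Lost', 'S2'])
is_end = campaign.isin(['Used', 'Lost'])

# Reversed cumulative sum: keys decrease from the first segment to the last
raw = is_end.iloc[::-1].cumsum().iloc[::-1]

# Fix from the answer: relabel keys in order of first appearance
fixed = pd.factorize(raw)[0]

# Alternative (an assumption, not the answerer's code): start a new
# segment on the row after each 'Used'/'Lost' marker
forward = is_end.astype(int).shift(fill_value=0).cumsum()

print(raw.tolist())      # [2, 2, 2, 1, 1, 0]
print(fixed.tolist())    # [0, 0, 0, 1, 1, 2]
print(forward.tolist())  # [0, 0, 0, 1, 1, 2]
```

On this input both relabelled keys agree, so either should keep the first segment of each user in the first output row.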
