1

Run the code to create the obfuscated data:

from itertools import product

import numpy as np
import pandas as pd


a = [1, 2, 3]
b = [20, 40, 80, 160]
c = ['A', 'B']
d = [10, 20, 30, 40]
e = ['Alpha', 'Beta']
f = ['X', 'Y', 'Z']
i = ['a', 'b', 'c', 'd', 'e', 'f']

df = pd.DataFrame(product(a, b, c, d, e, f, i),
                  columns=['col_A', 'col_B', 'col_C', 'col_D', 'col_E', 'col_F', 'col_G'])

df['col_H'] = np.random.randint(0, 100, size=df.shape[0])
df['col_I'] = np.random.randint(0, 100, size=df.shape[0])
df['col_J'] = np.random.randint(0, 100, size=df.shape[0])

print(df)

I get the following table:

      col_A  col_B col_C  col_D  col_E col_F col_G  col_H  col_I  col_J
0         1     20     A     10  Alpha     X     a     44     55     12
1         1     20     A     10  Alpha     X     b     50     15     46
2         1     20     A     10  Alpha     X     c     42     36     46
3         1     20     A     10  Alpha     X     d     71     19     61
4         1     20     A     10  Alpha     X     e     74      1      7
5         1     20     A     10  Alpha     X     f     11     99      2
...     ...    ...   ...    ...    ...   ...   ...    ...    ...    ...
3450      3    160     B     40   Beta     Z     a     47     81     58
3451      3    160     B     40   Beta     Z     b     63     82     26
3452      3    160     B     40   Beta     Z     c     37     28     64
3453      3    160     B     40   Beta     Z     d     54     60     91
3454      3    160     B     40   Beta     Z     e     16     14     55
3455      3    160     B     40   Beta     Z     f     40     18     97

[3456 rows x 10 columns]

I want to transpose the last 4 columns, set the values of first column as headers, and add a new column based on the remaining 3 columns names.

See below:

      col_A  col_B col_C  col_D  col_E col_F new_col   a   b   c   d   e   f
0         1     20     A     10  Alpha     X   col_H  68  89   0  99  50  11
1         1     20     A     10  Alpha     X   col_I  27  43  20  45  97  99
2         1     20     A     10  Alpha     X   col_J  40  43  36  89  90   2
...     ...    ...   ...    ...    ...   ...     ... ... ... ... ... ... ...
1150      3    160     B     40   Beta     Z   col_H  59  50  95  31  32  68
1151      3    160     B     40   Beta     Z   col_I  14  94  25  20  46  90
1152      3    160     B     40   Beta     Z   col_J  87  32  34  99  91  90

I've tried doing it with melt, pivot, and pivot_table, but I can't figure it out.

1 Answer 1

2

Here's one way via set_index / stack / unstack:

df = df.set_index(df.columns[:-3].to_list()).stack().unstack(-2).reset_index()

OUTPUT:

print(df.head(10))

col_G  col_A  col_B col_C  col_D  col_E col_F level_6   a   b   c   d   e   f
0          1     20     A     10  Alpha     X   col_H  97  96  59  71  98  19
1          1     20     A     10  Alpha     X   col_I   0  55   8  26  70  85
2          1     20     A     10  Alpha     X   col_J  90  87  16  70  68  39
3          1     20     A     10  Alpha     Y   col_H  65  69   9  92  35  11
4          1     20     A     10  Alpha     Y   col_I  28  47  56  83  43  81
5          1     20     A     10  Alpha     Y   col_J   8  29  89  47  84  32
6          1     20     A     10  Alpha     Z   col_H  97  12  95  17  34  47
7          1     20     A     10  Alpha     Z   col_I  78  64  51  81  15  62
8          1     20     A     10  Alpha     Z   col_J  47  83  58   3  90  11
9          1     20     A     10   Beta     X   col_H  19   8  92  69  65  32
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.