Pandas transform rows to columns

Question

I have a pandas dataframe that looks like this:

index	p1	a1	phase	file_number	e1
388	19.288	21.630	0.0	0	0.0
389	40.910	71.489	1.0	0	0.0
390	31.310	43.952	2.0	0	0.0
391	28.420	30.250	3.0	0	0.0
392	17.940	22.000	0.0	1	0.0
393	38.020	68.750	1.0	1	0.0
394	31.230	48.352	2.0	1	1.0
395	26.902	29.880	3.0	1	0.0

It can be created with this code:

import pandas as pd

d = {'p1': {388: 19.288,389: 40.91,390: 31.31,391: 28.42,392: 17.94,393: 38.02,394: 31.23,395: 26.902},
     'a1': {388: 21.63,389: 71.489,390: 43.952,391: 30.25,392: 22.0,393: 68.75,394: 48.352,395: 29.88},
     'phase': {388: 0.0,389: 1.0,390: 2.0,391: 3.0,392: 0.0,393: 1.0,394: 2.0,395: 3.0},
     'file_number': {388: 0, 389: 0, 390: 0, 391: 0, 392: 1, 393: 1, 394: 1, 395: 1},
     'e1': {388: 0.0,389: 0.0,390: 0.0,391: 0.0,392: 0.0,393: 0.0,394: 1.0,395: 0.0}}

df = pd.DataFrame(d)

I want to transform this dataframe so that I have one row for every file_number, pivoting on phase: basically, collapse the many rows for each file_number into one. The phase number will always be 0, 1, 2 or 3. The final table should look like this:

p1_0	p1_1	p1_2	p1_3	a1_0	a1_1	a1_2	a1_3	e1_0	e1_1	e1_2	e1_3
19.288	40.910	31.310	28.420	21.630	71.489	43.952	30.250	0	0	0	0
17.940	38.020	31.230	26.902	22.000	68.750	48.352	29.880	0	0	1	0

The suffix is the phase number: p1_<phase>, a1_<phase> and so on.

I want to make it as fast as possible. Since my data is very large, I'd rather avoid looping.

There are numerous answers on Stack Overflow for this. It's called pivoting in pandas. Look at stack, pivot or pivot_table.
– sammywemmy, Commented Jul 14, 2021 at 11:40

Chris Wesseling · Accepted Answer · 2021-07-14 14:42:35Z

import pandas as pd

d = {'p1': {388: 19.288,389: 40.91,390: 31.31,391: 28.42,392: 17.94,393: 38.02,394: 31.23,395: 26.902},
     'a1': {388: 21.63,389: 71.489,390: 43.952,391: 30.25,392: 22.0,393: 68.75,394: 48.352,395: 29.88},
     'phase': {388: 0.0,389: 1.0,390: 2.0,391: 3.0,392: 0.0,393: 1.0,394: 2.0,395: 3.0},
     'file_number': {388: 0, 389: 0, 390: 0, 391: 0, 392: 1, 393: 1, 394: 1, 395: 1},
     'e1': {388: 0.0,389: 0.0,390: 0.0,391: 0.0,392: 0.0,393: 0.0,394: 1.0,395: 0.0}}

df = pd.DataFrame(d)
# pivot the data: one row per file_number, one column per (value, phase) pair
pivoted = df.pivot(index='file_number', columns='phase')
# flatten the two-level columns into names like p1_0, a1_3
pivoted.columns = [f'{col[0]}_{int(col[1])}' for col in pivoted.columns]

After this, pivoted is a DataFrame with the shape you wanted.
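As a quick check of the two steps above, here is a minimal sketch that wraps them in a helper and inspects the result. The name pivot_by_phase is hypothetical (it is not part of pandas), and the frame is rebuilt from the table in the question with a default index instead of 388..395.

```python
import pandas as pd

# Same values as the table in the question, with a default 0..7 index.
df = pd.DataFrame({
    "p1": [19.288, 40.91, 31.31, 28.42, 17.94, 38.02, 31.23, 26.902],
    "a1": [21.63, 71.489, 43.952, 30.25, 22.0, 68.75, 48.352, 29.88],
    "phase": [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0],
    "file_number": [0, 0, 0, 0, 1, 1, 1, 1],
    "e1": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
})


def pivot_by_phase(frame):
    """Hypothetical helper: one row per file_number, columns named <value>_<phase>."""
    # Every column other than file_number and phase becomes a value column.
    wide = frame.pivot(index="file_number", columns="phase")
    # The columns are now (value, phase) tuples; join them into flat names.
    wide.columns = [f"{name}_{int(phase)}" for name, phase in wide.columns]
    return wide


wide = pivot_by_phase(df)
print(wide.shape)
print(sorted(wide.columns))
```

No Python-level loop over rows is involved; the list comprehension only runs over the column labels, so its cost does not grow with the number of rows.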

Basically a combination of two existing questions: one on pivoting the data and one on flattening the resulting columns.



2 Comments

Evangelista01 Over a year ago

Thanks, this one worked! The way that I found was

res = pd.pivot_table(df, index=['file_number', 'phase']).unstack().reset_index(drop=True)
res.columns = [f'{col[0]}_{int(col[1])}' for col in res.columns]

but your solution is cleaner.
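One practical difference between the two approaches shows up when a (file_number, phase) pair occurs more than once: pivot refuses to reshape, while pivot_table aggregates the duplicates (with the mean, by default). The sketch below uses a small made-up frame, not the data from the question, to illustrate this.

```python
import pandas as pd

# Made-up data: phase 1 appears twice for file_number 0.
df = pd.DataFrame({
    "file_number": [0, 0, 0],
    "phase": [0.0, 1.0, 1.0],
    "p1": [1.0, 2.0, 4.0],
})

# pivot cannot place two values in the same cell and raises ValueError.
try:
    df.pivot(index="file_number", columns="phase")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table combines the duplicates using its default aggfunc (mean).
table = pd.pivot_table(df, index="file_number", columns="phase", values=["p1"])
table.columns = [f"{name}_{int(phase)}" for name, phase in table.columns]

print(pivot_failed)
print(table)
```

If every file_number is guaranteed to have each phase exactly once, as in the question, no aggregation is needed and pivot is sufficient.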

Chris Wesseling Over a year ago

:) That's about the same. If file_number is unimportant, it makes sense to re-index and drop it.
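For illustration, a short sketch of both options after pivoting: discarding file_number, or keeping it as an ordinary column. The data is made up, with file numbers 7 and 9 so the effect on the index is visible.

```python
import pandas as pd

# Made-up data: two files, two phases each.
df = pd.DataFrame({
    "file_number": [7, 7, 9, 9],
    "phase": [0.0, 1.0, 0.0, 1.0],
    "p1": [1.5, 2.5, 3.5, 4.5],
})

pivoted = df.pivot(index="file_number", columns="phase")
pivoted.columns = [f"{name}_{int(phase)}" for name, phase in pivoted.columns]

# Discard file_number: rows are renumbered 0, 1, ...
without_key = pivoted.reset_index(drop=True)

# Keep file_number as a regular column next to the pivoted values.
with_key = pivoted.reset_index()

print(without_key)
print(with_key)
```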
