2

I have a pandas dataframe that looks like this:

index p1 a1 phase file_number e1
388 19.288 21.630 0.0 0 0.0
389 40.910 71.489 1.0 0 0.0
390 31.310 43.952 2.0 0 0.0
391 28.420 30.250 3.0 0 0.0
392 17.940 22.000 0.0 1 0.0
393 38.020 68.750 1.0 1 0.0
394 31.230 48.352 2.0 1 1.0
395 26.902 29.880 3.0 1 0.0

We can create it using this code

d = {'p1': {388: 19.288,389: 40.91,390: 31.31,391: 28.42,392: 17.94,393: 38.02,394: 31.23,395: 26.902},
     'a1': {388: 21.63,389: 71.489,390: 43.952,391: 30.25,392: 22.0,393: 68.75,394: 48.352,395: 29.88},
     'phase': {388: 0.0,389: 1.0,390: 2.0,391: 3.0,392: 0.0,393: 1.0,394: 2.0,395: 3.0},
     'file_number': {388: 0, 389: 0, 390: 0, 391: 0, 392: 1, 393: 1, 394: 1, 395: 1},
     'e1': {388: 0.0,389: 0.0,390: 0.0,391: 0.0,392: 0.0,393: 1.0,394: 0.0,395: 0.0}}

df = pd.DataFrame(d)

For I want to transform this dataframe, so I have 1 row for every file_number. And transform it with respect to phase - basically collapse many rows into one for each file_number. Phase number will always be 0, 1, 2, 3. Final table should look like this:

p1_0 p1_1 p1_2 p1_3 a1_0 p1_1 a1_2 a1_3 e1_0 e1_1 e1_2 e1_3
19.288 40.910 31.310 28.420 21.630 71.489 43.952 30.250 0 0 0 0
17.940 38.020 31.230 26.902 22.000 68.750 48.352 29.880 0 0 1 0

Where suffix means p1_phase, a1_phase and so on.

I want to make it as fast as possible. Since my data is very large, I'd rather avoid looping.

2
  • there are numerous answers on Stack Overflow for this. It's called pivoting in pandas. Look at stack, pivot or pivot_table Commented Jul 14, 2021 at 11:40
  • Read Q/A 10 in this question. Commented Jul 14, 2021 at 11:47

1 Answer 1

1
d = {'p1': {388: 19.288,389: 40.91,390: 31.31,391: 28.42,392: 17.94,393: 38.02,394: 31.23,395: 26.902},
     'a1': {388: 21.63,389: 71.489,390: 43.952,391: 30.25,392: 22.0,393: 68.75,394: 48.352,395: 29.88},
     'phase': {388: 0.0,389: 1.0,390: 2.0,391: 3.0,392: 0.0,393: 1.0,394: 2.0,395: 3.0},
     'file_number': {388: 0, 389: 0, 390: 0, 391: 0, 392: 1, 393: 1, 394: 1, 395: 1},
     'e1': {388: 0.0,389: 0.0,390: 0.0,391: 0.0,392: 0.0,393: 1.0,394: 0.0,395: 0.0}}

df = pd.DataFrame(d)
# pivot the data
pivoted = df.pivot(index='file_number', columns='phase')
# flatten the columns
pivoted.columns = [f'{col[0]}_{int(col[1])}'  for col in pivoted.columns.values]

After this pivoted is a Dataframe with the shape you desired.

Basically a combination of these two questions:

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks, this one worked! The way that I found was res = pd.pivot_table(df, index=['file_number', 'phase']).unstack().reset_index(drop=True) res.columns = [f'{col[0]}_{int(col[1])}' for col in res.columns] but your solution is cleaner
:) That's about the same. If file_number is unimportant, it makes sense to re-index and drop it.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.