Python Transpose/Stack multiple columns

Question

I have a number of people who each return responses to three questions.

These three questions are asked numerous times; the issue is that each new round of responses is recorded as new columns.

Ideally the output would look similar to the below:

I have been exploring melt within pandas:

pd.melt(df, id_vars=['PersonID'], value_vars=['Q1', 'Q1_1', 'Q1_2', 'Q1_999'])

I'm looking for a more elegant solution than listing the value_vars Q1 to Q1_999.

Can you provide sample data, or is @Henry's data sufficient?
– sammywemmy, Commented Aug 5, 2021 at 3:17

Henry Ecker · Accepted Answer · 2021-08-05 02:26:42Z


With a little renaming to add a suffix to the base stubnames, we can then use pd.wide_to_long:

# Add Suffix to base Q1 Q2 Q3
df = df.rename(columns=dict(zip(['Q1', 'Q2', 'Q3'],
                                ['Q1_0', 'Q2_0', 'Q3_0'])))

# Wide To Long
df = pd.wide_to_long(
    df,
    i='PersonID',
    stubnames=['Q1', 'Q2', 'Q3'],
    j='Attempt',
    sep='_'
).sort_index().reset_index()  # Order by PersonID instead of Attempt

Some sample data and output:

import pandas as pd

df = pd.DataFrame({
    'PersonID': [1, 2, 3, 4, 5],
    'Q1': [232, 415, 152, 123, 234],
    'Q2': [2, 241, 5, 5, 5],
    'Q3': ['Yes', 'Yes', 'Yes', 'No', 'Yes'],
    'Q1_1': [10, 11, 12, 13, 14],
    'Q2_1': [15, 16, 17, 18, 19],
    'Q3_1': ['a', 'b', 'c', 'd', 'e'],
    'Q1_2': [20, 21, 22, 23, 24],
    'Q2_2': [25, 26, 27, 28, 29],
    'Q3_2': ['f', 'g', 'h', 'i', 'j']
})

Wide to long, passing the result of rename directly:

df = pd.wide_to_long(
    df.rename(columns=dict(zip(['Q1', 'Q2', 'Q3'],
                               ['Q1_0', 'Q2_0', 'Q3_0']))),
    i='PersonID',
    stubnames=['Q1', 'Q2', 'Q3'],
    j='Attempt',
    sep='_'
).sort_index().reset_index()

Output:

    PersonID  Attempt   Q1   Q2   Q3
0          1        0  232    2  Yes
1          1        1   10   15    a
2          1        2   20   25    f
3          2        0  415  241  Yes
4          2        1   11   16    b
5          2        2   21   26    g
6          3        0  152    5  Yes
7          3        1   12   17    c
8          3        2   22   27    h
9          4        0  123    5   No
10         4        1   13   18    d
11         4        2   23   28    i
12         5        0  234    5  Yes
13         5        1   14   19    e
14         5        2   24   29    j
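
If the question columns are not known in advance, the stubnames and the rename mapping can probably be derived from the column names instead of being typed out. The following is only a sketch: it assumes every base column is named `Q<number>` and every repeat is named `Q<number>_<attempt>`, and the helper name `reshape_attempts` is hypothetical, not part of pandas.

```python
import re

import pandas as pd


def reshape_attempts(df, id_col="PersonID"):
    """Stack Q<n> / Q<n>_<k> columns into one row per person and attempt."""
    # Base columns such as 'Q1' carry no suffix; treat them as attempt 0.
    stubs = [col for col in df.columns if re.fullmatch(r"Q\d+", col)]
    renamed = df.rename(columns={stub: f"{stub}_0" for stub in stubs})
    long_df = pd.wide_to_long(
        renamed, stubnames=stubs, i=id_col, j="Attempt", sep="_"
    )
    # Order by PersonID, then Attempt, and turn the index back into columns.
    return long_df.sort_index().reset_index()


# Small made-up frame in the same shape as the sample data above.
sample = pd.DataFrame({
    "PersonID": [1, 2],
    "Q1": [232, 415],
    "Q2": [2, 241],
    "Q1_1": [10, 11],
    "Q2_1": [15, 16],
})
result = reshape_attempts(sample)
```

This only discovers stubs that have an unsuffixed base column; if some question exists only in suffixed form, the pattern would need to be adapted.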

edited Aug 5, 2021 at 2:26

answered Aug 5, 2021 at 2:18

Henry Ecker♦


4 Comments

AdianDes Over a year ago

Thanks for your response, Henry. What if the questions went past Q1, Q2, Q3 to Q10, Q11? Would the stubname Q1 pick up Q10 and Q11 as well? Also, what if the separator was a space instead of an underscore?

Henry Ecker Over a year ago

Use sep=' ' to change the separator. For stubnames, list them all: stubnames=['Q1', 'Q2', ... etc.]. Building them programmatically would be fine too: stubnames=[f'Q{i}' for i in range(1, 12)]
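
To illustrate both points from this comment, here is a small sketch with made-up data that uses a space separator and the stubs Q1 and Q10. As far as I can tell, wide_to_long only matches columns of the form stub + sep + suffix, so with a non-empty separator Q1 should not pick up the Q10 columns.

```python
import pandas as pd

# Made-up data: headers use a space as the separator, and Q1 is a prefix of Q10.
df = pd.DataFrame({
    "PersonID": [1, 2],
    "Q1 0": [1, 2],
    "Q1 1": [3, 4],
    "Q10 0": [5, 6],
    "Q10 1": [7, 8],
})

# Build the stubnames programmatically rather than typing them out.
stubnames = [f"Q{i}" for i in (1, 10)]

long_df = pd.wide_to_long(
    df, stubnames=stubnames, i="PersonID", j="Attempt", sep=" "
).sort_index().reset_index()
```

The Q1 column of long_df holds only the values from 'Q1 0' and 'Q1 1'; the Q10 values end up in their own column.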

AdianDes Over a year ago

Thanks Henry, that makes sense. What if each header contained multiple underscores, like 'Q1_1_1' and 'Q1_2_2'? (We wouldn't care about what comes after the second underscore.)

AdianDes Over a year ago

I added a line to remove everything after the last underscore: df.columns = df.columns.str.rsplit('_', n=1).str.get(0) :) Thanks again for your help.
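
A variation on the line above, as a sketch only: if some headers have one underscore and others have two, trimming at the last underscore would also shorten 'Q1_1' to 'Q1'. Keeping at most the first two underscore-separated parts avoids that. The data here is made up, and the approach assumes the trimmed names stay unique.

```python
import pandas as pd

# Made-up data with the multi-underscore headers from the comment above.
df = pd.DataFrame({
    "PersonID": [1, 2],
    "Q1_1_1": [10, 11],
    "Q1_2_2": [20, 21],
})

# Keep at most the first two underscore-separated parts of each header,
# so 'Q1_1_1' becomes 'Q1_1' while 'PersonID' is left unchanged.
df.columns = ["_".join(col.split("_")[:2]) for col in df.columns]

long_df = pd.wide_to_long(
    df, stubnames=["Q1"], i="PersonID", j="Attempt", sep="_"
).sort_index().reset_index()
```

If two headers collapse to the same name after trimming, the columns would no longer be unique and the reshape would need a different renaming rule.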
