
I have a number of people who each return responses to three questions.

These three questions are asked numerous times. The issue is that each new set of responses is recorded as new columns.

[Image: dataset]

Ideally the output would look similar to the below:

[Image: desired output]

I have been exploring melt within pandas:

pd.melt(df, id_vars=['PersonID'], value_vars=['Q1', 'Q1_1', 'Q1_2', 'Q1_999'])

I am looking for a more elegant solution than listing the value_vars Q1 to Q1_999.

  • Can you provide sample data, or is @Henry's data sufficient? Commented Aug 5, 2021 at 3:17
  • Henry's data is sufficient. Commented Aug 5, 2021 at 9:53

1 Answer


With a little renaming to add a suffix to the base stubnames, we can then use pd.wide_to_long:

# Add Suffix to base Q1 Q2 Q3
df = df.rename(columns=dict(zip(['Q1', 'Q2', 'Q3'],
                                ['Q1_0', 'Q2_0', 'Q3_0'])))

# Wide To Long
df = pd.wide_to_long(
    df,
    i='PersonID',
    stubnames=['Q1', 'Q2', 'Q3'],
    j='Attempt',
    sep='_'
).sort_index().reset_index()  # Order by PersonID instead of Attempt 

Some sample data and output:

import pandas as pd

df = pd.DataFrame({
    'PersonID': [1, 2, 3, 4, 5],
    'Q1': [232, 415, 152, 123, 234],
    'Q2': [2, 241, 5, 5, 5, ],
    'Q3': ['Yes', 'Yes', 'Yes', 'No', 'Yes'],
    'Q1_1': [10, 11, 12, 13, 14],
    'Q2_1': [15, 16, 17, 18, 19],
    'Q3_1': ['a', 'b', 'c', 'd', 'e'],
    'Q1_2': [20, 21, 22, 23, 24],
    'Q2_2': [25, 26, 27, 28, 29],
    'Q3_2': ['f', 'g', 'h', 'i', 'j']
})

Wide to long, passing the result of rename directly:

df = pd.wide_to_long(
    df.rename(columns=dict(zip(['Q1', 'Q2', 'Q3'],
                               ['Q1_0', 'Q2_0', 'Q3_0']))),
    i='PersonID',
    stubnames=['Q1', 'Q2', 'Q3'],
    j='Attempt',
    sep='_'
).sort_index().reset_index()

Output:

    PersonID  Attempt   Q1   Q2   Q3
0          1        0  232    2  Yes
1          1        1   10   15    a
2          1        2   20   25    f
3          2        0  415  241  Yes
4          2        1   11   16    b
5          2        2   21   26    g
6          3        0  152    5  Yes
7          3        1   12   17    c
8          3        2   22   27    h
9          4        0  123    5   No
10         4        1   13   18    d
11         4        2   23   28    i
12         5        0  234    5  Yes
13         5        1   14   19    e
14         5        2   24   29    j
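
If typing out the stubnames is itself a burden, they could be derived from the column names. This is only a minimal sketch, assuming every repeated column ends in _<number> and the base columns carry no such suffix; the names base_cols and long_df are illustrative and not part of the answer above.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'PersonID': [1, 2],
    'Q1': [232, 415],
    'Q2': [2, 241],
    'Q1_1': [10, 11],
    'Q2_1': [15, 16],
})

# Assumption: base question columns are those without a trailing _<number>
base_cols = [c for c in df.columns
             if c != 'PersonID' and not re.search(r'_\d+$', c)]

# Rename Q1 -> Q1_0 etc., then reshape using the derived stubnames
long_df = pd.wide_to_long(
    df.rename(columns={c: f'{c}_0' for c in base_cols}),
    i='PersonID',
    stubnames=base_cols,
    j='Attempt',
    sep='_'
).sort_index().reset_index()

print(long_df)
```

The same rename-then-reshape steps are used as above; only the list of stubnames is computed rather than typed.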

4 Comments

Thanks for your response, Henry. What if the questions went past Q3 to Q10 and Q11? Would the stubname Q1 pick up Q10 and Q11 as well? Also, what if the separator were a space instead of an underscore?
Use sep=' ' to change the separator. For stubnames, list them all: stubnames=['Q1', 'Q2', ... etc.]. Building them programmatically would be fine too: stubnames=[f'Q{i}' for i in range(1, 12)]
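
A small check of the two points above, as a sketch on made-up data: with a space separator and both Q1 and Q10 present, each stubname should only pick up columns of the form "<stub><sep><number>", so Q1 does not absorb the Q10 columns. The frame and the name long_df are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: space-separated suffixes, stubs Q1 and Q10
df = pd.DataFrame({
    'PersonID': [1, 2],
    'Q1 0': [1, 2],
    'Q1 1': [3, 4],
    'Q10 0': [5, 6],
    'Q10 1': [7, 8],
})

stubnames = [f'Q{i}' for i in (1, 10)]

long_df = pd.wide_to_long(
    df,
    i='PersonID',
    stubnames=stubnames,
    j='Attempt',
    sep=' '
).sort_index().reset_index()

print(long_df)
```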
Thanks Henry, that makes sense. What if each header contained multiple underscores, like 'Q1_1_1' 'Q1_2_2'? (We wouldn't care about what comes after the second.)
I added a line to remove everything after the last underscore: df.columns = df.columns.str.rsplit('_', n=1).str.get(0) :) Thanks again for your help.
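
One caveat with trimming at the last underscore is that a name with a single underscore, such as Q1_0, would lose its suffix too. A possible alternative, sketched here on made-up data, keeps at most the first two underscore-separated parts of each name. This is a swapped-in plain-Python approach rather than the rsplit call from the comment, and it assumes the question name itself contains no underscore.

```python
import pandas as pd

# Hypothetical data with extra parts after the second underscore
df = pd.DataFrame({
    'PersonID': [1, 2],
    'Q1_0': [232, 415],
    'Q1_1_1': [10, 11],
    'Q1_2_2': [20, 21],
})

# Keep at most the first two underscore-separated parts of each name,
# so 'Q1_1_1' -> 'Q1_1' while 'Q1_0' and 'PersonID' are unchanged
df.columns = ['_'.join(col.split('_')[:2]) for col in df.columns]

long_df = pd.wide_to_long(
    df,
    i='PersonID',
    stubnames=['Q1'],
    j='Attempt',
    sep='_'
).sort_index().reset_index()

print(long_df)
```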
