Reshape Pandas Dataframe with multiple column groups

Question

I currently have a wide dataframe that looks like this:

Index ID1 ID2 Foc_A Foc_B Foc_C Sat_A Sat_B Sat_C
   0   r  1   10    15    17    100   105   107
   1   r  2   20    25    27    110   115   117
   2   s  1   30    35    37    120   125   127
   3   s  2   40    45    47    130   135   137

Each entry has multiple identifier columns (ID1 and ID2). I then have two separate categories of measurements (Foc and Sat) each of which contains multiple identifiers (A, B, C)(the category identifiers are shared between categories). I'm eventually needing to plot it in a facet_grid with x and y as each category of measurement and separated by category identifier, so I'm trying to reshape it so that it looks like this:

Index ID1 ID2 Ch  Foc  Sat 
   0   r  1   A   10   100 
   1   r  1   B   15    105
   2   r  1   C   17    107
   3   r  2   A   20    110 
   4   r  2   B   25    115 
   5   r  2   C   27    117 
   6   s  1   A   30    120 
   7   s  1   B   35    125
   8   s  1   C   37    127

I've been trying .melt, .pivot, and .stack but not understanding what I'm doing well enough to make headway.

Erin W · Accepted Answer · 2020-07-13 20:54:59Z

2

You are thinking in the right way. You can do:

# melt the dataframe
d1 = df.set_index(['Index', 'ID1', 'ID2']).stack().reset_index()

# create separate column
d1[['flag', 'Ch']] = d1['level_3'].str.split('_', expand=True)
d1 = d1.drop('level_3', 1)
d1 = d1.rename(columns = {0: 'Foc'})

# expand the dataframe to wide
d2 = pd.pivot_table(d1, index=['Index', 'ID1', 'ID2', 'Ch'], columns=['flag']).reset_index()

# fix column names
d2.columns = ['Index', 'ID1', 'ID2', 'Ch', 'Foc', 'Sat']

print(d2.head())

   Index ID1  ID2 Ch  Foc  Sat
0      0   r    1  A   10  100
1      0   r    1  B   15  105
2      0   r    1  C   17  107
3      1   r    2  A   20  110
4      1   r    2  B   25  115

edited Jul 13, 2020 at 20:54

Erin W

596 bronze badges

answered Jul 13, 2020 at 20:40

YOLO

22k5 gold badges25 silver badges42 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Erin W Over a year ago

Thanks! I have a further point of confusion if I use my real column names. There are many _ and the number can differ. My actual column names are ImageQuality_Focus and ImageQuality_Saturation (instead of Foc and Sat) and are further complicated by the fact that the A and B are variables set by the user and can contain any number of _. The set I'm currently using has 1 _ per variable for 4 of the variables and 2 _ in the final variable. So I will need to be able to parse "ImageQuality_Focus_Pre_Channel1" and "Image_Quality_Focus_Pre_Channel2_Type"

YOLO Over a year ago

@ErinW str.split has an argument n using which we can specify at which _ we would like to split the column name: pandas.pydata.org/pandas-docs/stable/reference/api/…

Paul H · Accepted Answer · 2020-07-13 20:50:01Z

I'd set ID columns to the index, split and expand the columns on the '_' character, then stack the dataframe:

from io import StringIO
import pandas

datafile = StringIO("""\
Index ID1 ID2 Foc_A Foc_B Foc_C Sat_A Sat_B Sat_C
   0   r  1   10    15    17    100   105   107
   1   r  2   20    25    27    110   115   117
   2   s  1   30    35    37    120   125   127
   3   s  2   40    45    47    130   135   137
""")

(
    pandas.read_csv(datafile, sep='\s+')
        .set_index(['ID1', 'ID2'])
        .drop(columns='Index')
        .pipe(lambda df:
             df.set_axis(
                 df.columns.str.split('_', expand=True),
                 axis="columns"
            )
        )
        .rename_axis([None, 'Ch'], axis='columns')
        .stack(level='Ch')
        .reset_index()
)

And that give me:

   ID1  ID2 Ch  Foc  Sat
0    r    1  A   10  100
1    r    1  B   15  105
2    r    1  C   17  107
3    r    2  A   20  110
4    r    2  B   25  115
5    r    2  C   27  117
6    s    1  A   30  120
7    s    1  B   35  125
8    s    1  C   37  127
9    s    2  A   40  130
10   s    2  B   45  135
11   s    2  C   47  137

BENY · Accepted Answer · 2020-07-13 21:11:13Z

1

Let us do wide_to_long

df = pd.wide_to_long(df,['Foc','Sat'],i=['ID1','ID2'],j='Ch',sep='_',suffix='\w+').reset_index()
Out[168]: 
   ID1  ID2 Ch  Foc  Sat
0    r    1  A   10  100
1    r    1  B   15  105
2    r    1  C   17  107
3    r    2  A   20  110
4    r    2  B   25  115
5    r    2  C   27  117
6    s    1  A   30  120
7    s    1  B   35  125
8    s    1  C   37  127
9    s    2  A   40  130
10   s    2  B   45  135
11   s    2  C   47  137

answered Jul 13, 2020 at 21:11

BENY

324k22 gold badges176 silver badges250 bronze badges

Comments

David Erickson · Accepted Answer · 2020-07-13 20:56:48Z

you could also achieve this with .melt, .groupby and np.where:

df = pd.melt(df, id_vars=['ID1','ID2','Foc_A', 'Foc_B', 'Foc_C'], var_name='Ch', value_name='Sat') \
    .groupby(['ID1','ID2','Ch']).agg({'Foc_A':'max','Foc_B':'max', 'Foc_C':'max','Sat':'max'}).reset_index()
df['Foc'] = np.where((df['Ch'] == 'Sat_A'), df['Foc_A'], '')
df['Foc'] = np.where((df['Ch'] == 'Sat_B'), df['Foc_B'], df['Foc'])
df['Foc'] = np.where((df['Ch'] == 'Sat_C'), df['Foc_C'], df['Foc'])
df['Ch'] = df['Ch'].str.replace('Sat_', '')
df = df.drop(['Foc_A', 'Foc_B', 'Foc_C'], axis=1)
df

output:

    ID1 ID2 Ch      Sat Foc
0   r   1   A       100 10
1   r   1   B       105 15
2   r   1   C       107 17
3   r   2   A       110 20
4   r   2   B       115 25
5   r   2   C       117 27
6   s   1   A       120 30
7   s   1   B       125 35
8   s   1   C       127 37
9   s   2   A       130 40
10  s   2   B       135 45
11  s   2   C       137 47

Umar.H · Accepted Answer · 2020-07-13 21:16:05Z

0

Creating a multi_index then using stack

df = df.set_index(['ID1','ID2'])

df.columns = df.columns.str.split('_',expand=True)

df1 = df.stack(1).reset_index().rename(columns={'level_2' : 'Ch'})

   ID1  ID2 Ch  Foc  Sat
0    r    1  A   10  100
1    r    1  B   15  105
2    r    1  C   17  107
3    r    2  A   20  110
4    r    2  B   25  115
5    r    2  C   27  117
6    s    1  A   30  120
7    s    1  B   35  125
8    s    1  C   37  127
9    s    2  A   40  130
10   s    2  B   45  135
11   s    2  C   47  137

answered Jul 13, 2020 at 21:16

Umar.H

23.1k7 gold badges50 silver badges94 bronze badges

Collectives™ on Stack Overflow

Reshape Pandas Dataframe with multiple column groups

5 Answers 5

2 Comments

Comments

Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

2 Comments

Comments

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related