pandas stack multiple columns into multiple columns

Question

I have a dataframe 6k columns wide, of the format:

import pandas as pd
df = pd.DataFrame([('jan 1 2000','a','b','c',1,2,3,'aa','bb','cc'), ('jan 2 2000','d', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
                  columns=['date','a_1', 'a_2', 'a_3','b_1', 'b_2', 'b_3','c_1', 'c_2', 'c_3'])

df
    date         a_1  a_2  a_3  b_1  b_2  b_3  c_1  c_2  c_3
0   jan 1 2000   a    b    c    1    2    3    aa   bb   cc
1   jan 2 2000   d    e    f    4    5    6    dd   ee   ff

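I want one row per date and column prefix, with the numeric suffixes as columns:

         date ID   1   2   3
0  jan 1 2000  a   a   b   c
1  jan 1 2000  b   1   2   3
2  jan 1 2000  c  aa  bb  cc
3  jan 2 2000  a   d   e   f
4  jan 2 2000  b   4   5   6
5  jan 2 2000  c  dd  ee  ff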

I have looked at "Pandas Melt several groups of columns into multiple target columns by name" and "Pandas: Multiple columns into one column", but I am unable to form a correct solution.

Any suggestions are appreciated.

Scott Boston · Accepted Answer · 2019-08-29 16:08:19Z

5

Use pd.wide_to_long and some dataframe reshaping.

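# Name the suffix level 'num' rather than 'ID' so that no two index levels
# share a name (some pandas versions raise "Duplicated level name" otherwise)
pd.wide_to_long(df, ['a','b','c'], 'date', 'num', '_')\
  .rename_axis('ID', axis=1)\
  .stack()\
  .unstack('num')\
  .reset_index()\
  .rename_axis(None, axis=1)

Output:

          date ID   1   2   3
0  jan 1, 2000  a   a   b   c
1  jan 1, 2000  b   1   2   3
2  jan 1, 2000  c  aa  bb  cc
3  jan 2, 2000  a   d   e   f
4  jan 2, 2000  b   4   5   6
5  jan 2, 2000  c  dd  ee  ff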

Where df is:

df = pd.DataFrame([('jan 1, 2000','a','b','c',1,2,3,'aa','bb','cc'), ('jan 2, 2000','d', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
                  columns=['date','a_1', 'a_2', 'a_3','b_1', 'b_2', 'b_3','c_1', 'c_2', 'c_3'])
df

Input df:

          date a_1 a_2 a_3  b_1  b_2  b_3 c_1 c_2 c_3
0  jan 1, 2000   a   b   c    1    2    3  aa  bb  cc
1  jan 2, 2000   d   e   f    4    5    6  dd  ee  ff
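With 6k columns, typing the stub names by hand is probably not practical. Here is a minimal sketch of deriving them from the column names instead. It assumes every column other than date follows the prefix_number pattern; the names value_cols, stubnames and long_df are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [('jan 1, 2000', 'a', 'b', 'c', 1, 2, 3, 'aa', 'bb', 'cc'),
     ('jan 2, 2000', 'd', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
    columns=['date', 'a_1', 'a_2', 'a_3', 'b_1', 'b_2', 'b_3',
             'c_1', 'c_2', 'c_3'])

# Assumption: every column except 'date' is named '<prefix>_<integer suffix>'.
value_cols = df.columns.drop('date')

# Take the part before the last underscore; dict.fromkeys removes duplicates
# while keeping the order in which the prefixes first appear.
stubnames = list(dict.fromkeys(col.rsplit('_', 1)[0] for col in value_cols))

# The suffix level is called 'num' here (an arbitrary name).
long_df = pd.wide_to_long(df, stubnames, i='date', j='num', sep='_')
print(long_df)
```

The result has one column per prefix and a (date, num) index, so the stack/unstack steps from the answer can be applied to long_df in the same way.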


2 Comments

Mark Wang Over a year ago

In this case, wide_to_long is more convenient compared with stack/melt

frank Over a year ago

I received error: ValueError: Duplicated level name: "ID", assigned to level 2, is already used for level 1.

jezrael · Accepted Answer · 2019-08-29 16:12:33Z

4

Create MultiIndex in columns with split and reshape by DataFrame.stack by first level:

df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
df.columns = df.columns.str.split('_', expand=True)
df = df.stack(0).rename_axis(('date', 'ID')).reset_index()

print (df)
        date ID   1   2   3
0 2000-01-01  a   a   b   c
1 2000-01-01  b   1   2   3
2 2000-01-01  c  aa  bb  cc
3 2000-01-02  a   d   e   f
4 2000-01-02  b   4   5   6
5 2000-01-02  c  dd  ee  ff
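To see why stacking the first level works, it may help to look at the intermediate column MultiIndex. Below is a small self-contained sketch that rebuilds the question's frame under the hypothetical name wide and skips the date conversion for brevity:

```python
import pandas as pd

wide = pd.DataFrame(
    [('jan 1 2000', 'a', 'b', 'c', 1, 2, 3, 'aa', 'bb', 'cc'),
     ('jan 2 2000', 'd', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
    columns=['date', 'a_1', 'a_2', 'a_3', 'b_1', 'b_2', 'b_3',
             'c_1', 'c_2', 'c_3']).set_index('date')

# Splitting with expand=True turns 'a_1' into the tuple ('a', '1'):
# level 0 holds the prefix, level 1 holds the suffix as a string.
wide.columns = wide.columns.str.split('_', expand=True)
print(wide.columns.tolist())

# Stacking level 0 moves the prefixes into the row index and leaves
# the suffixes as the remaining column labels.
stacked = wide.stack(0)
print(stacked)
```

Note that the suffix labels stay strings ('1', '2', '3') with this approach, whereas pd.wide_to_long converts purely numeric suffixes to integers.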

edited Aug 29, 2019 at 16:12

answered Aug 29, 2019 at 16:05


sammywemmy · Accepted Answer · 2022-03-30 09:38:23Z

0

One option is the pivot_longer function from pyjanitor, using the .value placeholder:

# pip install pyjanitor
import pandas as pd
import janitor 

df.pivot_longer(
    index = 'date', 
    names_to = ('ID', '.value'), 
    names_sep='_', 
    sort_by_appearance=True)

         date ID   1   2   3
0  jan 1 2000  a   a   b   c
1  jan 1 2000  b   1   2   3
2  jan 1 2000  c  aa  bb  cc
3  jan 2 2000  a   d   e   f
4  jan 2 2000  b   4   5   6
5  jan 2 2000  c  dd  ee  ff
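If adding a dependency is not an option, roughly the same reshape can be sketched in plain pandas with melt followed by pivot. This is not how pivot_longer works internally, just an approximation of its result. The names long, name, num and out are made up for the example, and passing a list to pivot's index needs pandas 1.1 or later.

```python
import pandas as pd

df = pd.DataFrame(
    [('jan 1 2000', 'a', 'b', 'c', 1, 2, 3, 'aa', 'bb', 'cc'),
     ('jan 2 2000', 'd', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
    columns=['date', 'a_1', 'a_2', 'a_3', 'b_1', 'b_2', 'b_3',
             'c_1', 'c_2', 'c_3'])

# One row per (date, original column name) pair.
long = df.melt(id_vars='date', var_name='name')

# Split 'a_1' into the prefix ('ID') and the suffix ('num').
long[['ID', 'num']] = long['name'].str.split('_', expand=True)

# Spread the suffixes back out into columns, one row per (date, ID).
out = (long.pivot(index=['date', 'ID'], columns='num', values='value')
           .reset_index()
           .rename_axis(None, axis=1))
print(out)
```

As with the str.split approach, the suffix columns end up labelled with the strings '1', '2' and '3'.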
