
I have a DataFrame that is 6k columns wide, in this format:

import pandas as pd
df = pd.DataFrame([('jan 1 2000','a','b','c',1,2,3,'aa','bb','cc'), ('jan 2 2000','d', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
                  columns=['date','a_1', 'a_2', 'a_3','b_1', 'b_2', 'b_3','c_1', 'c_2', 'c_3'])

df
    date         a_1  a_2  a_3  b_1  b_2  b_3  c_1  c_2  c_3
0   jan 1 2000   a    b    c    1    2    3    aa   bb   cc
1   jan 2 2000   d    e    f    4    5    6    dd   ee   ff

I want:

[image: desired output]

I have looked at "Pandas Melt several groups of columns into multiple target columns by name" and "Pandas: Multiple columns into one column", but I am unable to form a correct solution.

Any suggestions are appreciated.


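Use pd.wide_to_long and some DataFrame reshaping. Give the suffix level its own name (num below): if it were also called ID, stack() would fail with "ValueError: Duplicated level name", because the column axis is renamed to ID as well.

pd.wide_to_long(df, ['a','b','c'], 'date', 'num', '_')\
  .rename_axis('ID', axis=1)\
  .stack()\
  .unstack(1)\
  .reset_index()

Output:

num         date ID   1   2   3
0    jan 1, 2000  a   a   b   c
1    jan 1, 2000  b   1   2   3
2    jan 1, 2000  c  aa  bb  cc
3    jan 2, 2000  a   d   e   f
4    jan 2, 2000  b   4   5   6
5    jan 2, 2000  c  dd  ee  ff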

Where df is:

df = pd.DataFrame([('jan 1, 2000','a','b','c',1,2,3,'aa','bb','cc'), ('jan 2, 2000','d', 'e', 'f', 4, 5, 6, 'dd', 'ee', 'ff')],
                  columns=['date','a_1', 'a_2', 'a_3','b_1', 'b_2', 'b_3','c_1', 'c_2', 'c_3'])
df

Input df:

          date a_1 a_2 a_3  b_1  b_2  b_3 c_1 c_2 c_3
0  jan 1, 2000   a   b   c    1    2    3  aa  bb  cc
1  jan 2, 2000   d   e   f    4    5    6  dd  ee  ff
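With 6k columns, listing every stub by hand is impractical. Below is a minimal, self-contained variant, assuming every column other than `date` is named `<stub>_<number>`: it collects the stubs from the column names and names the suffix level `num` (an arbitrary choice) so that it cannot clash with the `ID` axis name.

```python
import pandas as pd

df = pd.DataFrame(
    [("jan 1, 2000", "a", "b", "c", 1, 2, 3, "aa", "bb", "cc"),
     ("jan 2, 2000", "d", "e", "f", 4, 5, 6, "dd", "ee", "ff")],
    columns=["date", "a_1", "a_2", "a_3", "b_1", "b_2", "b_3", "c_1", "c_2", "c_3"],
)

# Assumption: every non-date column looks like "<stub>_<number>",
# so the stub list can be derived instead of typed out.
stubs = sorted({col.split("_")[0] for col in df.columns if col != "date"})

out = (
    pd.wide_to_long(df, stubnames=stubs, i="date", j="num", sep="_")
    .rename_axis("ID", axis=1)  # name the column axis that holds the stubs
    .stack()                    # Series indexed by (date, num, ID)
    .unstack("num")             # move the numeric suffix back into columns
    .reset_index()
    .rename_axis(columns=None)  # drop the leftover "num" axis name
)
print(out)
```

The result has one row per (date, ID) pair, with the integer suffixes 1, 2, 3 as column labels.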

Comments:

In this case, wide_to_long is more convenient than stack/melt.
I received error: ValueError: Duplicated level name: "ID", assigned to level 2, is already used for level 1.

Create a MultiIndex in the columns with str.split, then reshape with DataFrame.stack on the first level:

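# parse strings like 'jan 1 2000' into datetimes
df['date'] = pd.to_datetime(df['date'])
# keep date out of the reshaping by moving it to the index
df = df.set_index('date')
# 'a_1' -> ('a', '1'): the columns become a two-level MultiIndex
df.columns = df.columns.str.split('_', expand=True)
# move the letter level into the rows, name both index levels, flatten
df = df.stack(0).rename_axis(('date', 'ID')).reset_index()

print(df)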
        date ID   1   2   3
0 2000-01-01  a   a   b   c
1 2000-01-01  b   1   2   3
2 2000-01-01  c  aa  bb  cc
3 2000-01-02  a   d   e   f
4 2000-01-02  b   4   5   6
5 2000-01-02  c  dd  ee  ff
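To see what the split and stack steps do in isolation, here is a tiny standalone check on a made-up four-column frame (the names `d1`, `wide`, `long` and the values are illustrative only):

```python
import pandas as pd

cols = pd.Index(["a_1", "a_2", "b_1", "b_2"])
# On an Index, str.split(..., expand=True) returns a MultiIndex:
# the letter becomes level 0 and the suffix (still a string) level 1.
mi = cols.str.split("_", expand=True)
print(mi.tolist())  # [('a', '1'), ('a', '2'), ('b', '1'), ('b', '2')]

wide = pd.DataFrame([[1, 2, 3, 4]], index=pd.Index(["d1"], name="date"), columns=mi)
# stack(0) moves level 0 (a, b) into the row index;
# level 1 ('1', '2') stays behind as the column labels.
long = wide.stack(0)
print(long)
```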


One option is the pivot_longer function from pyjanitor, using the .value placeholder:

# pip install pyjanitor
import pandas as pd
import janitor 

df.pivot_longer(
    index = 'date', 
    names_to = ('ID', '.value'), 
    names_sep='_', 
    sort_by_appearance=True)

         date ID   1   2   3
0  jan 1 2000  a   a   b   c
1  jan 1 2000  b   1   2   3
2  jan 1 2000  c  aa  bb  cc
3  jan 2 2000  a   d   e   f
4  jan 2 2000  b   4   5   6
5  jan 2 2000  c  dd  ee  ff
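The `.value` placeholder marks the part of each column name that should remain a column header, while `ID` receives the other part. As a rough plain-pandas analogue (melt, then pivot; this is not how pyjanitor implements it), the same shape can be produced like this:

```python
import pandas as pd

df = pd.DataFrame(
    [("jan 1 2000", "a", "b", "c", 1, 2, 3, "aa", "bb", "cc"),
     ("jan 2 2000", "d", "e", "f", 4, 5, 6, "dd", "ee", "ff")],
    columns=["date", "a_1", "a_2", "a_3", "b_1", "b_2", "b_3", "c_1", "c_2", "c_3"],
)

# One row per (date, original column name, cell value).
long = df.melt(id_vars="date")
# Split "a_1" into the group ("ID") and the suffix that should become
# a column header: the part that '.value' refers to in pivot_longer.
parts = long["variable"].str.split("_", n=1, expand=True)
long["ID"] = parts[0]
long["suffix"] = parts[1]

# Spread the suffixes back out as columns: one row per (date, ID).
out = (
    long.pivot(index=["date", "ID"], columns="suffix", values="value")
    .reset_index()
    .rename_axis(columns=None)
)
print(out)
```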
