Converting columns with date in names to separate rows in Python

Question

I already got answer to this question in R, wondering how this can be implemented in Python.

Let's say we have a pandas DataFrame like this:

import pandas as pd
d = pd.DataFrame({'2019Q1':[1], '2019Q2':[2], '2019Q3':[3]})

which displays like this:

   2019Q1  2019Q2  2019Q3
0       1       2       3

How can I transform it to looks like this:

Year    Quarter    Value
2019    1          1
2019    2          2
2019    3          3

jezrael · Accepted Answer · 2019-11-08 14:38:46Z

2

Use Series.str.split for MultiIndex with expand=True and then reshape by DataFrame.unstack, last data cleaning with with Series.reset_index and Series.rename_axis:

d = pd.DataFrame({'2019Q1':[1], '2019Q2':[2], '2019Q3':[3]})

d.columns = d.columns.str.split('Q', expand=True)
df = (d.unstack(0)
       .reset_index(level=2, drop=True)
       .rename_axis(('Year','Quarter'))
       .reset_index(name='Value'))
print (df)
   Year Quarter  Value
0  2019       1      1
1  2019       2      2
2  2019       3      3

Thank you @Jon Clements for another solution:

df = (d.melt()
       .variable
       .str.extract('(?P<Year>\d{4})Q(?P<Quarter>\d)')
       .assign(Value=d.T.values.flatten()))
print (df)
   Year Quarter  Value
0  2019       1      1
1  2019       2      2
2  2019       3      3

Alternative with split:

df = (d.melt()
       .variable
       .str.split('Q', expand=True)
       .rename(columns={0:'Year',1:'Quarter'})
       .assign(Value=d.T.values.flatten()))
print (df)
   Year Quarter  Value
0  2019       1      1
1  2019       2      2
2  2019       3      3

edited Nov 8, 2019 at 14:38

answered Nov 8, 2019 at 13:47

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Jon Clements Over a year ago

Not entirely sure as it'd need testing, but something like: df.melt().variable.str.extract(r'(?P<Year>\d{4})Q(?P<Quarter>\d)').assign(Value=df.values.T) should do the trick...

Jon Clements Over a year ago

Okay... had a quick play with the above, and I reckon: df.melt().variable.str.extract('(?P<Year>\d{4})Q(?P<Quarter>\d)').assign(Value=df.values.flatten()) should do it :)

Jon Clements Over a year ago

(that way it scales to > 1 row)

jezrael Over a year ago

@JonClements - Thank you, also is necessary transpose before flatten - d = pd.DataFrame({'2019Q1':[1,8], '2019Q2':[2,9], '2019Q3':[3,0]})

Erfan · Accepted Answer · 2019-11-08 14:11:06Z

1

Using DataFrame.stack with DataFrame.pop and Series.str.split:

df = d.stack().reset_index(level=1).rename(columns={0:'Value'})
df[['Year', 'Quarter']] = df.pop('level_1').str.split('Q', expand=True)

   Value  Year Quarter
0      1  2019       1
0      2  2019       2
0      3  2019       3

If you care about the order of columns, use reindex:

df = df.reindex(['Year', 'Quarter', 'Value'], axis=1)

   Year Quarter  Value
0  2019       1      1
0  2019       2      2
0  2019       3      3

edited Nov 8, 2019 at 14:11

answered Nov 8, 2019 at 14:03

Erfan

43.3k10 gold badges75 silver badges86 bronze badges

Collectives™ on Stack Overflow

Converting columns with date in names to separate rows in Python

2 Answers 2

4 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

4 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related