Convert one level mixed header dataframe to vertical dataframe in Pandas

Question

The following dataframe has multiple columns whose names follow the format item:district:

   date  price:dc  price:xc  price:cy  ratio:dc  ratio:xc  ratio:cy
0  2017        12        11        14       0.1       0.1       0.3
1  2018        14        12        15       0.2       0.7       0.6
2  2019        13        13        16       0.5      -0.2       0.8

Is it possible to convert it into a long-format dataframe like the following? Thanks.

   date district  price  ratio
0  2017       dc     12    0.1
1  2018       dc     14    0.2
2  2019       dc     13    0.5
3  2017       xc     11    0.1
4  2018       xc     12    0.7
5  2019       xc     13   -0.2
6  2017       cy     14    0.3
7  2018       cy     15    0.6
8  2019       cy     16    0.8
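To experiment with the answer below, the input frame can be rebuilt from the first table. This is a minimal sketch; the values are copied from the question, and the variable name df matches what the answer's code assumes.

```python
import pandas as pd

# Input frame from the question: one plain column plus item:district columns
df = pd.DataFrame({
    'date': [2017, 2018, 2019],
    'price:dc': [12, 14, 13],
    'price:xc': [11, 12, 13],
    'price:cy': [14, 15, 16],
    'ratio:dc': [0.1, 0.2, 0.5],
    'ratio:xc': [0.1, 0.7, -0.2],
    'ratio:cy': [0.3, 0.6, 0.8],
})
print(df)
```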

jezrael · Accepted Answer · 2019-12-13 08:41:47Z


You can first move the columns without : into the index with DataFrame.set_index, then create a MultiIndex from the remaining columns by splitting their names on : with str.split, and finally reshape with DataFrame.stack:

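import pandas as pd
# move the plain column into the index so only item:district columns remain
df = df.set_index('date')
# 'price:dc' -> ('price', 'dc'): a two-level MultiIndex on the columns
df.columns = df.columns.str.split(':', expand=True)
# stack moves the district level into the rows; then name the index levels and flatten
df = df.stack().rename_axis(('date','district')).reset_index()
print(df)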
   date district  price  ratio
0  2017       cy     14    0.3
1  2017       dc     12    0.1
2  2017       xc     11    0.1
3  2018       cy     15    0.6
4  2018       dc     14    0.2
5  2018       xc     12    0.7
6  2019       cy     16    0.8
7  2019       dc     13    0.5
8  2019       xc     13   -0.2
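As an alternative that the answer does not use, pandas also offers pd.wide_to_long, which handles stub:suffix column names directly. A minimal sketch, assuming the question's df: the suffix argument has to be widened to a regex such as \w+ because its default only matches digits, and the resulting row order may differ from the stack version, so sort explicitly if order matters.

```python
import pandas as pd

df = pd.DataFrame({
    'date': [2017, 2018, 2019],
    'price:dc': [12, 14, 13], 'price:xc': [11, 12, 13], 'price:cy': [14, 15, 16],
    'ratio:dc': [0.1, 0.2, 0.5], 'ratio:xc': [0.1, 0.7, -0.2], 'ratio:cy': [0.3, 0.6, 0.8],
})

# Each column name is read as stub ('price'/'ratio') + sep (':') + suffix (district).
# suffix=r'\w+' lets non-numeric district codes such as 'dc' match.
long = pd.wide_to_long(df, stubnames=['price', 'ratio'], i='date',
                       j='district', sep=':', suffix=r'\w+')

# The result is indexed by (date, district); flatten it into ordinary columns
long = long.reset_index()
print(long)
```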

If the original order of the districts matters, one solution is to make the district level an ordered categorical:

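import pandas as pd

# start again from the original df (not the output of the previous snippet)
df = df.set_index('date')
df.columns = df.columns.str.split(':', expand=True)

# rebuild the district level as an ordered categorical whose category order
# is the order of first appearance in the headers (dc, xc, cy), not alphabetical
lvl = pd.CategoricalIndex(df.columns.levels[1],
                          ordered=True,
                          categories=df.columns.get_level_values(1).drop_duplicates())
df.columns = df.columns.set_levels(lvl, level=1)

# stack, then sort by district (category order) and by date within each district
df = df.stack().sort_index(level=[1,0]).rename_axis(('date','district')).reset_index()
print(df)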
   date district  price  ratio
0  2017       dc     12    0.1
1  2018       dc     14    0.2
2  2019       dc     13    0.5
3  2017       xc     11    0.1
4  2018       xc     12    0.7
5  2019       xc     13   -0.2
6  2017       cy     14    0.3
7  2018       cy     15    0.6
8  2019       cy     16    0.8
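A variation on the same idea, not the answer's own code, is to reshape first and convert the district column to an ordered categorical afterwards, leaving the column MultiIndex alone. Because the final order comes from an explicit sort, this sketch should not depend on whether a particular pandas version's stack sorts the new level. The names order, wide, and long are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'date': [2017, 2018, 2019],
    'price:dc': [12, 14, 13], 'price:xc': [11, 12, 13], 'price:cy': [14, 15, 16],
    'ratio:dc': [0.1, 0.2, 0.5], 'ratio:xc': [0.1, 0.7, -0.2], 'ratio:cy': [0.3, 0.6, 0.8],
})

# District codes in order of first appearance in the headers: dc, xc, cy
order = list(dict.fromkeys(c.split(':')[1] for c in df.columns if ':' in c))

# Same reshape as in the answer
wide = df.set_index('date')
wide.columns = wide.columns.str.split(':', expand=True)
long = wide.stack().rename_axis(('date', 'district')).reset_index()

# Ordered categorical, then sort by district (category order) and date
long['district'] = pd.Categorical(long['district'], categories=order, ordered=True)
long = long.sort_values(['district', 'date']).reset_index(drop=True)
print(long)
```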

edited Dec 13, 2019 at 8:41

answered Dec 13, 2019 at 8:32

jezrael


7 Comments

ah bon Over a year ago

Many thanks. If the headers are split by multiple colons, such as a:price:xc:b, a:price:cy:b:c, a:price:xc:b:c:d, and I only need the parts after the first and second colons (in this case price:xc, price:cy, price:dc) as df.columns, how can I do that?

jezrael Over a year ago

@ahbon - Is there always the same value a before the first : in the price columns?

jezrael Over a year ago

@ahbon - Can you test df.columns = df.columns.str.replace(r'a:price', 'price').str.split(':', n=2, expand=True).droplevel(-1) ?

ah bon Over a year ago

I used df.columns = df.columns.str.replace("a:", "") and then df.columns = df.columns.str.split(':', n=2, expand=True).droplevel(-1), and it works perfectly. Many thanks. :) You're a genius.

jezrael Over a year ago

@ahbon - The split also creates a third level from the :b:c:d remainder, so droplevel(-1) removes it.
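The approach from the comments can be checked on a few sample headers. A minimal sketch: the headers are hypothetical, modelled on the comment, with the last one assumed to be a:price:dc:b:c:d so the resulting columns stay unique; n=1 and regex=False limit the prefix removal to the first literal a: occurrence.

```python
import pandas as pd

# Hypothetical headers following the comment's pattern (assumed, not real data)
cols = pd.Index(['a:price:xc:b', 'a:price:cy:b:c', 'a:price:dc:b:c:d'])

# Remove only the first literal 'a:' prefix
cleaned = cols.str.replace('a:', '', n=1, regex=False)

# Split on at most two colons -> three levels; the third level holds the
# leftovers ('b', 'b:c', 'b:c:d'), which droplevel(-1) then discards
mi = cleaned.str.split(':', n=2, expand=True).droplevel(-1)
print(list(mi))
```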
