pandas multi Index from columns

Question

I have a dataframe like this:

          index        A   B   C                  
     01.01.2000        a1  b1  c1
     01.02.2000        a2  b2  c2
     01.03.2000        a3  b3  c3

and would like to get this:

              index      X
     (0, 01.01.2000)     a1
     (0, 01.02.2000)     a2
     (0, 01.03.2000)     a3
     (1, 01.01.2000)     b1
     (1, 01.02.2000)     b2
     (1, 01.03.2000)     b3
     (2, 01.01.2000)     c1
     (2, 01.02.2000)     c2
     (2, 01.03.2000)     c3

I need it this way to run the data through some regression models. The pandas docs on multi indexing unfortunately are very confusing to me as I'm new to this. Thank you!

Naga kiran · Accepted Answer · 2019-02-14 17:49:49Z

3

You can try getting the categorical codes of the columns, then stacking and converting the index to tuples:

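# Replace the column labels A, B, C with their categorical codes 0, 1, 2
df.columns = df.columns.to_series().astype('category').cat.codes
# Stack the columns into the index, put the column code first, then sort
df1 = df.stack().reorder_levels([1,0]).sort_index()
# Rebuild the index from (code, date) tuples
df1.index = tuple(df1.index)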

Out:

(0, 01.01.2000)    a1
(0, 01.02.2000)    a2
(0, 01.03.2000)    a3
(1, 01.01.2000)    b1
(1, 01.02.2000)    b2
(1, 01.03.2000)    b3
(2, 01.01.2000)    c1
(2, 01.02.2000)    c2
(2, 01.03.2000)    c3
dtype: object
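
For a version that can be pasted and run as-is, here is a minimal sketch of the same stack-based idea. The sample frame is rebuilt from the question, `range(...)` is swapped in for the categorical codes (an assumption: the codes should simply follow column position), and `to_flat_index()` is used for the final tuple conversion instead of `tuple(...)`.

```python
import pandas as pd

# Sample data rebuilt from the question; the dates are kept as plain strings.
df = pd.DataFrame(
    {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"], "C": ["c1", "c2", "c3"]},
    index=["01.01.2000", "01.02.2000", "01.03.2000"],
)

# Assumption: number the columns by position (A -> 0, B -> 1, C -> 2).
coded = df.copy()
coded.columns = range(coded.shape[1])

# stack() moves the column labels into an inner index level; swapping the
# levels and sorting groups the rows by column code, then by date string.
stacked = coded.stack().swaplevel().sort_index().rename("X")

# Optional: collapse the two-level MultiIndex into a flat Index of tuples.
flat = stacked.copy()
flat.index = stacked.index.to_flat_index()

print(flat)
```

If the regression code can work with a MultiIndex directly, `stacked` may be enough and the flattening step can be skipped.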

** Edit **

Sorting the data based on index levels

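# Parse the date strings so they sort chronologically
df.index = pd.to_datetime(df.index)
df.columns = df.columns.to_series().astype('category').cat.codes
# level=1 is the date level after reordering, so rows are sorted by date first
df1 = df.stack().reorder_levels([1,0]).sort_index(level=1)
df1.index = tuple(df1.index)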

Out:

(0, 2000-01-01 00:00:00)    a1
(1, 2000-01-01 00:00:00)    b1
(2, 2000-01-01 00:00:00)    c1
(0, 2000-01-02 00:00:00)    a2
(1, 2000-01-02 00:00:00)    b2
(2, 2000-01-02 00:00:00)    c2
(0, 2000-01-03 00:00:00)    a3
(1, 2000-01-03 00:00:00)    b3
(2, 2000-01-03 00:00:00)    c3
dtype: object
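
One thing to check in the output above: without a `format`, `pd.to_datetime` reads `01.02.2000` month-first, which is why it shows up as 2000-01-02. If the strings are actually day.month.year (an assumption, the question does not say), a sketch with an explicit format could look like this. The sample frame and the positional column codes are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"], "C": ["c1", "c2", "c3"]},
    index=["01.01.2000", "01.02.2000", "01.03.2000"],
)

# Assumption: the strings are day.month.year, so 01.02.2000 is 1 February.
df.index = pd.to_datetime(df.index, format="%d.%m.%Y")
df.columns = range(df.shape[1])  # positional codes 0, 1, 2

# The default sort_index() orders by column code first, then chronologically.
out = df.stack().swaplevel().sort_index()
print(out)
```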

edited Feb 14, 2019 at 17:49

answered Feb 14, 2019 at 16:52

Naga kiran


4 Comments

piRSquared Over a year ago

I like this one too

Michele Ng Over a year ago

This works great, except I need the output sorted as in the other examples (all values from column A, followed by the values from column B).

Michele Ng Over a year ago

Thanks, one step closer :) The data is now sorted by the first element of the multi index, but it's not sorted by the second one (the date). Do you know how to fix this? (For example, in the output series the a values are not all ordered by date.)

Naga kiran Over a year ago

sort_index sorts on both index levels by default; if you want to sort on an individual level, you can specify the level on which the frame should be sorted :-) @MicheleNg
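
To make the difference concrete, here is a small sketch on a made-up two-level Series (the labels and values are purely illustrative). It compares the default `sort_index()` with `sort_index(level=1)`, where the remaining level is still used to break ties because `sort_remaining` defaults to `True`.

```python
import pandas as pd

# Hypothetical, deliberately unsorted two-level index.
idx = pd.MultiIndex.from_tuples([(1, "b"), (0, "b"), (1, "a"), (0, "a")])
s = pd.Series([10, 20, 30, 40], index=idx)

# Default: order by level 0, then by level 1.
by_all = s.sort_index()

# level=1: order by level 1 first, then by the remaining level.
by_level_1 = s.sort_index(level=1)

print(by_all)
print(by_level_1)
```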

BENY · Accepted Answer · 2019-02-14 16:26:21Z

3

You need to call reset_index twice and then melt:

s=df.reset_index().reset_index().melt(['level_0','index'])
yourdf=pd.DataFrame({'index':tuple(zip(s['level_0'],s['index'])),'X':s.value})
yourdf
Out[130]: 
             index   X
0  (0, 01.01.2000)  a1
1  (1, 01.02.2000)  a2
2  (2, 01.03.2000)  a3
3  (0, 01.01.2000)  b1
4  (1, 01.02.2000)  b2
5  (2, 01.03.2000)  b3
6  (0, 01.01.2000)  c1
7  (1, 01.02.2000)  c2
8  (2, 01.03.2000)  c3
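
Note that the first tuple element in this output is the row number, whereas the question pairs each value with its column number. If the column number is what is needed, a hedged variant of the melt approach might look like the sketch below. The sample frame, the `column` name, and the use of `pd.factorize` are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"], "C": ["c1", "c2", "c3"]},
    index=["01.01.2000", "01.02.2000", "01.03.2000"],
)

# Turn the row labels into a regular column named "index", then go long:
# one row per (date, original column) pair, column A first, then B, then C.
long = (
    df.rename_axis("index")
      .reset_index()
      .melt(id_vars="index", var_name="column", value_name="X")
)

# factorize() numbers the column labels in order of first appearance.
codes, _ = pd.factorize(long["column"])

result = pd.DataFrame({
    "index": list(zip(codes.tolist(), long["index"])),
    "X": long["X"],
})
print(result)
```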

answered Feb 14, 2019 at 16:26

BENY


piRSquared · Accepted Answer · 2019-02-14 16:37:32Z

3

Comprehension

pd.DataFrame([
    [(i, idx), v]
    for i, (idx, *V) in enumerate(df.itertuples())
    for v in V
], columns=['index', 'X'])

             index   X
0  (0, 01.01.2000)  a1
1  (0, 01.01.2000)  b1
2  (0, 01.01.2000)  c1
3  (1, 01.02.2000)  a2
4  (1, 01.02.2000)  b2
5  (1, 01.02.2000)  c2
6  (2, 01.03.2000)  a3
7  (2, 01.03.2000)  b3
8  (2, 01.03.2000)  c3
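
The comprehension above walks the frame row by row, so the first tuple element is the row number. A sketch of a column-wise variant, which pairs each value with its column position and yields all of A before B, could look like this (the sample frame is rebuilt from the question and is an assumption about how the data is stored):

```python
import pandas as pd

df = pd.DataFrame(
    {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"], "C": ["c1", "c2", "c3"]},
    index=["01.01.2000", "01.02.2000", "01.03.2000"],
)

result = pd.DataFrame(
    [
        [(i, idx), v]
        for i, col in enumerate(df.columns)  # i is the column position
        for idx, v in df[col].items()        # idx is the row label
    ],
    columns=["index", "X"],
)
print(result)
```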

edited Feb 14, 2019 at 16:37

answered Feb 14, 2019 at 16:30

piRSquared


Vaishali · Accepted Answer · 2019-02-14 16:40:09Z

2

A slightly different way:

new_df = df.set_index('index', append=True).stack().droplevel(2)
new_df.index = tuple(zip(new_df.index))
new_df = new_df.reset_index().rename(columns = {'level_0': 'index', 0:'X'})

    index           X
0   (0, 01.01.2000) a1
1   (0, 01.01.2000) b1
2   (0, 01.01.2000) c1
3   (1, 01.02.2000) a2
4   (1, 01.02.2000) b2
5   (1, 01.02.2000) c2
6   (2, 01.03.2000) a3
7   (2, 01.03.2000) b3
8   (2, 01.03.2000) c3
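
This approach treats `index` as an ordinary column sitting next to the default integer row index. A self-contained sketch under that assumption is shown below; it builds the tuple column with `list(...)` on the stacked index rather than the `zip` step, which is a substitution and not the original code.

```python
import pandas as pd

# Assumption: "index" is a regular column and the row index is 0, 1, 2.
df = pd.DataFrame({
    "index": ["01.01.2000", "01.02.2000", "01.03.2000"],
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2", "b3"],
    "C": ["c1", "c2", "c3"],
})

# Append the date to the row index, stack the remaining columns, and drop
# the innermost level (the old column labels A, B, C).
stacked = df.set_index("index", append=True).stack().droplevel(2)

# Each index entry is now a (row number, date) tuple.
new_df = pd.DataFrame({"index": list(stacked.index), "X": stacked.to_numpy()})
print(new_df)
```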

answered Feb 14, 2019 at 16:40

Vaishali
