
I have a dataframe like this:

          index        A   B   C                  
     01.01.2000        a1  b1  c1
     01.02.2000        a2  b2  c2
     01.03.2000        a3  b3  c3

and would like to get this:

              index      X
     (0, 01.01.2000)     a1
     (0, 01.02.2000)     a2
     (0, 01.03.2000)     a3
     (1, 01.01.2000)     b1
     (1, 01.02.2000)     b2
     (1, 01.03.2000)     b3
     (2, 01.01.2000)     c1
     (2, 01.02.2000)     c2
     (2, 01.03.2000)     c3

I need it this way to run the data through some regression models. Unfortunately, the pandas docs on multi-indexing are very confusing to me, as I'm new to this. Thank you!

4 Answers

3

You can get the categorical codes of the columns, then stack and convert the index to tuples:

df.columns = df.columns.to_series().astype('category').cat.codes
df1 = df.stack().reorder_levels([1,0]).sort_index()
df1.index = tuple(df1.index)

Out:

(0, 01.01.2000)    a1
(0, 01.02.2000)    a2
(0, 01.03.2000)    a3
(1, 01.01.2000)    b1
(1, 01.02.2000)    b2
(1, 01.03.2000)    b3
(2, 01.01.2000)    c1
(2, 01.02.2000)    c2
(2, 01.03.2000)    c3
dtype: object
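
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same stack-based idea. The sample frame is rebuilt from the question, and `codes`, `stacked` and `x` are names made up for illustration. Two substitutions to be aware of: the column codes come from `range(...)` (purely positional) instead of categorical codes, which agree here only because the labels A, B, C are already sorted, and `MultiIndex.to_flat_index()` is swapped in for the `tuple(...)` assignment to get a flat index of tuples.

```python
import pandas as pd

# Sample data rebuilt from the question (dates kept as plain strings).
df = pd.DataFrame(
    {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"], "C": ["c1", "c2", "c3"]},
    index=["01.01.2000", "01.02.2000", "01.03.2000"],
)

# Work on a copy and replace the column labels with their positions 0, 1, 2.
codes = df.copy()
codes.columns = range(codes.shape[1])

# stack() moves the columns into an inner index level (date, code);
# swaplevel() puts the code first, sort_index() orders by code, then date.
stacked = codes.stack().swaplevel().sort_index()

# Turn the MultiIndex into a flat Index whose entries are (code, date) tuples.
stacked.index = stacked.index.to_flat_index()
x = stacked.rename("X")
print(x)
```

If the MultiIndex itself is good enough for the regression code, the `to_flat_index()` step can simply be dropped.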

** Edit **

Sorting the data based on index levels

df.index  = pd.to_datetime(df.index)
df.columns = df.columns.to_series().astype('category').cat.codes
df1 = df.stack().reorder_levels([1,0]).sort_index(level=1)
df1.index = tuple(df1.index)

Out:

(0, 2000-01-01 00:00:00)    a1
(1, 2000-01-01 00:00:00)    b1
(2, 2000-01-01 00:00:00)    c1
(0, 2000-01-02 00:00:00)    a2
(1, 2000-01-02 00:00:00)    b2
(2, 2000-01-02 00:00:00)    c2
(0, 2000-01-03 00:00:00)    a3
(1, 2000-01-03 00:00:00)    b3
(2, 2000-01-03 00:00:00)    c3
dtype: object
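
A small sketch of how the `level` argument of `sort_index` controls the order, in case you want "all of column A by date, then all of column B" instead of the date-first order. It assumes the strings are month.day.year (which matches the parsed output above); adjust `format` if your data is day-first. The names `dated`, `s`, `by_column_then_date` and `by_date_then_column` are just illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame(
    {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"], "C": ["c1", "c2", "c3"]},
    index=["01.01.2000", "01.02.2000", "01.03.2000"],
)

dated = df.copy()
# Assumption: month.day.year strings; an explicit format avoids guessing.
dated.index = pd.to_datetime(dated.index, format="%m.%d.%Y")
dated.columns = range(dated.shape[1])  # positional column codes 0, 1, 2

# Series indexed by (code, date).
s = dated.stack().swaplevel()

# Column code first, then date: a1, a2, a3, b1, ...
by_column_then_date = s.sort_index(level=[0, 1])

# Date first, then column code: a1, b1, c1, a2, ...
by_date_then_column = s.sort_index(level=[1, 0])
```

Parsing the dates first matters because string dates sort alphabetically, which only coincides with chronological order in simple cases like this one.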

4 Comments

I like this one too
This works great, except I need the output sorted as in the other examples (all values from column A followed by values from column B).
Thanks, one step closer :) The data is now sorted by the first level of the multi-index, but it's not sorted by the second one (date). Do you know how to fix this? (For example, in the output series the a values are not ordered by date.)
sort_index actually sorts using both index levels. If you want to sort on individual levels, you can specify the level on which the frame should be sorted :-) @MicheleNg
3

Call reset_index twice, then melt:

s=df.reset_index().reset_index().melt(['level_0','index'])
yourdf=pd.DataFrame({'index':tuple(zip(s['level_0'],s['index'])),'X':s.value})
yourdf
Out[130]: 
             index   X
0  (0, 01.01.2000)  a1
1  (1, 01.02.2000)  a2
2  (2, 01.03.2000)  a3
3  (0, 01.01.2000)  b1
4  (1, 01.02.2000)  b2
5  (2, 01.03.2000)  b3
6  (0, 01.01.2000)  c1
7  (1, 01.02.2000)  c2
8  (2, 01.03.2000)  c3
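
Note that in the output above the integer in each tuple is the row position, whereas the question pairs each value with its column position. If that is what you need, a melt-based variant could look like the sketch below. It assumes the dates live in an unnamed index, and `long`, `column`, `code` and `result` are hypothetical names chosen for the example; `pd.factorize` numbers the column labels in order of first appearance.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame(
    {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"], "C": ["c1", "c2", "c3"]},
    index=["01.01.2000", "01.02.2000", "01.03.2000"],
)

# Move the dates into a regular column named "index", then go long:
# melt keeps "index" and emits one row per (date, original column) pair.
long = (
    df.rename_axis("index")
    .reset_index()
    .melt(id_vars="index", var_name="column", value_name="X")
)

# Number the original columns 0, 1, 2 in order of first appearance.
long["code"] = pd.factorize(long["column"])[0]

# Pair each code with its date to form the tuple-valued "index" column.
result = pd.DataFrame(
    {"index": list(zip(long["code"], long["index"])), "X": long["X"]}
)
print(result)
```

Because melt walks the frame column by column, the rows already come out as all of A, then B, then C, so no extra sort is needed.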


3

Comprehension

pd.DataFrame([
    [(i, idx), v]
    for i, (idx, *V) in enumerate(df.itertuples())
    for v in V
], columns=['index', 'X'])

             index   X
0  (0, 01.01.2000)  a1
1  (0, 01.01.2000)  b1
2  (0, 01.01.2000)  c1
3  (1, 01.02.2000)  a2
4  (1, 01.02.2000)  b2
5  (1, 01.02.2000)  c2
6  (2, 01.03.2000)  a3
7  (2, 01.03.2000)  b3
8  (2, 01.03.2000)  c3
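
The comprehension above enumerates rows, so the integer is the row position and values come out row by row. A column-wise variant that matches the layout in the question might look like this sketch. It assumes the dates are the frame's index; `rows` and `result` are illustrative names.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame(
    {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"], "C": ["c1", "c2", "c3"]},
    index=["01.01.2000", "01.02.2000", "01.03.2000"],
)

# Outer loop: columns with their position; inner loop: (date, value) pairs
# of that column. Each output row is [(code, date), value].
rows = [
    [(code, date), value]
    for code, (_name, col) in enumerate(df.items())
    for date, value in col.items()
]

result = pd.DataFrame(rows, columns=["index", "X"])
print(result)
```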


2

A slightly different way:

new_df = df.set_index('index', append=True).stack().droplevel(2)
new_df.index = tuple(zip(new_df.index))
new_df = new_df.reset_index().rename(columns = {'level_0': 'index', 0:'X'})

    index           X
0   (0, 01.01.2000) a1
1   (0, 01.01.2000) b1
2   (0, 01.01.2000) c1
3   (1, 01.02.2000) a2
4   (1, 01.02.2000) b2
5   (1, 01.02.2000) c2
6   (2, 01.03.2000) a3
7   (2, 01.03.2000) b3
8   (2, 01.03.2000) c3
