I want to know whether it is possible to stack the values of columns that belong to the same data frame and have almost the same name. I have the following data frame:

import pandas as pd

data = {'text': ['hello', 'hi'],
        'a': [1, 2],
        'b': [2, 1],
        'a.1': [3, 4],
        'b.1': [4, 3]
        }
df = pd.DataFrame(data)

There are multiple a. and b. columns, going up to a.N and b.N, and the end result has to look like the data frame below.

data2 ={'text':['hello','hi','hello','hi'],'identifier':[0,0,1,1],
        'a':[1,2,3,4],
        'b':[2,1,4,3],
        }

The identifier column only records how the rows were stacked: the first two values (0, 0) came from the original a and b columns, and the next two (1, 1) came from a.1 and b.1. I hope it all makes sense.

  • Are you manipulating dataframes or dictionaries? Commented Mar 17, 2021 at 18:30

2 Answers

This is similar to pd.wide_to_long, except that the first set of columns (a and b) has no numeric suffix.

Try a custom rename function that turns the columns into a MultiIndex, then stack:

def rename_col(x):
    out = x.split('.')
    return (x,'0') if len(out)==1 else tuple(out)

df = df.set_index('text')
df.columns=df.columns.map(rename_col)

df.stack(level=1).reset_index()

Output:

    text level_1  a  b
0  hello       0  1  2
1  hello       1  3  4
2     hi       0  2  1
3     hi       1  4  3
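
The same idea can be sketched end to end so that the result matches the layout asked for in the question. This is a minimal sketch, not the answer's exact code: the helper name split_col, the integer suffix conversion, and the final stable sort on identifier are assumptions added here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['hello', 'hi'],
    'a': [1, 2], 'b': [2, 1],
    'a.1': [3, 4], 'b.1': [4, 3],
})

def split_col(name):
    # Hypothetical helper: 'a' -> ('a', 0), 'a.1' -> ('a', 1)
    stub, _, suffix = name.partition('.')
    return stub, (int(suffix) if suffix else 0)

wide = df.set_index('text')
# Name the second column level so the stacked column is called 'identifier'
wide.columns = pd.MultiIndex.from_tuples(
    [split_col(c) for c in wide.columns], names=[None, 'identifier'])

result = (wide.stack(level='identifier')
              .reset_index()
              # Stable sort keeps the original row order within each identifier
              .sort_values('identifier', kind='stable')
              .reset_index(drop=True))
print(result)
```

Naming the column level avoids the generic level_1 label, and converting the suffix to int keeps identifier numeric rather than a string.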

Update: Alternatively, you can use pd.wide_to_long with another rename function. Run this on the original df, where text is still a regular column:

def rename_col(x): return x if x=='text' or '.' in x else x+'.0'

pd.wide_to_long(df.rename(columns=rename_col),
                i='text', j='identifier',
                stubnames=['a','b'],
                sep='.'
               )

Output:

                  a  b
text  identifier      
hello 0           1  2
hi    0           2  1
hello 1           3  4
hi    1           4  3
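
Since the question mentions columns up to a.N and b.N, the stub names can also be derived from the column names instead of being typed out. The following is a rough sketch under two assumptions: every column other than text follows the stub or stub.N pattern, and text uniquely identifies each row (pd.wide_to_long requires the i column to do so). The variable names renamed, stubs, and long_df are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['hello', 'hi'],
    'a': [1, 2], 'b': [2, 1],
    'a.1': [3, 4], 'b.1': [4, 3],
})

# Give the unsuffixed columns an explicit '.0' suffix
renamed = df.rename(
    columns=lambda c: c if c == 'text' or '.' in c else f'{c}.0')

# Collect the distinct stubs ('a', 'b', ...) from the renamed columns
stubs = sorted({c.split('.')[0] for c in renamed.columns if c != 'text'})

long_df = (pd.wide_to_long(renamed, stubnames=stubs,
                           i='text', j='identifier', sep='.')
             .reset_index())
print(long_df)
```

The reset_index call turns the (text, identifier) MultiIndex back into ordinary columns, which gives a flat frame like the one in the question.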

1 Comment

Hello! The first option worked flawlessly. With the second option I got the error "stubname can't be identical to a column name", but thank you, it worked as I wanted.

This does not build the identifier column (you can create it separately), but here is a way with groupby on axis=1:

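# Note: DataFrame.groupby(..., axis=1) is deprecated since pandas 2.1
u = df.set_index("text")
# Group columns by the part of the name before the '.', then stack each group
out = pd.concat([g.stack().droplevel(-1) for _,g in 
                 u.groupby(u.columns.str.split('.').str[0],axis=1)],axis=1,keys=u)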

print(out)

       a  b
text       
hello  1  2
hello  3  4
hi     2  1
hi     4  3
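
One possible way to add the identifier afterwards is to number the repeated rows within each text value using groupby(...).cumcount(). This is only a sketch: it rebuilds the frame printed above by hand so that it is self-contained, and it assumes each text value appears once in the original data frame, so that the n-th repeat corresponds to the n-th column set.

```python
import pandas as pd

# Same shape as the stacked output printed above (rebuilt by hand here)
out = pd.DataFrame(
    {'a': [1, 3, 2, 4], 'b': [2, 4, 1, 3]},
    index=pd.Index(['hello', 'hello', 'hi', 'hi'], name='text'),
)

# Number the rows within each text value: 0 for a/b, 1 for a.1/b.1, ...
out['identifier'] = out.groupby(level='text').cumcount().to_numpy()

result = out.reset_index()[['text', 'identifier', 'a', 'b']]
print(result)
```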
