6

I have the following dataframe:

pp  b          pp   b
5   0.001464    6   0.001853
5   0.001459    6   0.001843

Is there a way to unpivot columns with the same name into multiple rows?

This is the required output:

pp  b         
5   0.001464    
5   0.001459    
6   0.001853
6   0.001843
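
For anyone who wants to reproduce this, here is a minimal sketch of how the sample frame could be built. The name `df` and the list-of-rows construction are assumptions for illustration; the real data may be loaded differently.

```python
import pandas as pd

# Two rows, four columns; the labels "pp" and "b" each appear twice.
df = pd.DataFrame(
    [[5, 0.001464, 6, 0.001853],
     [5, 0.001459, 6, 0.001843]],
    columns=["pp", "b", "pp", "b"],
)

# Selecting a duplicated label returns a DataFrame (both "pp" columns), not a Series.
pp_block = df["pp"]
```

Note that `df["pp"]` gives back a two-column frame here, which is what most of the answers below rely on.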

6 Answers

10

Try groupby with axis=1

df.groupby(df.columns.values, axis=1).agg(lambda x: x.values.tolist()).sum().apply(pd.Series).T.sort_values('pp')
Out[320]: 
          b   pp
0  0.001464  5.0
2  0.001459  5.0
1  0.001853  6.0
3  0.001843  6.0

A fun way with wide_to_long

s=pd.Series(df.columns)
df.columns=df.columns+s.groupby(s).cumcount().astype(str)

pd.wide_to_long(df.reset_index(),stubnames=['pp','b'],i='index',j='drop',suffix=r'\d+')
Out[342]: 
            pp         b
index drop              
0     0      5  0.001464
1     0      5  0.001459
0     1      6  0.001853
1     1      6  0.001843
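
A self-contained sketch of the same wide_to_long idea, in case it helps to see it end to end. It works on a copy instead of renaming the columns of `df` in place, and then drops the helper index levels; the names `renamed`, `long` and `tidy` are only illustrative, and the sample frame is rebuilt from the question.

```python
import pandas as pd

df = pd.DataFrame(
    [[5, 0.001464, 6, 0.001853],
     [5, 0.001459, 6, 0.001843]],
    columns=["pp", "b", "pp", "b"],
)

# Number each occurrence of a label: pp -> 0, b -> 0, pp -> 1, b -> 1.
labels = pd.Series(df.columns)
occurrence = labels.groupby(labels).cumcount()

# Rename on a copy so the original frame keeps its duplicate labels.
renamed = df.copy()
renamed.columns = [f"{name}{k}" for name, k in zip(labels, occurrence)]

# "index" identifies the original row, "drop" receives the numeric suffix.
long = pd.wide_to_long(
    renamed.reset_index(),
    stubnames=["pp", "b"],
    i="index",
    j="drop",
    suffix=r"\d+",
)

# Order by occurrence first, then by original row, and discard both levels.
tidy = long.sort_index(level=["drop", "index"]).reset_index(drop=True)
```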

2 Comments

thanks @Wen, your soln works. can you tell me what is the groupby and agg part doing? thanks!
@user308827 that part groups the columns by name; for each name we collect the values into a list, and then we just need to flatten the lists to get the result
4

This is possible using numpy:

res = pd.DataFrame({'pp': df['pp'].values.T.ravel(),
                    'b': df['b'].values.T.ravel()})

print(res)

          b  pp
0  0.001464   5
1  0.001459   5
2  0.001853   6
3  0.001843   6

Or without referencing specific columns explicitly:

res = pd.DataFrame({i: df[i].values.T.ravel() for i in set(df.columns)})
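
Since a `set` has no guaranteed order, the columns of `res` may come out in any order. A possible variant that keeps first-appearance order is sketched below; it assumes every label repeats the same number of times (otherwise the arrays have different lengths and the constructor raises), and it rebuilds the sample frame so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame(
    [[5, 0.001464, 6, 0.001853],
     [5, 0.001459, 6, 0.001843]],
    columns=["pp", "b", "pp", "b"],
)

# df.columns.unique() keeps labels in order of first appearance.
# The boolean mask always selects a 2-D block, even for a label that occurs once.
# ravel(order="F") reads the block column by column: first "pp", then second "pp".
res = pd.DataFrame({
    name: df.loc[:, df.columns == name].to_numpy().ravel(order="F")
    for name in df.columns.unique()
})
```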

3

Let's use melt, cumcount and unstack:

dm = df.melt()
dm.set_index(['variable',dm.groupby('variable').cumcount()])\
  .sort_index()['value'].unstack(0)

Output:

variable         b   pp
0         0.001464  5.0
1         0.001459  5.0
2         0.001853  6.0
3         0.001843  6.0

1 Comment

thanks! I get this error: *** TypeError: '<' not supported between instances of 'str' and 'int', not sure yet if this is because the sample dataframe is different from my actual dataframe or something else
2

I'm a little surprised that nobody has mentioned pd.concat so far. Take a look below:

df1 = pd.DataFrame({'Col1':[1,2,3,4], 'Col2':[5,6,7,8]})
df1
      Col1  Col2
   0     1     5
   1     2     6
   2     3     7
   3     4     8 

Now if you make:

   df2 = pd.concat([df1,df1])

you get:

   Col1  Col2
0     1     5
1     2     6
2     3     7
3     4     8
0     1     5
1     2     6
2     3     7
3     4     8

This is what you wanted, isn't it?
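
To tie this back to the frame in the question, one way to use pd.concat there is to cut the frame into blocks by how many times each label has already appeared, then stack the blocks. This is only a sketch under that assumption; `occurrence`, `blocks` and `stacked` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    [[5, 0.001464, 6, 0.001853],
     [5, 0.001459, 6, 0.001843]],
    columns=["pp", "b", "pp", "b"],
)

# Occurrence number of each column label, here [0, 0, 1, 1].
labels = pd.Series(df.columns)
occurrence = labels.groupby(labels).cumcount().to_numpy()

# Block k holds the k-th "pp" and the k-th "b"; labels are unique inside a block.
blocks = [df.loc[:, occurrence == k] for k in range(occurrence.max() + 1)]

# Stack the blocks vertically and renumber the rows.
stacked = pd.concat(blocks, ignore_index=True)
```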

0

If you know the number of repetitions ahead of time, it's very easy using numpy:

import numpy as np
import pandas as pd

repetitions=5
rows=2
original_columns=list('ab')

df=pd.DataFrame(np.random.randint(0,10,[rows,len(original_columns)*repetitions]), columns=original_columns*repetitions)
display(df)

    a   b   a   b   a   b   a   b   a   b
0   6   4   7   5   2   5   3   1   4   3
1   1   5   4   9   6   2   9   5   3   6

# now the interesting part:
df=pd.concat(np.hsplit(df, repetitions))
display(df)


    a   b
0   6   4
1   1   5
0   7   5
1   4   9
0   2   5
1   6   2
0   3   1
1   9   5
0   4   3
1   3   6
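
If np.hsplit does not cooperate with a DataFrame on your pandas/NumPy versions (I have not checked every combination), the same split can be sketched with plain positional slicing. This assumes the columns repeat in equal-width blocks; the deterministic sample data and the names `width`, `blocks` and `stacked` are just for illustration.

```python
import numpy as np
import pandas as pd

repetitions = 3
original_columns = list("ab")
width = len(original_columns)

# Deterministic sample: 2 rows, columns a b a b a b, values 0..11.
df = pd.DataFrame(
    np.arange(2 * width * repetitions).reshape(2, width * repetitions),
    columns=original_columns * repetitions,
)

# Take consecutive blocks of `width` columns by position, then stack them.
blocks = [df.iloc[:, start:start + width] for start in range(0, df.shape[1], width)]
stacked = pd.concat(blocks, ignore_index=True)
```

Pass `ignore_index=True` only if you do not need the original row labels; without it the index repeats as in the output above.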

0

One option is with pivot_longer from pyjanitor - in this case we take advantage of the fact that pp is followed by b - we can safely pair them and reshape into two columns.

# pip install pyjanitor
import pandas as pd
import janitor

arr = ['pp', 'b']
df.pivot_longer(index = None, names_to = arr, names_pattern = arr)
   pp         b
0   5  0.001464
1   5  0.001459
2   6  0.001853
3   6  0.001843
