Python - Pandas, split long column to multiple columns

Question

Given the following DataFrame:

>>> pd.DataFrame(data=[['a',1],['a',2],['b',3],['b',4],['c',5],['c',6],['d',7],['d',8],['d',9],['e',10]],columns=['key','value'])
  key  value
0   a      1
1   a      2
2   b      3
3   b      4
4   c      5
5   c      6
6   d      7
7   d      8
8   d      9
9   e     10

I'm looking for a method that will change the structure based on the key value, like so:

   a  b  c  d   e
0  1  3  5  7  10
1  2  4  6  8  10 <- 10 is duplicated
2  2  4  6  9  10 <- 10 is duplicated

The number of rows in the result equals the size of the largest group (d in the example above), and missing values are filled by repeating the last available value in that column.

jezrael · Accepted Answer · 2018-11-28 14:53:46Z

5

Create a MultiIndex with set_index, using a counter built by groupby + cumcount; reshape with unstack; replace missing values with the last non-missing ones using ffill; and finally convert all data to integers if necessary:

df = df.set_index([df.groupby('key').cumcount(),'key'])['value'].unstack().ffill().astype(int)
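To see what each step of the one-liner contributes, the chain can be unpacked as in the sketch below. The intermediate names counter, indexed, wide and result are illustrative only and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['a', 1], ['a', 2], ['b', 3], ['b', 4], ['c', 5],
          ['c', 6], ['d', 7], ['d', 8], ['d', 9], ['e', 10]],
    columns=['key', 'value'],
)

# Position of each row within its key group: 0, 1, 0, 1, 0, 1, 0, 1, 2, 0
counter = df.groupby('key').cumcount()

# Two-level index (counter, key); selecting 'value' leaves a Series
indexed = df.set_index([counter, 'key'])['value']

# Move the 'key' level into the columns; shorter groups get NaN
wide = indexed.unstack()

# Fill each NaN with the last value above it, then restore the integer dtype
result = wide.ffill().astype(int)
print(result)
```

The astype(int) step is needed because the NaN values introduced by unstack turn the columns into floats.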

Another solution with custom lambda function:

df = (df.groupby('key')['value']
        .apply(lambda x: pd.Series(x.values))
        .unstack(0)
        .ffill()
        .astype(int))

print(df)
key  a  b  c  d   e
0    1  3  5  7  10
1    2  4  6  8  10
2    2  4  6  9  10

edited Nov 28, 2018 at 14:53

answered Nov 28, 2018 at 14:47

BENY · Accepted Answer · 2018-11-28 14:54:38Z

2

Using pivot with groupby + cumcount:

# pivot takes keyword-only arguments in pandas 2.0 and later
df.assign(key2=df.groupby('key').cumcount()).pivot(index='key2', columns='key', values='value').ffill().astype(int)
Out[214]: 
key   a  b  c  d   e
key2                
0     1  3  5  7  10
1     2  4  6  8  10
2     2  4  6  9  10

edited Nov 28, 2018 at 14:54

answered Nov 28, 2018 at 14:53

3 Comments

Shlomi Schwartz Over a year ago

What is the key2 for, and how can I remove it?

BENY Over a year ago

@ShlomiSchwartz key2 is the index name; remove it by adding rename_axis: df.assign(key2=df.groupby('key').cumcount()).pivot(index='key2', columns='key', values='value').ffill().astype(int).rename_axis(None)

Shlomi Schwartz Over a year ago

Thanks for the explanation :)
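As a runnable sketch of the rename_axis suggestion in the comments: the version below uses the keyword form of pivot and clears both the index name (key2) and the column-axis name (key). Clearing the column name goes beyond what the comment asked for and is optional; the variable name wide is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['a', 1], ['a', 2], ['b', 3], ['b', 4], ['c', 5],
          ['c', 6], ['d', 7], ['d', 8], ['d', 9], ['e', 10]],
    columns=['key', 'value'],
)

wide = (
    df.assign(key2=df.groupby('key').cumcount())       # per-group row counter
      .pivot(index='key2', columns='key', values='value')
      .ffill()                                          # repeat last value downwards
      .astype(int)
      .rename_axis(None, axis=0)                        # drop the 'key2' index name
      .rename_axis(None, axis=1)                        # drop the 'key' columns name
)
print(wide)
```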
