convert rows to columns in pandas dataframe

Question

I'd like to convert to below Df1 to Df2.

The empty values would be filled with Nan.

Below Dfs are examples.

My data has weeks from 1 to 8.
IDs are 100,000.
Only week 8 has all IDs, so total rows will be 100,000.

I have Df3 which has 100,000 of id, and I want to merge df1 on Df3 formatted as df2.

ex) pd.merge(df3, df1, on="id", how="left") -> but, formatted as df2

 Df1>
 wk, id, col1, col2  ...
 1    1   0.5  15  
 2    2   0.5  15  
 3    3   0.5  15  
 1    2   0.5  15  
 3    2   0.5  15  

 ------
 Df2>
 wk1, id, col1, col2, wk2, id, col1, col2, wk3,  id, col1, col2,...
 1    1   0.5  15      2    1   Nan   Nan   3    1   Nan   Nan
 1    2   0.5  15      2    2   0.5  15     3    2   0.5    15
 1    3   Nan  Nan     2    3   Nan   Nan   3    3   0.5    15

@Hamidreza I don't think that what I asked is about transposing. If so, could you tell me how it works? — Ken Kim
– Ken Kim, Commented Aug 1, 2018 at 6:14
You see, I need al least a small portion (example) of your code there, please do it. By the way, your wk is not the DF index, so it means you need kind of filtering based on one weeks, right? — Shivid
– Shivid, Commented Aug 1, 2018 at 6:19
@Hamidreza Yes, you're right. It's needed to be filtered by wk and added by columns. Wk is not the index. I don't have this code. Above Df1 is an example to explain. If you need bigger one or code about the Df1, I will happily write down. — Ken Kim
– Ken Kim, Commented Aug 1, 2018 at 6:25

jezrael · Accepted Answer · 2018-08-01 08:17:14Z

Use:

#create dictionary for rename columns for correct sorting
d = dict(enumerate(df.columns))
d1 = {v:k for k, v in d.items()}

#first add missing values for each `wk` and `id`
df1 = df.set_index(['wk', 'id']).unstack().stack(dropna=False).reset_index()

#for each id create DataFrame, reshape by unstask and rename columns
df1 = (df1.groupby('id')
       .apply(lambda x: pd.DataFrame(x.values, columns=df.columns))
       .unstack()
       .reset_index(drop=True)
       .rename(columns=d1, level=0)
       .sort_index(axis=1, level=1)
       .rename(columns=d, level=0))

#convert values to integers if necessary
df1.loc[:, ['wk', 'id']] = df1.loc[:, ['wk', 'id']].astype(int)

#flatten MultiIndex in columns
df1.columns = ['{}_{}'.format(a, b) for a, b in df1.columns]
print (df1)

   wk_0  id_0  col1_0  col2_0  wk_1  id_1  col1_1  col2_1  wk_2  id_2  col1_2  \
0     1     1     0.5    15.0     2     1     NaN     NaN     3     1     NaN   
1     1     2     0.5    15.0     2     2     0.5    15.0     3     2     0.5   
2     1     3     NaN     NaN     2     3     NaN     NaN     3     3     0.5   

   col2_2  
0     NaN  
1    15.0  
2    15.0

jpp · Accepted Answer · 2018-08-01 08:44:56Z

You can use GroupBy + concat. The idea is to create a list of dataframes with appropriately named columns and appropriate index. The concatenate along axis=1:

d = {k: v.reset_index(drop=True) for k, v in df.groupby('wk')}

def formatter(df, key):
    return df.rename(columns={'w': f'wk{key}'}).set_index('id')

L = [formatter(df, key) for key, df in d.items()]
res = pd.concat(L, axis=1).reset_index()

print(res)

   id   wk  col1  col2   wk  col1  col2   wk  col1  col2
0   1  1.0   0.5  15.0  NaN   NaN   NaN  NaN   NaN   NaN
1   2  1.0   0.5  15.0  2.0   0.5  15.0  3.0   0.5  15.0
2   3  NaN   NaN   NaN  NaN   NaN   NaN  3.0   0.5  15.0

Note NaN forces your series to become float. There's no "good" fix for this.

Collectives™ on Stack Overflow

convert rows to columns in pandas dataframe

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related