10

Given the following data frame:

pd.DataFrame({"A":[1,2,3],"B":[4,5,6],"C":[6,7,8]})

   A   B   C
0  1   4   6
1  2   5   7
2  3   6   8
3  11  14  16
4  12  15  17
5  13  16  18

I would like to reshape it so it would look like so:

   A   B   C   A_1   B_1   C_1   A_2   B_2   C_2
0  1   4   6     2     5     7     3     6     8
1  11  14  16    12    15    17    13    16    18

So every 3 rows are grouped into 1 row

How can I achieve this with pandas?

5 Answers 5

12

One idea is create MultiIndex with integer and modulo division and reshape by DataFrame.unstack:

a = np.arange(len(df))
df.index = [a // 3, a % 3]
df = df.unstack().sort_index(axis=1, level=1)
df.columns = [f'{a}_{b}' for a, b in df.columns]
print (df)
   A_0  B_0  C_0  A_1  B_1  C_1  A_2  B_2  C_2
0    1    4    6    2    5    7    3    6    8
1   11   14   16   12   15   17   13   16   18

For reverse operation is possible use str.split with DataFrame.stack:

a = np.arange(len(df))
df1 = (df.set_index(pd.MultiIndex.from_arrays([a // 3, a % 3]))
         .unstack().sort_index(axis=1, level=1))
df1.columns = [f'{a}_{b}' for a, b in df1.columns]
print (df1)
   A_0  B_0  C_0  A_1  B_1  C_1  A_2  B_2  C_2
0    1    4    6    2    5    7    3    6    8
1   11   14   16   12   15   17   13   16   18

df1.columns = df1.columns.str.split('_', expand=True)
df2 = df1.stack().reset_index(drop=True)
print (df2)
    A   B   C
0   1   4   6
1   2   5   7
2   3   6   8
3  11  14  16
4  12  15  17
5  13  16  18
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks for your answer, how would you revert the operation?
10
new = pd.concat([df[a::3].reset_index(drop=True) for a in range(3)], axis=1)
new.columns = ['{}_{}'.format(a,b) for b in range(3) for a in 'ABC']

Comments

3

You can try this:

pd.DataFrame(
    data=df.values.reshape([-1, df.values.shape[1]*3]),
    columns=list(df.columns) + sum([[c+'_'+str(i) for c in df.columns] for i in range(1, 3)], [])
)

Output for your input dataframe:


    A   B   C   A_1 B_1 C_1 A_2 B_2 C_2
0   1   4   6   2   5   7   3   6   8
1   11  14  16  12  15  17  13  16  18

Comments

3

Data is transformed every three rows : we can use numpy's reshape method to transform the data, and create a cartesian product of range(1,3) with the columns to get the new columns :

from itertools import product
row = len(df)//3

#create new columns
new_columns = df.columns.union(["_".join((letter,str(num))) 
                                for letter,num in product(df.columns,range(1,3))],
                               sort=False)

#create new dataframe
new_df = pd.DataFrame(np.reshape(df.to_numpy(),(row,-1)), 
                      columns=new_columns)
new_df

    A   B   C   A_1 A_2 B_1 B_2 C_1 C_2
0   1   4   6   2   5   7   3   6   8
1   11  14  16  12  15  17  13  16  18

Comments

2

We can group your data by repeating groups of n (in this case 3) and then use pd.concat to concat them together over the column axis:

n = 3
grps = df.groupby(df.index // n).cumcount()
dfn = pd.concat([d.reset_index(drop=True) for _, d in df.groupby(grps)], axis=1)
dfn.columns = [f'{col}_{i}' for col, i in zip(dfn.columns, np.arange(dfn.shape[1]) // n)]

   A_0  B_0  C_0  A_1  B_1  C_1  A_2  B_2  C_2
0    1    4    6    2    5    7    3    6    8
1   11   14   16   12   15   17   13   16   18

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.