Convert one column into multiple columns using Python

Question

I have a dataset 'df' with 3 columns.

>> Original Data

    Student Id    Name  Marks
0       id_1    John    112
1       id_2    Rafs    181
2       id_2    Rafs    182
3       id_2    Rafs    183
4       id_3    Juan    222
5       id_3    Juan    312
6       id_3  Roller     21

I want to keep the columns 'Student Id' and 'Name' as they are but spread 'Marks' across multiple columns, so that each unique ('Student Id', 'Name') pair has a single row holding all of its Marks. The new columns should not be created manually; they should be generated dynamically from the data.

>> Expected Output

    Student Id    Name  Marks1  Marks2  Marks3
0       id_1    John     112    <NA>    <NA>
1       id_2    Rafs     181     182     183
2       id_3    Juan     222     312    <NA>
3       id_3  Roller      21    <NA>    <NA>

Sample data to replicate the input

import pandas as pd

data = [
    ["id_1", 'John', 112],
    ["id_2", 'Rafs', 181],
    ["id_2", 'Rafs', 182],
    ["id_2", 'Rafs', 183], 
    ["id_3", 'Juan', 222],
    ["id_3", 'Juan', 312],
    ["id_3", 'Roller', 21]
]
df = pd.DataFrame(data, columns = ['Student Id', 'Name', 'Marks'])

I tried the code below, but I am not getting the desired output. The result shows the index values as tuples in brackets, and the Marks are missing.

df3 = df.pivot_table(index=['Student Id','Name'], columns='Marks', aggfunc = 'max')

>>Output
Empty DataFrame
Columns: []
Index: [(id_1, John), (id_2, Rafs), (id_3, Juan), (id_3, Roller)]

jezrael · Accepted Answer · 2021-05-25 11:56:22Z

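Use GroupBy.cumcount to create a counter column within each ('Student Id', 'Name') group, pivot on that counter, and pass values='Marks' so that the columns of df3 do not become a MultiIndex: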

df['g'] = df.groupby(['Student Id','Name']).cumcount().add(1)

df3 = (df.pivot_table(index=['Student Id','Name'], 
                     columns='g', 
                     values='Marks', 
                     aggfunc = 'max')
        .add_prefix('Marks')
        .rename_axis(None, axis=1)
        .reset_index())
print (df3)
  Student Id    Name  Marks1  Marks2  Marks3
0       id_1    John   112.0     NaN     NaN
1       id_2    Rafs   181.0   182.0   183.0
2       id_3    Juan   222.0   312.0     NaN
3       id_3  Roller    21.0     NaN     NaN
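
To see what the counter looks like before pivoting, here is a minimal sketch that rebuilds the sample data from the question and prints the values that end up in 'g'. The variable name counter is only for illustration.

```python
import pandas as pd

data = [
    ["id_1", 'John', 112],
    ["id_2", 'Rafs', 181],
    ["id_2", 'Rafs', 182],
    ["id_2", 'Rafs', 183],
    ["id_3", 'Juan', 222],
    ["id_3", 'Juan', 312],
    ["id_3", 'Roller', 21]
]
df = pd.DataFrame(data, columns=['Student Id', 'Name', 'Marks'])

# cumcount numbers the rows inside each (Student Id, Name) group from 0;
# add(1) shifts the numbering so it starts at 1
counter = df.groupby(['Student Id', 'Name']).cumcount().add(1)
print(counter.tolist())
# [1, 1, 2, 3, 1, 2, 1]
```

Each value says which MarksN column the row's mark will land in after the pivot.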

If you need integers alongside missing values, cast to the nullable Int64 dtype:

df['g'] = df.groupby(['Student Id','Name']).cumcount().add(1)

df3 = (df.pivot_table(index=['Student Id','Name'], 
                     columns='g', 
                     values='Marks', 
                     aggfunc = 'max')
        .add_prefix('Marks')
        .astype('Int64')
        .rename_axis(None, axis=1)
        .reset_index())
print (df3)
  Student Id    Name  Marks1  Marks2  Marks3
0       id_1    John     112    <NA>    <NA>
1       id_2    Rafs     181     182     183
2       id_3    Juan     222     312    <NA>
3       id_3  Roller      21    <NA>    <NA>
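
If you would rather not add the helper column 'g' to df, one possible variant (a sketch, not part of the code above) keeps the counter as a separate Series and uses set_index plus unstack instead of pivot_table. The names counter and wide are illustrative assumptions.

```python
import pandas as pd

data = [
    ["id_1", 'John', 112],
    ["id_2", 'Rafs', 181],
    ["id_2", 'Rafs', 182],
    ["id_2", 'Rafs', 183],
    ["id_3", 'Juan', 222],
    ["id_3", 'Juan', 312],
    ["id_3", 'Roller', 21]
]
df = pd.DataFrame(data, columns=['Student Id', 'Name', 'Marks'])

# Per-group counter kept outside of df, so df itself is not modified
counter = df.groupby(['Student Id', 'Name']).cumcount().add(1)

wide = (df.set_index(['Student Id', 'Name', counter])['Marks']
          .unstack()              # last index level (the counter) becomes columns
          .add_prefix('Marks')    # 1, 2, 3 -> Marks1, Marks2, Marks3
          .astype('Int64')        # nullable integers, missing values shown as <NA>
          .reset_index())
print(wide)
```

Because the counter makes every (Student Id, Name, counter) combination unique, unstack has no duplicates to resolve and no aggregation function is needed.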

Mustafa Aydın · Accepted Answer · 2021-05-25 12:07:11Z

Another way:

temp = df.groupby(["Student Id", "Name"]).Marks.agg(list)

out = (pd.DataFrame(temp.tolist(), index=temp.index)
           .rename(columns=lambda x: f"Marks{x+1}")
           .reset_index())

temp is a Series holding the aggregated list of Marks for each (id, name) pair. We then build a dataframe from those lists, rename the columns to the desired format, and reset the index to turn id and name back into columns.

to get

  Student Id    Name  Marks1  Marks2  Marks3
0       id_1    John     112     NaN     NaN
1       id_2    Rafs     181   182.0   183.0
2       id_3    Juan     222   312.0     NaN
3       id_3  Roller      21     NaN     NaN
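
The result above mixes an integer column with float columns because of the NaN values. If the <NA> display from the expected output is wanted, a possible follow-up (a sketch; marks_cols is an illustrative name) is to cast only the Marks columns to the nullable Int64 dtype:

```python
import pandas as pd

data = [
    ["id_1", 'John', 112],
    ["id_2", 'Rafs', 181],
    ["id_2", 'Rafs', 182],
    ["id_2", 'Rafs', 183],
    ["id_3", 'Juan', 222],
    ["id_3", 'Juan', 312],
    ["id_3", 'Roller', 21]
]
df = pd.DataFrame(data, columns=['Student Id', 'Name', 'Marks'])

# Same steps as above: one list of marks per (Student Id, Name)
temp = df.groupby(["Student Id", "Name"]).Marks.agg(list)
out = (pd.DataFrame(temp.tolist(), index=temp.index)
           .rename(columns=lambda x: f"Marks{x+1}")
           .reset_index())

# Cast only the generated Marks columns, leaving id and name untouched
marks_cols = [c for c in out.columns if c.startswith("Marks")]
out = out.astype({c: "Int64" for c in marks_cols})
print(out)
```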


paradocslover · Accepted Answer · 2021-05-25 12:08:00Z

Here is an easier-to-understand answer that does not create an extra column:

#Grouping by Student Id and Name
new_df = df.groupby(['Student Id','Name'])['Marks'].apply(list).reset_index()

#Now, in the Marks column, we have a list (as seen below).
#So, we convert the list into different columns, using pd.Series

#   Student Id    Name            Marks
# 0       id_1    John            [112]
# 1       id_2    Rafs  [181, 182, 183]
# 2       id_3    Juan       [222, 312]
# 3       id_3  Roller             [21]
temp_df = new_df['Marks'].apply(pd.Series)

#Now, this is all decorative stuff. 

#Converting the column names from 0,1,2 to Marks1, Marks2, Marks3
temp_df.columns = list(map(lambda x: 'Marks'+str(x+1), temp_df.columns))

# Assigning this new temporary df to the original df
new_df[temp_df.columns] = temp_df

#Dropping the Marks column
final_df = new_df.drop('Marks',axis=1)
print(final_df)

Output:

  Student Id    Name  Marks1  Marks2  Marks3
0       id_1    John   112.0     NaN     NaN
1       id_2    Rafs   181.0   182.0   183.0
2       id_3    Juan   222.0   312.0     NaN
3       id_3  Roller    21.0     NaN     NaN
