Python adding missing rows based on a single column

Question

I have a dataframe with two colums of interest: a column with names (repeated for each course), and a column with the course name they've taken or started. The problem is, if I use the dataframe as-is, if a person has not attempted the course, they won't have a value showing for that course at all, and therefore I won't be able to tell who is missing the course. Is there a way to auto generate the missing rows with a name? The other columns can be N/A or anything else.

Dataframe:

Name	Course Title	Completion Date
Mark	Math	1/1/2021
Mark	English	2/1/2021
Mark	Econ	3/1/2021
David	Math	7/1/2021
David	English	4/1/2021
Cindy	Math	6/1/2021
Cindy	English	9/1/2021
Cindy	Econ	11/1/2021

What I'm looking for:

Name	Course Title	Completion Date
Mark	Math	1/1/2021
Mark	English	2/1/2021
Mark	Econ	3/1/2021
David	Math	7/1/2021
David	English	4/1/2021
David	Econ	N/A
Cindy	Math	6/1/2021
Cindy	English	9/1/2021
Cindy	Econ	11/1/2021

sammywemmy · Accepted Answer · 2021-09-06 02:55:20Z

1

You can make missing values explicit with the complete function from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor
df.complete('Name', 'Course Title')

    Name Course Title Completion Date
0  Cindy         Econ       11/1/2021
1  Cindy      English        9/1/2021
2  Cindy         Math        6/1/2021
3  David         Econ             NaN
4  David      English        4/1/2021
5  David         Math        7/1/2021
6   Mark         Econ        3/1/2021
7   Mark      English        2/1/2021
8   Mark         Math        1/1/2021

answered Sep 6, 2021 at 2:55

sammywemmy

28.9k4 gold badges21 silver badges35 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

U13-Forward · Accepted Answer · 2021-09-06 02:31:02Z

Try with pivot and stack with reset_index:

>>> df.pivot(*df).stack(dropna=False)[::-1].reset_index(name='Completion Date')
    Name Course Title Completion Date
0   Mark         Math        1/1/2021
1   Mark      English        2/1/2021
2   Mark         Econ        3/1/2021
3  David         Math        7/1/2021
4  David      English        4/1/2021
5  David         Econ             NaN
6  Cindy         Math        6/1/2021
7  Cindy      English        9/1/2021
8  Cindy         Econ       11/1/2021
>>>

If you specifically want it to be N/A, try:

>>> df.pivot(*df).stack(dropna=False)[::-1].reset_index(name='Completion Date').fillna('N/A')
    Name Course Title Completion Date
0   Mark         Math        1/1/2021
1   Mark      English        2/1/2021
2   Mark         Econ        3/1/2021
3  David         Math        7/1/2021
4  David      English        4/1/2021
5  David         Econ             N/A
6  Cindy         Math        6/1/2021
7  Cindy      English        9/1/2021
8  Cindy         Econ       11/1/2021
>>>

Collectives™ on Stack Overflow

Python adding missing rows based on a single column

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related