1

I have a dataframe with two colums of interest: a column with names (repeated for each course), and a column with the course name they've taken or started. The problem is, if I use the dataframe as-is, if a person has not attempted the course, they won't have a value showing for that course at all, and therefore I won't be able to tell who is missing the course. Is there a way to auto generate the missing rows with a name? The other columns can be N/A or anything else.

Dataframe:

Name Course Title Completion Date
Mark Math 1/1/2021
Mark English 2/1/2021
Mark Econ 3/1/2021
David Math 7/1/2021
David English 4/1/2021
Cindy Math 6/1/2021
Cindy English 9/1/2021
Cindy Econ 11/1/2021

What I'm looking for:

Name Course Title Completion Date
Mark Math 1/1/2021
Mark English 2/1/2021
Mark Econ 3/1/2021
David Math 7/1/2021
David English 4/1/2021
David Econ N/A
Cindy Math 6/1/2021
Cindy English 9/1/2021
Cindy Econ 11/1/2021

2 Answers 2

1

You can make missing values explicit with the complete function from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor
df.complete('Name', 'Course Title')

    Name Course Title Completion Date
0  Cindy         Econ       11/1/2021
1  Cindy      English        9/1/2021
2  Cindy         Math        6/1/2021
3  David         Econ             NaN
4  David      English        4/1/2021
5  David         Math        7/1/2021
6   Mark         Econ        3/1/2021
7   Mark      English        2/1/2021
8   Mark         Math        1/1/2021
Sign up to request clarification or add additional context in comments.

Comments

1

Try with pivot and stack with reset_index:

>>> df.pivot(*df).stack(dropna=False)[::-1].reset_index(name='Completion Date')
    Name Course Title Completion Date
0   Mark         Math        1/1/2021
1   Mark      English        2/1/2021
2   Mark         Econ        3/1/2021
3  David         Math        7/1/2021
4  David      English        4/1/2021
5  David         Econ             NaN
6  Cindy         Math        6/1/2021
7  Cindy      English        9/1/2021
8  Cindy         Econ       11/1/2021
>>> 

If you specifically want it to be N/A, try:

>>> df.pivot(*df).stack(dropna=False)[::-1].reset_index(name='Completion Date').fillna('N/A')
    Name Course Title Completion Date
0   Mark         Math        1/1/2021
1   Mark      English        2/1/2021
2   Mark         Econ        3/1/2021
3  David         Math        7/1/2021
4  David      English        4/1/2021
5  David         Econ             N/A
6  Cindy         Math        6/1/2021
7  Cindy      English        9/1/2021
8  Cindy         Econ       11/1/2021
>>> 

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.