Inserting default rows into Pandas Dataframe based on condition/missing data

Question

I have a dataframe that looks like this:

import pandas as pd

data = {'TABLE_NM': ['TABLE_A', 'TABLE_A', 'TABLE_A', 'TABLE_A',
                     'TABLE_B', 'TABLE_B', 'TABLE_B',
                     'TABLE_C', 'TABLE_C', 'TABLE_C', 'TABLE_C'
                     ],
        'TEST_TABLE_NM': ['TEST_TABLE_A', 'TEST_TABLE_A', 'TEST_TABLE_A', 'TEST_TABLE_A',
                     'TEST_TABLE_B', 'TEST_TABLE_B', 'TEST_TABLE_B',
                     'TEST_TABLE_C', 'TEST_TABLE_C', 'TEST_TABLE_C', 'TEST_TABLE_C'],
        'TYPE': ['TEST1', 'TEST2', 'TEST3', 'TEST4', 'TEST1', 'TEST2', 'TEST3',
                 'TEST1', 'TEST2', 'TEST3', 'TEST4'],
        'RESULTS': [1005,560,2000,2000,1005,560,2000,1005,560,135,55]
        }

df = pd.DataFrame(data, columns=['TABLE_NM', 'TEST_TABLE_NM', 'TYPE', 'RESULTS'])

Which results in this:

   TABLE_NM TEST_TABLE_NM   TYPE  RESULTS
0   TABLE_A  TEST_TABLE_A  TEST1     1005
1   TABLE_A  TEST_TABLE_A  TEST2      560
2   TABLE_A  TEST_TABLE_A  TEST3     2000
3   TABLE_A  TEST_TABLE_A  TEST4     2000
4   TABLE_B  TEST_TABLE_B  TEST1     1005
5   TABLE_B  TEST_TABLE_B  TEST2      560
6   TABLE_B  TEST_TABLE_B  TEST3     2000
7   TABLE_C  TEST_TABLE_C  TEST1     1005
8   TABLE_C  TEST_TABLE_C  TEST2      560
9   TABLE_C  TEST_TABLE_C  TEST3      135
10  TABLE_C  TEST_TABLE_C  TEST4       55

In reality there are hundreds of TABLE_NM/TEST_TABLE_NM combinations, and each of them should be associated with 4 tests. Some, however, have only 3, as you can see above with TABLE_B.
What I want to do: for every TABLE_NM and TEST_TABLE_NM combination that has no 'TEST4' row, insert a dummy row into the dataframe after its 'TEST3' row, with 'TEST4' as the TYPE and 0 as the RESULTS. The dataframe above would then look like this instead:

   TABLE_NM TEST_TABLE_NM   TYPE  RESULTS
0   TABLE_A  TEST_TABLE_A  TEST1     1005
1   TABLE_A  TEST_TABLE_A  TEST2      560
2   TABLE_A  TEST_TABLE_A  TEST3     2000
3   TABLE_A  TEST_TABLE_A  TEST4     2000
4   TABLE_B  TEST_TABLE_B  TEST1     1005
5   TABLE_B  TEST_TABLE_B  TEST2      560
6   TABLE_B  TEST_TABLE_B  TEST3     2000
7   TABLE_B  TEST_TABLE_B  TEST4        0
8   TABLE_C  TEST_TABLE_C  TEST1     1005
9   TABLE_C  TEST_TABLE_C  TEST2      560
10  TABLE_C  TEST_TABLE_C  TEST3      135
11  TABLE_C  TEST_TABLE_C  TEST4       55

Any ideas on how this could be achieved?

G. Anderson · Accepted Answer · 2019-02-12 20:18:53Z

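You can chain four operations: pivot_table to turn each TYPE value into its own column (so every table/test_table pair gets a cell for every test), fillna to put zeros in the cells that had no data, stack to turn the columns back into rows, and reset_index to flatten the result (you can skip this last step if you would rather keep a MultiIndex).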

df=df.pivot_table(index=['TABLE_NM','TEST_TABLE_NM'], columns=['TYPE']).fillna(0).stack().reset_index()

    TABLE_NM    TEST_TABLE_NM   TYPE    RESULTS
0   TABLE_A     TEST_TABLE_A    TEST1   1005.0
1   TABLE_A     TEST_TABLE_A    TEST2   560.0
2   TABLE_A     TEST_TABLE_A    TEST3   2000.0
3   TABLE_A     TEST_TABLE_A    TEST4   2000.0
4   TABLE_B     TEST_TABLE_B    TEST1   1005.0
5   TABLE_B     TEST_TABLE_B    TEST2   560.0
6   TABLE_B     TEST_TABLE_B    TEST3   2000.0
7   TABLE_B     TEST_TABLE_B    TEST4   0.0
8   TABLE_C     TEST_TABLE_C    TEST1   1005.0
9   TABLE_C     TEST_TABLE_C    TEST2   560.0
10  TABLE_C     TEST_TABLE_C    TEST3   135.0
11  TABLE_C     TEST_TABLE_C    TEST4   55.0

If you want to see it in action, I would recommend doing each operation one at a time and viewing the output in between each step:

df=df.pivot_table(index=['TABLE_NM','TEST_TABLE_NM'], columns=['TYPE'])

df=df.fillna(0)

df=df.stack()

df=df.reset_index()
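
The floats in the output above come from pivot_table's default mean aggregation and from the intermediate NaN values. Below is a minimal sketch of a variant that should keep RESULTS as integers by filling during the reshape. It assumes each TABLE_NM/TEST_TABLE_NM/TYPE combination appears at most once (unstack raises an error on duplicates). The names keys and filled, and the shortened sample data, are illustrative and not part of the original answer.

```python
import pandas as pd

# Shortened sample in the question's shape: TABLE_B has no TEST4 row.
df = pd.DataFrame({
    'TABLE_NM': ['TABLE_A'] * 4 + ['TABLE_B'] * 3,
    'TEST_TABLE_NM': ['TEST_TABLE_A'] * 4 + ['TEST_TABLE_B'] * 3,
    'TYPE': ['TEST1', 'TEST2', 'TEST3', 'TEST4', 'TEST1', 'TEST2', 'TEST3'],
    'RESULTS': [1005, 560, 2000, 2000, 1005, 560, 2000],
})

keys = ['TABLE_NM', 'TEST_TABLE_NM']  # illustrative name for the grouping columns

filled = (
    df.set_index(keys + ['TYPE'])['RESULTS']  # Series indexed by table, test table, type
      .unstack('TYPE', fill_value=0)          # one column per TYPE; absent cells become 0
      .stack()                                # back to one row per table/test table/type
      .reset_index(name='RESULTS')            # restore the flat four-column layout
)

print(filled)
```

Because the fill happens inside unstack, no NaN is ever created, which is what lets the column stay numeric without a detour through floats. If your data can contain duplicate combinations, the pivot_table version above is the safer choice, since it aggregates them.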

edited Feb 12, 2019 at 20:18

answered Feb 12, 2019 at 20:05

2 Comments

JD2775 Over a year ago

Thank you very much. Just so I understand this code clearly, how does it know to fill that cell in with 'TEST4'? I can see that fillna(0) takes care of the 0 part, but I can't see how the 'TEST4' gets populated. It does work though, thank you.

G. Anderson Over a year ago

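Good question, see my edit about doing each step one at a time. Basically the magic is in pivot_table, which converts each unique value in TYPE into a separate column and leaves NaN wherever a table/test_table pair has no row for that type. fillna(0) then replaces those NaN cells, and stack turns every column, including TEST4, back into a row.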
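
To make that concrete, here is a small sketch that inspects the intermediate wide table. It uses a shortened version of the question's data, and the names wide, missing, and long_df are illustrative only.

```python
import pandas as pd

# Shortened sample in the question's shape: TABLE_B has no TEST4 row.
df = pd.DataFrame({
    'TABLE_NM': ['TABLE_A'] * 4 + ['TABLE_B'] * 3,
    'TEST_TABLE_NM': ['TEST_TABLE_A'] * 4 + ['TEST_TABLE_B'] * 3,
    'TYPE': ['TEST1', 'TEST2', 'TEST3', 'TEST4', 'TEST1', 'TEST2', 'TEST3'],
    'RESULTS': [1005, 560, 2000, 2000, 1005, 560, 2000],
})

# Step 1: one row per table pair, one column per TYPE value.
wide = df.pivot_table(index=['TABLE_NM', 'TEST_TABLE_NM'], columns=['TYPE'])

# The columns form a two-level index: ('RESULTS', 'TEST1') through ('RESULTS', 'TEST4').
print(wide.columns.tolist())

# TABLE_B never had a TEST4 row, so its cell in the TEST4 column is NaN.
missing = wide.loc[('TABLE_B', 'TEST_TABLE_B'), ('RESULTS', 'TEST4')]
print(pd.isna(missing))

# Steps 2-4: fill the gap, turn the TYPE columns back into rows, flatten the index.
long_df = wide.fillna(0).stack().reset_index()
print(long_df)
```

The TEST4 label is never written explicitly: it already exists as a column because TABLE_A has a TEST4 row, so stacking produces a TEST4 row for every table pair.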