Adding new rows with default value based on dataframe values into dataframe

Question

I have data with a large number of columns:

df:
ID   col1   col2   col3 ... col100
1    a      x      0        1  
1    a      x      1        1
2    a      y      1        1
4    a      z      1        0  
...
98   a      z      1        1
100  a      x      1        0

I want to fill in the missing ID values with a default value that indicate that the data is missing here. For example here it would be ID 3 and hypothetically speaking lets say the missing row data looks like ID 100

ID   col1   col2  col3 ... col100
3    a      x     1        0
99   a      x     1        0

Expected output:

df:
ID   col1   col2   col3 ... col100
1    a      x      0        1  
1    a      x      1        1
2    a      y      1        1
3    a      x      1        0
4    a      z      1        0  
...
98   a      z      1        1
99   a      x     1        0
100  a      x      1        0

I'm also ok with the 3 and 99 being at the bottom.

I have tried several ways of appending new rows:

noresponse = df[filterfornoresponse].head(1).copy() #assume that this will net us row 100

for i in range (1, maxID):
    if len(df[df['ID'] == i) == 0: #IDs with no rows ie missing data
        temp = noresponse.copy()
        temp['ID'] = i
        df.append(temp, ignore_index = True)

This method doesn't seem to append anything.

I have also tried

pd.concat([df, temp], ignore_index = True)

instead of df.append

I have also tried adding the rows to a list noresponserows with the intention of concating the list with df:

noresponserows = []

for i in range (1, maxID):
    if len(df[df['ID'] == i) == 0: #IDs with no rows ie missing data
        temp = noresponse.copy()
        temp['ID'] = i
        noresponserows.append(temp)

But here the list always ends up with only 1 row when in my data I know there are more than one rows that need to be appended.

I'm not sure why I am having trouble appending more than once instance of noresponse into the list, and why I can't directly append to a dataframe. I feel like I am missing something here.

I think it might have to do with me taking a copy of a row in the df vs constructing a new one. The reason why I take a copy of a row to get noresponse is because there are a large amount of columns so it is easier to just take an existing row.

You need to do df.append(..., inplace=True) or df = df.append(...) because df.append() returns a new dataframe by default. — user17242583
– user17242583, Commented Dec 18, 2021 at 23:42
Will you please provide a sample dataframe containing your expected output? — user17242583
– user17242583, Commented Dec 18, 2021 at 23:44
added expected output @richardec, what I mean is that I want to have Ids span all contiguous values from 1 to 100 so here 3 and 99 are missing and I want to add them — SemicolonExpected
– SemicolonExpected, Commented Dec 18, 2021 at 23:52

user17242583 · Accepted Answer · 2021-12-18 23:52:48Z

1

Say you have a dataframe like this:

>>> df
  col1 col2  col100  ID
0    a    x       0   1
1    a    y       3   2
2    a    z       1   4

First, set the ID column to be the index:

>>> df = df.set_index('ID')
>>> df
   col1 col2  col100
ID                  
1     a    x       0
2     a    y       3
4     a    z       1

Now you can use df.loc to easily add rows.

Let's select the last row as the default row:

>>> default_row = df.iloc[-1]
>>> default_row
col1      a
col2      z
col100    1
Name: 4, dtype: object

We can add it right into the dataframe at ID 3:

>>> df.loc[3] = default_row
>>> df
   col1 col2  col100
ID                  
1     a    x       0
2     a    y       3
4     a    z       1
3     a    z       1

Then use sort_index to sort the rows lexicographically by index:

>>> df = df.sort_index()
>>> df
   col1 col2  col100
ID                  
1     a    x       0
2     a    y       3
3     a    z       1
4     a    z       1

And, optionally, reset the index:

>>> df = df.reset_index()
>>> df
   ID col1 col2  col100
0   1    a    x       0
1   2    a    y       3
2   3    a    z       1
3   4    a    z       1

answered Dec 18, 2021 at 23:52

user17242583

Sign up to request clarification or add additional context in comments.

2 Comments

SemicolonExpected Over a year ago

I dont think I can use ID as an index because I have duplicate values, in fact it is expected that there are multiple instances of each "ID". For example in my example I have two instances of 1

user17242583 Over a year ago

Yes, you can use the index in that case. It's totally valid for the index to have multiple values. I know it's counter-intuitive, but it's the same thing for columns.

Collectives™ on Stack Overflow

Adding new rows with default value based on dataframe values into dataframe

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related