2

I have a dataframe where column 1 should contain all the values from 1 to 169. If a value doesn't exist, I'd like to add a new row to my dataframe which contains that value (and some zeros).

I can't get the following code to work, even though there are no errors:

for i in range(1,170):
    if i in df.col1 is False:
        df.loc[len(df)+1] = [i,0,0]
    else:
        continue

Any advice?

1
  • The condition if i in df.col1 is False: is never True, so a new row is never added. Change it to if any(df.col1.isin([i])) == False: here isin returns a boolean Series marking the rows equal to i, and the condition holds when none of them is True, i.e. when the value is missing from the column. Do you require the missing rows to be appended at the end of the df? Commented Feb 15, 2015 at 22:08

2 Answers

3

It would be better to do something like:

In [37]:
# create our test df, we have values 1 to 9 in steps of 2
import numpy as np
import pandas as pd
df = pd.DataFrame({'a':np.arange(1,10,2)})
df['b'] = np.nan
df['c'] = np.nan
df
Out[37]:
   a   b   c
0  1 NaN NaN
1  3 NaN NaN
2  5 NaN NaN
3  7 NaN NaN
4  9 NaN NaN
In [38]:
# now set the index to a, this allows us to reindex the values with optional fill value, then reset the index
df = df.set_index('a').reindex(index = np.arange(1,10), fill_value=0).reset_index()
df
Out[38]:
   a   b   c
0  1 NaN NaN
1  2   0   0
2  3 NaN NaN
3  4   0   0
4  5 NaN NaN
5  6   0   0
6  7 NaN NaN
7  8   0   0
8  9 NaN NaN

So just to explain the above:

In [40]:
# set the index to 'a', this allows us to reindex and fill missing values
df = df.set_index('a')
df
Out[40]:
    b   c
a        
1 NaN NaN
3 NaN NaN
5 NaN NaN
7 NaN NaN
9 NaN NaN
In [41]:
# now reindex and pass fill_value for the extra rows we want
df = df.reindex(index = np.arange(1,10), fill_value=0)
df
Out[41]:
    b   c
a        
1 NaN NaN
2   0   0
3 NaN NaN
4   0   0
5 NaN NaN
6   0   0
7 NaN NaN
8   0   0
9 NaN NaN
In [42]:
# now reset the index
df = df.reset_index()
df
Out[42]:
   a   b   c
0  1 NaN NaN
1  2   0   0
2  3 NaN NaN
3  4   0   0
4  5 NaN NaN
5  6   0   0
6  7 NaN NaN
7  8   0   0
8  9 NaN NaN
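
Applied to the question's case (values 1 to 169), the three steps could be wrapped up roughly as follows. This is only a sketch: the helper name fill_missing_rows, the column names col1/col2/col3 and the sample data are made up for illustration, and it assumes the values in the key column are unique, since reindex refuses an index with duplicate labels.

```python
import numpy as np
import pandas as pd

def fill_missing_rows(df, key, start, stop, fill_value=0):
    """Return a new frame with one row per key value in [start, stop].

    Rows that had to be created get fill_value in every other column.
    Assumes the values in df[key] are unique.
    """
    full_range = np.arange(start, stop + 1)
    return (
        df.set_index(key)                            # key column becomes the index
          .reindex(full_range, fill_value=fill_value)  # add the missing labels
          .rename_axis(key)                          # make sure the index keeps the key name
          .reset_index()                             # turn the index back into a column
    )

# Hypothetical stand-in for the asker's dataframe.
df = pd.DataFrame({'col1': [1, 2, 5, 169],
                   'col2': [7, 8, 9, 10],
                   'col3': [1, 1, 1, 1]})

filled = fill_missing_rows(df, 'col1', 1, 169)
print(len(filled))
print(filled.head())
```

Note that this variant returns the rows sorted by the key column rather than appending the new rows at the end.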

If you modified your loop to the following then it would work (run here against the original test df):

In [63]:

for i in range(1,10):
    if any(df.a.isin([i])) == False:
        df.loc[len(df)+1] = [i,0,0]
    else:
        continue
df
Out[63]:
   a   b   c
0  1 NaN NaN
1  3 NaN NaN
2  5 NaN NaN
3  7 NaN NaN
4  9 NaN NaN
6  2   0   0
7  4   0   0
8  6   0   0
9  8   0   0

EDIT

If you want the missing rows to appear at the end of the df, create a temporary df that holds the full range of values with the other columns set to zero, keep only the rows whose values are missing from the original df, and concatenate them. Note that np.arange(10) starts at 0, so a row for 0 is added as well:

In [70]:

df_missing = pd.DataFrame({'a':np.arange(10),'b':0,'c':0})
df_missing
Out[70]:
   a  b  c
0  0  0  0
1  1  0  0
2  2  0  0
3  3  0  0
4  4  0  0
5  5  0  0
6  6  0  0
7  7  0  0
8  8  0  0
9  9  0  0
In [73]:

df = pd.concat([df,df_missing[~df_missing.a.isin(df.a)]], ignore_index=True)
df
Out[73]:
   a   b   c
0  1 NaN NaN
1  3 NaN NaN
2  5 NaN NaN
3  7 NaN NaN
4  9 NaN NaN
5  0   0   0
6  2   0   0
7  4   0   0
8  6   0   0
9  8   0   0
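
A possible variation on the same idea is to compute the missing values first and build only those rows, instead of building the full range and filtering it. The sketch below swaps in np.setdiff1d for the isin filter and uses the range 1 to 9 (so no row for 0 is added); the names missing and result are illustrative.

```python
import numpy as np
import pandas as pd

# Same test data as above: odd values 1..9, other columns NaN.
df = pd.DataFrame({'a': [1, 3, 5, 7, 9], 'b': np.nan, 'c': np.nan})

# Values of the wanted range 1..9 that do not occur in column 'a'.
missing = np.setdiff1d(np.arange(1, 10), df['a'])

# One zero-filled row per missing value, appended after the existing rows.
df_missing = pd.DataFrame({'a': missing, 'b': 0, 'c': 0})
result = pd.concat([df, df_missing], ignore_index=True)
print(result)
```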

0

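The expression i in df.col1 is False always evaluates to False. There are two reasons: in on a Series looks in the index rather than in the values, and Python chains the comparison, so it is read as (i in df.col1) and (df.col1 is False), where the second part can never be true. I would also use pandas.concat instead of assigning to df.loc[] row by row, since enlarging a dataframe one row at a time is slow.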
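
Both effects can be checked in isolation. The snippet below is a small sketch with a made-up dataframe named demo; it is not the asker's data.

```python
import pandas as pd

# Hypothetical data: the values 10, 20, 30 sit under the default index 0, 1, 2.
demo = pd.DataFrame({'col1': [10, 20, 30]})

# `in` on a Series tests the index labels, not the stored values.
label_hit = 1 in demo.col1            # True: 1 is an index label
value_miss = 10 in demo.col1          # False: 10 is a value, not a label
value_hit = 10 in demo.col1.values    # True: membership test on the array

# `i in s is False` is a chained comparison: (i in s) and (s is False).
# A Series is never the object False, so the result is always False.
chained = 99 in demo.col1 is False    # False, although 99 is missing

# The test the loop actually needs.
is_missing = 99 not in demo.col1.values  # True

print(label_hit, value_miss, value_hit, chained, is_missing)
```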

I would recommend gathering all missing values in a list and then concatenating them to the dataframe at the end. For instance:

>>> df = pd.DataFrame({'col1': list(range(5)) + [i + 6 for i in range(5)], 'col2': range(10)})
>>> print(df)
   col1  col2
0     0     0
1     1     1
2     2     2
3     3     3
4     4     4
5     6     5
6     7     6
7     8     7
8     9     8
9    10     9
>>> to_add = []
>>> for i in range(11):
...     if i not in df.col1.values:
...         to_add.append([i, 0])
...     else:
...         continue
...        
>>> pd.concat([df, pd.DataFrame(to_add, columns=['col1', 'col2'])])
   col1  col2
0     0     0
1     1     1
2     2     2
3     3     3
4     4     4
5     6     5
6     7     6
7     8     7
8     9     8
9    10     9
0     5     0

I assume you don't care about the index values of the rows you add.
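
If the index values do matter, pd.concat accepts ignore_index=True, which renumbers the rows of the result. A minimal sketch, using made-up frames df and extra:

```python
import pandas as pd

# Hypothetical data: 2 is the value missing from col1.
df = pd.DataFrame({'col1': [0, 1, 3], 'col2': [5, 6, 7]})
extra = pd.DataFrame([[2, 0]], columns=['col1', 'col2'])

# Default behaviour: each frame keeps its own labels, so 0 appears twice.
kept = pd.concat([df, extra])

# ignore_index=True discards the old labels and numbers the rows 0..n-1.
fresh = pd.concat([df, extra], ignore_index=True)

print(kept.index.tolist())
print(fresh.index.tolist())
```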
