Replace NaN values of pandas.DataFrame with values from list

Question

In a Python script using the pandas library, I have a dataset of, say, 100 rows with a feature "X" that contains 36 NaN values, and a list of size 36.

I want to replace the 36 missing values in column "X" with the 36 values in my list.

It's probably a dumb question, but I went through the docs and couldn't find a way to do it.

Here's an example:

INPUT

Data:   X      Y
        1      8
        2      3
        NaN    2
        NaN    7
        1      2
        NaN    2

Filler

List: [8, 6, 3]

OUTPUT

Data:   X      Y
        1      8
        2      3
        8      2
        6      7
        1      2
        3      2

Are all the NaN values in the same column? How do you want to replace the NaN values with your list? Sequentially, i.e. replace the first NaN value with the first value in the list, and so on?
– Longwen Ou, commented Feb 10, 2017 at 20:07

bunji · Accepted Answer · 2017-02-10 20:58:27Z

11

Start with your DataFrame df:

print(df)

     X  Y
0  1.0  8
1  2.0  3
2  NaN  2
3  NaN  7
4  1.0  2
5  NaN  2

Define the values you want to fill with. (Note: the filler list must contain exactly as many elements as there are NaN values in the column.)

filler = [8, 6, 3]
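Since the filler list has to line up one-to-one with the NaN values, it may be worth checking the count before assigning. A minimal sketch; fill_nans_from_list is a hypothetical helper name, and the frame below simply rebuilds the example from the question:

```python
import numpy as np
import pandas as pd


def fill_nans_from_list(df, column, filler):
    """Hypothetical helper: return a copy of df whose NaNs in `column`
    are replaced, in row order, by the values in `filler`."""
    mask = df[column].isna()
    n_missing = int(mask.sum())
    if n_missing != len(filler):
        # Fail early with a message that names both counts
        raise ValueError(
            f"column {column!r} has {n_missing} NaN values, "
            f"but {len(filler)} filler values were given"
        )
    out = df.copy()  # leave the caller's DataFrame untouched
    out.loc[mask, column] = filler
    return out


df = pd.DataFrame({'X': [1, 2, np.nan, np.nan, 1, np.nan],
                   'Y': [8, 3, 2, 7, 2, 2]})
result = fill_nans_from_list(df, 'X', [8, 6, 3])
print(result['X'].tolist())  # [1.0, 2.0, 8.0, 6.0, 1.0, 3.0]
```

Calling it with the wrong number of fillers, e.g. fill_nans_from_list(df, 'X', [8, 6]), raises the ValueError instead of assigning anything.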

Select the rows of your column that contain NaN and overwrite them with your filler:

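An earlier version of this answer used the chained assignment df.X[df.X.isnull()] = filler, which triggers a SettingWithCopyWarning (see the comments). Use a single .loc call instead: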

df.loc[df.X.isnull(), 'X'] = filler

which gives:

print(df)

     X  Y
0  1.0  8
1  2.0  3
2  8.0  2
3  6.0  7
4  1.0  2
5  3.0  2
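Another option, sketched here as an alternative rather than a replacement, is to turn the filler list into a Series indexed by the NaN rows and pass it to fillna, which fills each NaN from the value with the matching index label. This sketch assumes the row labels are unique:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': [1, 2, np.nan, np.nan, 1, np.nan],
                   'Y': [8, 3, 2, 7, 2, 2]})
filler = [8, 6, 3]

# Row labels of the NaNs in X, in their original order
nan_labels = df.index[df['X'].isna()]

# Pair each label with one filler value; fillna then aligns on those labels
df['X'] = df['X'].fillna(pd.Series(filler, index=nan_labels))
print(df)
```

If filler and nan_labels differ in length, the pd.Series constructor raises a ValueError, so a mismatch does not pass silently.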

edited Feb 10, 2017 at 20:58

answered Feb 10, 2017 at 20:17

bunji



5 Comments

MeanStreet Over a year ago

It works just fine, thanks, but I've got a warning "SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame". It's weird as I see that it indeed modifies df...

Mad Physicist Over a year ago

According to the docs listed in the warning, you need to change df.X[df.X.isnull()] to df.loc[df.X.isnull(), 'X']

bunji Over a year ago

@MadPhysicist is right if you want to avoid this warning. Personally I tend to use the original syntax since it looks more intuitive (to me) and I just ignore the warnings since it does exactly what I want anyway. But if the .loc method looks good to you then you should definitely use that one instead.

Mad Physicist Over a year ago

The warning is not a random occurrence. It happens because the indexed column may in fact be a copy, in which case the assignment will not actually modify the original data frame. You have been lucky thus far in that you have been getting views instead of copies, but that does not make it a good habit.

bunji Over a year ago

@MadPhysicist you're totally right. I shouldn't be demonstrating a method for solving a problem that is unstable, especially when helping others. I will edit the answer and I promise not to pass on my bad practices in the future.
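To illustrate the point about copies: writing into a selection that pandas has already copied does not reach the original frame, while a single .loc call with a row mask and a column label writes into df itself. A minimal sketch; the explicit .copy() is only there to make the copy obvious and to keep pandas from warning:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': [1, 2, np.nan, np.nan, 1, np.nan],
                   'Y': [8, 3, 2, 7, 2, 2]})
mask = df['X'].isna()

# Selecting rows with a boolean mask gives a separate object; writing to it
# changes only that object, not df.
subset = df.loc[mask].copy()
subset['X'] = [8, 6, 3]
n_before = int(df['X'].isna().sum())
print(n_before)  # 3: df still has its NaNs

# One .loc call (row mask + column label) assigns into df directly.
df.loc[mask, 'X'] = [8, 6, 3]
n_after = int(df['X'].isna().sum())
print(n_after)  # 0
```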

Scratch'N'Purr · Accepted Answer · 2017-02-10 20:12:02Z

You can use a counter as an index into your custom list, replacing each NaN with the next value from the list as you walk through the column:

import numpy as np
import pandas as pd

your_df = pd.DataFrame({'your_column': [0, 1, 2, np.nan, 4, 6, np.nan, np.nan, 7, 8, np.nan, 9]})  # a df with 4 NaNs
print(your_df)

your_custom_list = [1, 3, 6, 8]  # custom list with 4 fillers

# Work on a copy of the underlying array (under copy-on-write, .values may be read-only)
your_column_vals = your_df['your_column'].to_numpy(copy=True)

i_custom = 0  # position of the next filler in your custom list
for i in range(len(your_column_vals)):
    if np.isnan(your_column_vals[i]):
        your_column_vals[i] = your_custom_list[i_custom]
        i_custom += 1  # move on to the next filler

your_df['your_column'] = your_column_vals

print(your_df)

Output:

    your_column
0           0.0
1           1.0
2           2.0
3           NaN
4           4.0
5           6.0
6           NaN
7           NaN
8           7.0
9           8.0
10          NaN
11          9.0
    your_column
0           0.0
1           1.0
2           2.0
3           1.0
4           4.0
5           6.0
6           3.0
7           6.0
8           7.0
9           8.0
10          8.0
11          9.0
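The counter can also be replaced by a Python iterator over the custom list: next() hands out the fillers in order, one per NaN. A sketch using the same example data:

```python
import numpy as np
import pandas as pd

your_df = pd.DataFrame({'your_column': [0, 1, 2, np.nan, 4, 6, np.nan, np.nan, 7, 8, np.nan, 9]})
your_custom_list = [1, 3, 6, 8]

fillers = iter(your_custom_list)  # yields 1, then 3, then 6, then 8
# Keep non-missing values; swap each NaN for the next filler
your_df['your_column'] = [next(fillers) if pd.isna(v) else v
                          for v in your_df['your_column']]
print(your_df['your_column'].tolist())
# [0.0, 1.0, 2.0, 1.0, 4.0, 6.0, 3.0, 6.0, 7.0, 8.0, 8.0, 9.0]
```

If the list runs out before the NaNs do, next() raises StopIteration; surplus fillers are simply left unused.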

Shijo · Accepted Answer · 2017-02-10 20:14:41Z

1

This may not be the most efficient approach, but it works :) First find the index labels of all the NaNs, then replace them one by one in a loop. This assumes the list has at least as many values as there are NaNs.

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [np.nan, 1, 2], 'B': [10, np.nan, np.nan], 'C': [[20, 21, 22], [23, 24, 25], np.nan]})
lst = [12, 35, 78]
print(df)

     A     B             C
0  NaN  10.0  [20, 21, 22]
1  1.0   NaN  [23, 24, 25]
2  2.0   NaN           NaN

index = df.index[df['B'].isna()]  # find the index labels of the NaNs in column B
cnt = 0
for item in index:
    df.at[item, 'B'] = lst[cnt]  # replace the cnt-th NaN with the cnt-th value from the list (set_value was removed in pandas 1.0)
    cnt = cnt + 1

print(df)

Output:

     A     B             C
0  NaN  10.0  [20, 21, 22]
1  1.0  12.0  [23, 24, 25]
2  2.0  35.0           NaN
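To fill the NaNs sequentially (the first NaN gets the first list value, as the question asks), the NaN labels can also be paired with the list using zip, which removes the need for a manual counter. A sketch on a trimmed-down version of the example frame (column C dropped):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 1, 2], 'B': [10, np.nan, np.nan]})
lst = [12, 35, 78]

nan_labels = df.index[df['B'].isna()]  # labels 1 and 2
# zip stops at the shorter sequence, so extra list values are ignored
for label, value in zip(nan_labels, lst):
    df.at[label, 'B'] = value  # .at sets a single cell by label

print(df['B'].tolist())  # [10.0, 12.0, 35.0]
```

Unlike a counter that indexes into the list, zip will not complain if the list is too short; the remaining NaNs simply stay NaN.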

edited Feb 10, 2017 at 20:14

answered Feb 10, 2017 at 20:08

Shijo


3 Comments

MeanStreet Over a year ago

Here it would replace the 10 in the first row, and I don't want that: I just want to change the NaN values.

Shijo Over a year ago

It won't; it only replaces the NaNs.

MeanStreet Over a year ago

OK, if index corresponds to the rows with missing values, you're right, sorry.
