
In a Python script using the pandas library, I have a dataset of, say, 100 rows with a feature "X" containing 36 NaN values, and a list of size 36.

I want to replace all 36 missing values in column "X" with the 36 values in my list, in order.

It's probably a dumb question, but I went through all the docs and couldn't find a way to do it.

Here's an example:

INPUT

Data:   X      Y
        1      8
        2      3
        NaN    2
        NaN    7
        1      2
        NaN    2

Filler

List: [8, 6, 3]

OUTPUT

Data:   X      Y
        1      8
        2      3
        8      2
        6      7
        1      2
        3      2
  • Can you provide input and expected output? Commented Feb 10, 2017 at 19:54
  • Sure, I edited my post to add it. Commented Feb 10, 2017 at 20:00
  • Are all the NaN values in the same column? How do you replace the NaN values with your list? Do you just do that sequentially, i.e. replace the first NaN value with the first value in the list, and so on? Commented Feb 10, 2017 at 20:07

3 Answers


Start with your DataFrame df:

print(df)

     X  Y
0  1.0  8
1  2.0  3
2  NaN  2
3  NaN  7
4  1.0  2
5  NaN  2

Define the values you want to fill with. (Note: the filler list must contain exactly as many elements as there are NaN values in the column.)

filler = [8, 6, 3]

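Select the rows of column X that are NaN and overwrite them with your filler in a single .loc call:

df.loc[df.X.isnull(), 'X'] = filler

(An earlier version of this answer used the chained assignment df.X[df.X.isnull()] = filler, which emits a SettingWithCopyWarning; see the comments below.)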

which gives:

print(df)

     X  Y
0  1.0  8
1  2.0  3
2  8.0  2
3  6.0  7
4  1.0  2
5  3.0  2
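To enforce the note about matching counts, here is a minimal sketch that wraps the same .loc assignment in a helper. The name fill_nans_in_order is hypothetical (it is not a pandas API); it counts the NaNs first and fails early with a message stating both counts, instead of letting the assignment fail mid-way.

```python
import numpy as np
import pandas as pd


def fill_nans_in_order(df, column, filler):
    """Hypothetical helper: replace the NaNs in df[column], top to bottom,
    with the values in filler. Raises ValueError if the counts differ."""
    mask = df[column].isnull()
    n_missing = int(mask.sum())
    if n_missing != len(filler):
        raise ValueError(
            f"{n_missing} NaN values in {column!r} but {len(filler)} fillers"
        )
    # One .loc call selects rows and column together, so the write goes to df itself
    df.loc[mask, column] = filler
    return df


df = pd.DataFrame({'X': [1, 2, np.nan, np.nan, 1, np.nan], 'Y': [8, 3, 2, 7, 2, 2]})
fill_nans_in_order(df, 'X', [8, 6, 3])
print(df)
```

The helper modifies df in place and also returns it, so it can be used either way.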

5 Comments

It works just fine, thanks, but I get a warning: "SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame". It's weird, since I can see that it does modify df...
According to the docs linked in the warning, you need to change df.X[df.X.isnull()] to df.loc[df.X.isnull(), 'X'].
@MadPhysicist is right if you want to avoid this warning. Personally I tend to use the original syntax since it looks more intuitive (to me) and I just ignore the warnings since it does exactly what I want anyway. But if the .loc method looks good to you then you should definitely use that one instead.
The warning is not a random occurrence. It happens because the indexed column may in fact be a copy, in which case the assignment will not actually modify the original data frame. You have been lucky thus far in that you have been getting views instead of copies, but that does not make it a good habit.
@MadPhysicist you're totally right. I shouldn't be demonstrating a method for solving a problem that is unstable, especially when helping others. I will edit the answer and I promise not to pass on my bad practices in the future.
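The point about copies can be shown with a minimal sketch. Here an explicit .copy() stands in for the case where chained indexing happens to hand back a copy (an assumption for illustration; whether chained indexing gives a view or a copy is not something to rely on). Writing into the copy leaves df untouched, while the single .loc call writes into df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': [1, 2, np.nan, np.nan, 1, np.nan], 'Y': [8, 3, 2, 7, 2, 2]})
mask = df['X'].isnull()  # a new boolean Series, independent of later writes

# Assigning into a copy of the column: df itself is NOT modified
x_copy = df['X'].copy()
x_copy[mask] = [8, 6, 3]
print(df['X'].isnull().sum())  # 3

# Assigning through df.loc: rows and column are selected in one indexing call on df
df.loc[mask, 'X'] = [8, 6, 3]
print(df['X'].isnull().sum())  # 0
```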

You can keep a separate counter as an index into your custom list and advance it each time you replace a NaN:

import numpy as np
import pandas as pd

your_df = pd.DataFrame({'your_column': [0, 1, 2, np.nan, 4, 6, np.nan, np.nan, 7, 8, np.nan, 9]})  # a df with 4 NaNs
print(your_df)

your_custom_list = [1, 3, 6, 8]  # custom list with 4 fillers

# Work on a writable copy of the column's values, then assign it back
your_column_vals = your_df['your_column'].to_numpy(copy=True)

i_custom = 0  # position of the next filler in the custom list
for i in range(len(your_column_vals)):
    if np.isnan(your_column_vals[i]):
        your_column_vals[i] = your_custom_list[i_custom]
        i_custom += 1  # move on to the next filler

your_df['your_column'] = your_column_vals

print(your_df)

Output:

    your_column
0           0.0
1           1.0
2           2.0
3           NaN
4           4.0
5           6.0
6           NaN
7           NaN
8           7.0
9           8.0
10          NaN
11          9.0

    your_column
0           0.0
1           1.0
2           2.0
3           1.0
4           4.0
5           6.0
6           3.0
7           6.0
8           7.0
9           8.0
10          8.0
11          9.0
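The counter-based loop above does the same thing as NumPy boolean-mask assignment, which fills the True positions of the mask in order from the right-hand side. A minimal sketch using the same data, assuming the list length equals the number of NaNs (NumPy raises otherwise):

```python
import numpy as np
import pandas as pd

your_df = pd.DataFrame({'your_column': [0, 1, 2, np.nan, 4, 6, np.nan, np.nan, 7, 8, np.nan, 9]})
your_custom_list = [1, 3, 6, 8]

vals = your_df['your_column'].to_numpy(copy=True)  # writable copy of the column
nan_mask = np.isnan(vals)  # True where the value is NaN
# Boolean-mask assignment fills the True positions in order, like the loop's counter
vals[nan_mask] = your_custom_list
your_df['your_column'] = vals
print(your_df)
```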



This may not be the most efficient approach, but it still works :) First find the index labels of all the NaNs in the column, then replace them in a loop. This assumes the list has at least as many elements as there are NaNs.

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [np.nan, 1, 2], 'B': [10, np.nan, np.nan], 'C': [[20, 21, 22], [23, 24, 25], np.nan]})
lst = [12, 35, 78]

print(df)

     A     B             C
0  NaN  10.0  [20, 21, 22]
1  1.0   NaN  [23, 24, 25]
2  2.0   NaN           NaN

index = df['B'].index[df['B'].isnull()]  # index labels of the NaNs in column B
cnt = 0
for item in index:
    # df.set_value() was removed in pandas 1.0; df.at is the scalar setter
    df.at[item, 'B'] = lst[cnt]  # replace the nth NaN with the nth value from the list
    cnt = cnt + 1

print(df)

Output:

     A     B             C
0  NaN  10.0  [20, 21, 22]
1  1.0  12.0  [23, 24, 25]
2  2.0  35.0           NaN
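The sequential fill the question asks for (first NaN gets the first list value) can also be sketched without an explicit loop: build a Series of fillers indexed by the NaN rows and pass it to Series.fillna, which fills each NaN from the entry with the matching index label. Like the answer, this assumes lst has at least as many elements as there are NaNs; the extra values are simply not used.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 1, 2], 'B': [10, np.nan, np.nan], 'C': [[20, 21, 22], [23, 24, 25], np.nan]})
lst = [12, 35, 78]

nan_index = df.index[df['B'].isnull().to_numpy()]  # labels of the rows where B is NaN
# Pair the first len(nan_index) list values with those labels, in order
fill_values = pd.Series(lst[:len(nan_index)], index=nan_index)
# fillna with a Series fills each NaN from the value with the same index label
df['B'] = df['B'].fillna(fill_values)
print(df)
```

Only column B is touched; the NaNs in columns A and C stay as they are.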

3 Comments

Here it would replace the 10 in the first row, and I don't want that: I just want to change the NaN values.
It won't; it only replaces the NaNs.
OK, if index contains only the rows with missing values, you're right, sorry.
