Delete and replace NaN values with the mean of the rows in a pandas DataFrame

Question

I have a very big DataFrame that looks like:

    c1   c2    c3
0  NaN  1.0   NaN
1  NaN  NaN   NaN
2  3.0  6.0   9.0
3  NaN  7.0  10.0
...

I want to:

1. Delete the rows in which every value is NaN, like the second row in the sample.

2. Replace the remaining NaN values in each row with the mean of that row.

Note: different rows have their NaN values in different places. Could you please help me with that? Thanks.

Also, this link does not solve my question: Pandas Dataframe: Replacing NaN with row average

Here is a sample of my DataFrame:

import pandas as pd
import numpy as np


df = pd.DataFrame()
df['c1'] = [np.nan, np.nan, 3, np.nan]
df['c2'] = [1, np.nan, 6, 7]
df['c3'] = [np.nan, np.nan, 9, 10]

Update: a second case, where not every column should be included in the row mean. Sample DataFrame:

import pandas as pd
import numpy as np


df = pd.DataFrame()
df['id'] = [1, 2, 3, 4, 5]
df['c1'] = [np.nan, np.nan, 3, np.nan, 5]
df['c2'] = [1, np.nan, 3, 11, 5]
df['c3'] = [1, np.nan, 3, 11, np.nan]
df['c4'] = [3, np.nan, 3, 11, 5]

Desired output:
df = pd.DataFrame()
df['id'] = [1,  3, 4, 5]
df['c1'] = [ 5/3, 3, 11, 5]
df['c2'] = [1,  3, 11, 5]
df['c3'] = [1,  3, 11, 5]
df['c4'] = [3,  3, 11, 5]
df

For this part, I don't want to include the value of id when calculating the row mean.

eshirvana · Accepted Answer · 2022-02-08 21:44:31Z

1

How about this:

df = df.T.fillna(df.mean(axis=1)).T.dropna()
print(df)

output:

    c1   c2    c3
0  1.0  1.0   1.0
2  3.0  6.0   9.0
3  8.5  7.0  10.0
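The one-liner can be easier to follow when split into steps. As far as I understand it, it relies on `DataFrame.fillna` matching a Series argument against column labels: after transposing, the columns are the original row labels, so each one is filled with its own row mean. The sketch below uses the sample data from the question; the intermediate names (`row_means`, `transposed`, `filled`, `result`) are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "c1": [np.nan, np.nan, 3, np.nan],
    "c2": [1, np.nan, 6, 7],
    "c3": [np.nan, np.nan, 9, 10],
})

# Mean of each row, ignoring NaN; the all-NaN row 1 gets a mean of NaN
row_means = df.mean(axis=1)

# After transposing, the columns are the original row labels 0..3
transposed = df.T

# fillna with a Series fills column k with row_means[k]
filled = transposed.fillna(row_means)

# Transpose back; row 1 is still all NaN, so dropna removes it
result = filled.T.dropna()
print(result)
```

Note that `dropna()` with its default `how="any"` is enough here, because after filling, the only rows that still contain NaN are the ones that were entirely NaN to begin with.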

edited Feb 8, 2022 at 21:44

answered Feb 8, 2022 at 20:46

eshirvana


6 Comments

user18147555 Over a year ago

Thanks @eshirvana. However, this provides the mean of the columns, not the rows.

eshirvana Over a year ago

@Yellowman, you already got your answer, but see my updated answer as well.

user18147555 Over a year ago

Thank you so much. Yes. It works now.

user18147555 Over a year ago

Can I ask a quick question? I want to replace the df.mean call, which returns the mean of the whole row. That is, I want to replace the NaN values with the mean taken from column 1 to the end. Do you know how to solve that? I worded this very badly in the first version of the question.

eshirvana Over a year ago

Not sure I understand you correctly; please provide sample data and the desired output.


score 0 · Accepted Answer · 2022-02-08 21:00:13Z

0

You could create a dictionary that maps each column name to the Series of row means and pass it to fillna to fill the NaN values. Then drop the rows that still contain NaN (these are the all-NaN rows, which don't get filled because their mean is itself NaN).

out = df.fillna(dict.fromkeys(df.columns, df.mean(axis=1))).dropna()

Another possibility is to transpose the DataFrame and use fillna to fill, then transpose back:

df_T = df.T
df_T.fillna(df_T.mean()).T.dropna()

Output:

    c1   c2    c3
0  1.0  1.0   1.0
2  3.0  6.0   9.0
3  8.5  7.0  10.0
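For the updated sample in the question, where `id` should not take part in the row mean, one possible adaptation is to compute the mean over the value columns only. This is a sketch rather than part of the original answer; `value_cols`, `row_means`, and `out` are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Updated sample data from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "c1": [np.nan, np.nan, 3, np.nan, 5],
    "c2": [1, np.nan, 3, 11, 5],
    "c3": [1, np.nan, 3, 11, np.nan],
    "c4": [3, np.nan, 3, 11, 5],
})

# Every column except id counts toward the row mean
value_cols = [c for c in df.columns if c != "id"]
row_means = df[value_cols].mean(axis=1)

# Drop rows whose value columns are all NaN (id is never NaN, so a plain dropna would keep them)
out = df.dropna(subset=value_cols, how="all").copy()

# Series.fillna aligns on the row index, so each NaN receives the mean of its own row
out[value_cols] = out[value_cols].apply(lambda col: col.fillna(row_means))
print(out)
```

On the sample this keeps the rows with id 1, 3, 4 and 5 and fills c1 in the first row with 5/3, which matches the desired output given in the question.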

edited Feb 8, 2022 at 21:00

answered Feb 8, 2022 at 20:57

user7864386

3 Comments

user18147555 Over a year ago

Thank you @enke. It works for me. A quick question: since my data is very, very large, is this method fast enough? Sorry, I am new to Python and DataFrames, and this is probably a very basic question.

user7864386 Over a year ago

@Yellowman should be fast enough, I think

user18147555 Over a year ago

Thank you so much.
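On the speed question: if the transpose or the per-column fill ever turns out to be a bottleneck on a very large, all-numeric frame, the same result can be computed directly on the underlying NumPy array. This is an unbenchmarked sketch, not something proposed in the thread, and it assumes every column can be converted to float; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "c1": [np.nan, np.nan, 3, np.nan],
    "c2": [1, np.nan, 6, 7],
    "c3": [np.nan, np.nan, 9, 10],
})

values = df.to_numpy(dtype=float)

# Keep only rows that contain at least one number
keep = ~np.isnan(values).all(axis=1)
values = values[keep]

# Row means ignoring NaN; no all-NaN rows remain, so nanmean is well defined
row_means = np.nanmean(values, axis=1)

# Broadcast each row mean across its row and use it wherever the value is NaN
filled = np.where(np.isnan(values), row_means[:, None], values)

out = pd.DataFrame(filled, index=df.index[keep], columns=df.columns)
print(out)
```

Whether this is actually faster depends on the data, so it is worth timing both versions on a realistic slice before switching.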
