
I have a very big DataFrame that looks like:

    c1   c2    c3
0  NaN  1.0   NaN
1  NaN  NaN   NaN
2  3.0  6.0   9.0
3  NaN  7.0  10.0
...

I want to:

1- Delete the rows in which every value is NaN, like the second row in the sample.

2- Replace the NaN values in each of the other rows with the mean of that row.

Note: the number and position of the NaN values differ from row to row. Could you please help me with that? Thanks.

Also, this question does not solve my problem: Pandas Dataframe: Replacing NaN with row average

Here is a sample of my DataFrame:

import pandas as pd
import numpy as np


df = pd.DataFrame()
df['c1'] = [np.nan, np.nan, 3, np.nan]
df['c2'] = [1, np.nan, 6, 7]
df['c3'] = [np.nan, np.nan, 9, 10]

Update: a case where not every column should be included in the row mean. Sample DataFrame:

import pandas as pd
import numpy as np


df = pd.DataFrame()
df['id'] = [1, 2, 3, 4, 5]
df['c1'] = [np.nan, np.nan, 3, np.nan, 5]
df['c2'] = [1, np.nan, 3, 11, 5]
df['c3'] = [1, np.nan, 3, 11, np.nan]
df['c4'] = [3, np.nan, 3, 11, 5]

Desired output:
df = pd.DataFrame()
df['id'] = [1,  3, 4, 5]
df['c1'] = [ 5/3, 3, 11, 5]
df['c2'] = [1,  3, 11, 5]
df['c3'] = [1,  3, 11, 5]
df['c4'] = [3,  3, 11, 5]
df

For this part, the value of id should not be included when calculating the row mean.

2 Answers


How about this:

df = df.T.fillna(df.mean(axis=1)).T.dropna()
print(df)

output:

>>>
    c1   c2    c3
0  1.0  1.0   1.0
2  3.0  6.0   9.0
3  8.5  7.0  10.0
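
The one-liner above can be unpacked into separate steps, which may make it easier to see why the transpose is needed. This is only a sketch on the sample data from the question; the names kept, row_means and filled are illustrative and not part of the original answer. It also drops the all-NaN rows first rather than last, which should give the same result.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "c1": [np.nan, np.nan, 3, np.nan],
    "c2": [1, np.nan, 6, 7],
    "c3": [np.nan, np.nan, 9, 10],
})

# Step 1: drop rows in which every value is NaN (row 1 in the sample).
kept = df.dropna(how="all")

# Step 2: mean of each remaining row; NaN values are skipped by default.
row_means = kept.mean(axis=1)

# Step 3: fillna with a Series matches the Series index against column labels,
# so transpose to turn the row labels into columns, fill, and transpose back.
filled = kept.T.fillna(row_means).T

print(filled)
```

Row 3, for example, has the values 7.0 and 10.0, so its missing c1 is filled with 8.5.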

6 Comments

Thanks @eshirvana. However, this provides the mean of the columns, not the rows.
@Yellowman you already got your answer, but see my updated answer as well.
Thank you so much. Yes. It works now.
Can I ask a quick question? Instead of df.mean over the whole row, I want to replace the NaN values with the mean taken from column 1 until the last column. Do you know how to do that? I worded this very badly in the first version of the question.
Not sure I understand you correctly. Please provide sample data and the desired output.

You could create a dictionary that maps every column name to the Series of row means and pass it to fillna to fill the NaN values. Then drop the all-NaN rows, which are not filled because the mean of an all-NaN row is itself NaN.

out = df.fillna(dict.fromkeys(df.columns, df.mean(axis=1))).dropna()

Another possibility is to transpose the DataFrame and use fillna to fill, then transpose back:

df_T = df.T
df_T.fillna(df_T.mean()).T.dropna()

Output:

    c1   c2    c3
0  1.0  1.0   1.0
2  3.0  6.0   9.0
3  8.5  7.0  10.0
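
For the update in the question, where id must not contribute to the row mean, one possible sketch is to restrict both the mean and the fill to the value columns. This swaps in DataFrame.where with axis=0 in place of the transpose; the names value_cols, vals and out are illustrative, and the code assumes every column other than id is a numeric value column.

```python
import numpy as np
import pandas as pd

# Sample data from the update in the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "c1": [np.nan, np.nan, 3, np.nan, 5],
    "c2": [1, np.nan, 3, 11, 5],
    "c3": [1, np.nan, 3, 11, np.nan],
    "c4": [3, np.nan, 3, 11, 5],
})

# Assumption: every column except "id" takes part in the row mean.
value_cols = [c for c in df.columns if c != "id"]

# Drop rows whose value columns are all NaN; "id" is ignored for this check.
out = df.dropna(subset=value_cols, how="all").copy()

# Keep existing values and take the row mean wherever a value is missing.
# axis=0 aligns the Series of row means with the row index.
vals = out[value_cols]
out[value_cols] = vals.where(vals.notna(), vals.mean(axis=1), axis=0)

print(out)
```

On the sample data this keeps ids 1, 3, 4 and 5 and fills c1 of the first row with 5/3, as in the desired output. Because it avoids the two transposes, it may also behave better on very large frames, although that is not measured here.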

3 Comments

Thank you @enke. It works for me. A quick question: since my data is very large, is this method fast enough? Sorry, I am new to Python and DataFrames, so this is probably a very basic question.
@Yellowman should be fast enough, I think
Thank you so much.
