Dealing with missing values in multiple columns of a dataframe in Python

Question

I am dealing with a huge dataframe with hundreds of columns, each of which may contain missing values. Here is a sample:

import pandas as pd
import numpy as np

data = {'a':  [1,1,0,1,1],
        'b': ["a", "b", np.nan, 'c', np.nan],
        'c': ['b1','b2',np.nan, 'c1', np.nan],
        'd': [1,1,1, 2, np.nan],
        'e': [4,4,4, 3, np.nan]
       }
df = pd.DataFrame(data)
print(df)

   a    b    c    d    e
0  1    a   b1  1.0  4.0
1  1    b   b2  1.0  4.0
2  0  NaN  NaN  1.0  4.0
3  1    c   c1  2.0  3.0
4  1  NaN  NaN  NaN  NaN

To deal with the missing values in one go, I am doing something like this: if the missing values are in columns a, b, or c, I replace them with a specific value for each column.

df = df.fillna({'a': 0, 'b': 'other', 'c': -1})
print(df)
   a      b   c    d    e
0  1      a  b1  1.0  4.0
1  1      b  b2  1.0  4.0
2  0  other  -1  1.0  4.0
3  1      c  c1  2.0  3.0
4  1  other  -1  NaN  NaN

What I would like to do is this: if the missing values are in any column other than those three, simply replace them with the value that appears most often in that column. For example, in column d, 1 occurs most often, so the missing value in d should be replaced with 1.0.

What happens if you have a tie, e.g. with [1, 2, 1, 2, NaN]?
– mozway, Commented Jan 5, 2023 at 18:00

mozway · Accepted Answer · 2023-01-05 18:02:42Z

Assuming you have a single mode or are fine with getting the first value:

d = {'a':0, 'b':'other', 'c':-1}
d2 = df.drop(columns=list(d)).mode().loc[0].to_dict()

out = df.fillna(d|d2) # requires python 3.9+

# for 3.5 <= python < 3.9
# out = df.fillna({**d, **d2})

Output:

   a      b   c    d    e
0  1      a  b1  1.0  4.0
1  1      b  b2  1.0  4.0
2  0  other  -1  1.0  4.0
3  1      c  c1  2.0  3.0
4  1  other  -1  1.0  4.0
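
On the tie raised in the comment: as far as I know, `DataFrame.mode()` returns one row per tied value, sorted in ascending order, and pads columns that have fewer modes with NaN. Taking `.loc[0]` therefore picks the smallest of the tied values. Below is a minimal sketch of that behaviour; the `tie` frame is made up for illustration and is not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'd' has a tie between 1 and 2, 'e' has a single mode (4).
tie = pd.DataFrame({'d': [1, 2, 1, 2, np.nan],
                    'e': [4, 4, 4, 3, np.nan]})

modes = tie.mode()          # one row per tied value; 'e' is padded with NaN in row 1
first_mode = modes.loc[0]   # first row holds the smallest mode of each column

# Fill each column's missing values with that column's first mode.
filled = tie.fillna(first_mode.to_dict())

print(modes)
print(filled)
```

If the smallest tied value is not what you want, pick a different row of `modes` or choose the fill value for those columns explicitly.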
