
Good people, still learning Python. I have a very large dataframe of around 1 million rows, and I want to delete rows that have missing values in certain columns. That is, after locating the rows with missing values in those columns, I should be able to delete each entire row. Look at the sample dataframe:

import pandas as pd

data = {'A': ['NW', 'NB', 'UK', 'CAN'],
        'B': ['Tom', 'nick', 'krish', 'jack'],
        'C': [20, '', 19, ''],
        'D': [20, '', 19, ''],
        'E': [20, '', 19, ''],
        'F': [20, '', 19, '']}
df = pd.DataFrame(data)
print(df)

In this dataframe, I want to identify rows with missing values from column C to column F and then delete the entire row, which means I have to delete rows 1 and 3. I have tried this, but it isn't working:

df.dropna(subset=['C','D','E','F'], how='all', inplace = True)
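(A quick check shows why this has no effect: `dropna` only treats real `NaN`/`None` values as missing, and the `''` cells in the sample are ordinary strings:)

```python
import pandas as pd

data = {'A': ['NW', 'NB', 'UK', 'CAN'],
        'B': ['Tom', 'nick', 'krish', 'jack'],
        'C': [20, '', 19, ''], 'D': [20, '', 19, ''],
        'E': [20, '', 19, ''], 'F': [20, '', 19, '']}
df = pd.DataFrame(data)

# No cell is NaN -- '' is a normal string -- so dropna drops nothing
print(df[['C', 'D', 'E', 'F']].isna().any().any())             # False
print(len(df.dropna(subset=['C', 'D', 'E', 'F'], how='all')))  # 4
```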

EXPECTED OUTPUT should be a table like this:

import pandas as pd

data = {'A': ['NW', 'UK'],
        'B': ['Tom', 'krish'],
        'C': [20, 19], 'D': [20, 19],
        'E': [20, 19], 'F': [20, 19]}
df = pd.DataFrame(data)
print(df)

3 Answers


pandas.DataFrame.mask

  • Use mask to make the dataframe null anywhere a condition is True
  • Then use dropna with the subset argument

This is an alternative to using replace as mentioned by zipa and jezrael. There isn't any advantage to using this over the other methods; I just like using mask, and it serves the community to show another way of accomplishing the task.


df.mask(df.eq('')).dropna(subset=['C', 'D', 'E', 'F'])

    A      B   C   D   E   F
0  NW    Tom  20  20  20  20
2  UK  krish  19  19  19  19
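As a sketch of the intermediate step (reusing the question's sample data), `mask` puts `NaN` wherever the condition is True, which is what lets `dropna` take effect:

```python
import pandas as pd

data = {'A': ['NW', 'NB', 'UK', 'CAN'],
        'B': ['Tom', 'nick', 'krish', 'jack'],
        'C': [20, '', 19, ''], 'D': [20, '', 19, ''],
        'E': [20, '', 19, ''], 'F': [20, '', 19, '']}
df = pd.DataFrame(data)

masked = df.mask(df.eq(''))            # NaN wherever a cell equals ''
result = masked.dropna(subset=['C', 'D', 'E', 'F'])

print(result.index.tolist())           # [0, 2] -- rows 1 and 3 dropped
```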



This should do it:

import numpy as np

df.replace(r'^\s*$', np.nan, regex=True).dropna()

1 Comment

This solution removes all rows with any '', so it is wrong as written; you need df.replace(r'^\s*$', np.nan, regex=True).dropna(subset=['C','D','E','F'], how='all')
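Putting the answer and the comment's correction together, a self-contained sketch on the question's sample data might look like:

```python
import numpy as np
import pandas as pd

data = {'A': ['NW', 'NB', 'UK', 'CAN'],
        'B': ['Tom', 'nick', 'krish', 'jack'],
        'C': [20, '', 19, ''], 'D': [20, '', 19, ''],
        'E': [20, '', 19, ''], 'F': [20, '', 19, '']}
df = pd.DataFrame(data)

# Turn empty/whitespace-only strings into NaN, then drop rows where
# *all* of C..F are missing
out = df.replace(r'^\s*$', np.nan, regex=True).dropna(
    subset=['C', 'D', 'E', 'F'], how='all')
print(out['A'].tolist())   # ['NW', 'UK']
```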

One solution is to compare the selected columns against the empty string and keep rows that have at least one non-empty value:

df = df[df[['C','D','E','F']].ne('').any(axis=1)]

Details:

print(df[['C','D','E','F']].ne(''))
       C      D      E      F
0   True   True   True   True
1  False  False  False  False
2   True   True   True   True
3  False  False  False  False

print(df[['C','D','E','F']].ne('').any(axis=1))
0     True
1    False
2     True
3    False
dtype: bool

An alternative is the inverted condition, using ~:

df = df[~df[['C','D','E','F']].eq('').all(axis=1)]

Details:

print(df[['C','D','E','F']].eq(''))
       C      D      E      F
0  False  False  False  False
1   True   True   True   True
2  False  False  False  False
3   True   True   True   True

print(df[['C','D','E','F']].eq('').all(axis=1))
0    False
1     True
2    False
3     True
dtype: bool

print(~df[['C','D','E','F']].eq('').all(axis=1))
0     True
1    False
2     True
3    False
dtype: bool

print(df)
    A      B   C   D   E   F
0  NW    Tom  20  20  20  20
2  UK  krish  19  19  19  19

For your solution to work, it is necessary to replace the empty strings with missing values first:

import numpy as np

df = df.replace('', np.nan)
df.dropna(subset=['C','D','E','F'], how='all', inplace=True)
print(df)
    A      B     C     D     E     F
0  NW    Tom  20.0  20.0  20.0  20.0
2  UK  krish  19.0  19.0  19.0  19.0
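One side effect visible in the output above: once '' becomes NaN, pandas stores C..F as floats (NaN has no integer representation). If integer dtypes matter, a possible follow-up (a sketch, not part of the original answer) is to cast back after the rows are dropped:

```python
import numpy as np
import pandas as pd

data = {'A': ['NW', 'NB', 'UK', 'CAN'],
        'B': ['Tom', 'nick', 'krish', 'jack'],
        'C': [20, '', 19, ''], 'D': [20, '', 19, ''],
        'E': [20, '', 19, ''], 'F': [20, '', 19, '']}
df = pd.DataFrame(data)

df = df.replace('', np.nan)
df.dropna(subset=['C', 'D', 'E', 'F'], how='all', inplace=True)

# All remaining values are whole numbers, so the cast back is safe
df[['C', 'D', 'E', 'F']] = df[['C', 'D', 'E', 'F']].astype(int)
print(df['C'].tolist())   # [20, 19]
```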

