
Good people, still learning Python. I have a very large dataframe of around 1 million rows, and I want to delete rows that have missing values in certain columns. That is, after locating the rows with missing values in those columns, I should be able to delete each entire row. Look at the sample dataframe:

import pandas as pd

data = {'A': ['NW', 'NB', 'UK', 'CAN'],
        'B': ['Tom', 'nick', 'krish', 'jack'],
        'C': [20, '', 19, ''],
        'D': [20, '', 19, ''],
        'E': [20, '', 19, ''],
        'F': [20, '', 19, '']}
df = pd.DataFrame(data)
print(df)

In this dataframe, I want to identify rows with missing values from column C to column F and then delete the entire row, which means I have to delete rows 1 and 3. I have tried this, but it isn't working:

df.dropna(subset=['C','D','E','F'], how='all', inplace = True)
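(A quick check shows why this has no effect: `dropna` only treats real `NaN`/`None` values as missing, and the `''` cells in the sample are ordinary strings:)

```python
import pandas as pd

data = {'A': ['NW', 'NB', 'UK', 'CAN'],
        'B': ['Tom', 'nick', 'krish', 'jack'],
        'C': [20, '', 19, ''], 'D': [20, '', 19, ''],
        'E': [20, '', 19, ''], 'F': [20, '', 19, '']}
df = pd.DataFrame(data)

# No cell is NaN -- '' is a normal string -- so dropna drops nothing
print(df[['C', 'D', 'E', 'F']].isna().any().any())             # False
print(len(df.dropna(subset=['C', 'D', 'E', 'F'], how='all')))  # 4
```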

EXPECTED OUTPUT should be a table like this:

import pandas as pd

data = {'A': ['NW', 'UK'],
        'B': ['Tom', 'krish'],
        'C': [20, 19], 'D': [20, 19],
        'E': [20, 19], 'F': [20, 19]}
df = pd.DataFrame(data)
print(df)

3 Answers


pandas.DataFrame.mask

  • Use mask to make the dataframe null anywhere a condition is True
  • Then use dropna with the subset argument

This is an alternative to using replace as mentioned by zipa and jezrael. There isn't any advantage to using this over the other methods; I just like using mask, and it serves the community to show another way of accomplishing the task.


df.mask(df.eq('')).dropna(subset=['C', 'D', 'E', 'F'])

    A      B   C   D   E   F
0  NW    Tom  20  20  20  20
2  UK  krish  19  19  19  19
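As a sketch of the intermediate step (reusing the question's sample data), `mask` puts `NaN` wherever the condition is True, which is what lets `dropna` take effect:

```python
import pandas as pd

data = {'A': ['NW', 'NB', 'UK', 'CAN'],
        'B': ['Tom', 'nick', 'krish', 'jack'],
        'C': [20, '', 19, ''], 'D': [20, '', 19, ''],
        'E': [20, '', 19, ''], 'F': [20, '', 19, '']}
df = pd.DataFrame(data)

masked = df.mask(df.eq(''))            # NaN wherever a cell equals ''
result = masked.dropna(subset=['C', 'D', 'E', 'F'])

print(result.index.tolist())           # [0, 2] -- rows 1 and 3 dropped
```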



This should do it:

import numpy as np

df.replace(r'^\s*$', np.nan, regex=True).dropna()

1 Comment

This solution removes all rows with any '', so it is wrong as written; you need df.replace(r'^\s*$', np.nan, regex=True).dropna(subset=['C','D','E','F'], how='all')
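Putting the answer and the comment's correction together, a self-contained sketch on the question's sample data might look like:

```python
import numpy as np
import pandas as pd

data = {'A': ['NW', 'NB', 'UK', 'CAN'],
        'B': ['Tom', 'nick', 'krish', 'jack'],
        'C': [20, '', 19, ''], 'D': [20, '', 19, ''],
        'E': [20, '', 19, ''], 'F': [20, '', 19, '']}
df = pd.DataFrame(data)

# Turn empty/whitespace-only strings into NaN, then drop rows where
# *all* of C..F are missing
out = df.replace(r'^\s*$', np.nan, regex=True).dropna(
    subset=['C', 'D', 'E', 'F'], how='all')
print(out['A'].tolist())   # ['NW', 'UK']
```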

One solution is to compare the selected columns against the empty string and keep rows that have at least one non-empty value:

df = df[df[['C','D','E','F']].ne('').any(axis=1)]

Details:

print(df[['C','D','E','F']].ne(''))
       C      D      E      F
0   True   True   True   True
1  False  False  False  False
2   True   True   True   True
3  False  False  False  False

print(df[['C','D','E','F']].ne('').any(axis=1))
0     True
1    False
2     True
3    False
dtype: bool

An alternative is the inverted condition, using ~:

df = df[~df[['C','D','E','F']].eq('').all(axis=1)]

Details:

print(df[['C','D','E','F']].eq(''))
       C      D      E      F
0  False  False  False  False
1   True   True   True   True
2  False  False  False  False
3   True   True   True   True

print(df[['C','D','E','F']].eq('').all(axis=1))
0    False
1     True
2    False
3     True
dtype: bool

print(~df[['C','D','E','F']].eq('').all(axis=1))
0     True
1    False
2     True
3    False
dtype: bool

print(df)
    A      B   C   D   E   F
0  NW    Tom  20  20  20  20
2  UK  krish  19  19  19  19

For your solution to work, it is necessary to replace the empty strings with missing values first:

import numpy as np

df = df.replace('', np.nan)
df.dropna(subset=['C','D','E','F'], how='all', inplace=True)
print(df)
    A      B     C     D     E     F
0  NW    Tom  20.0  20.0  20.0  20.0
2  UK  krish  19.0  19.0  19.0  19.0
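One side effect visible in the output above: once '' becomes NaN, pandas stores C..F as floats (NaN has no integer representation). If integer dtypes matter, a possible follow-up (a sketch, not part of the original answer) is to cast back after the rows are dropped:

```python
import numpy as np
import pandas as pd

data = {'A': ['NW', 'NB', 'UK', 'CAN'],
        'B': ['Tom', 'nick', 'krish', 'jack'],
        'C': [20, '', 19, ''], 'D': [20, '', 19, ''],
        'E': [20, '', 19, ''], 'F': [20, '', 19, '']}
df = pd.DataFrame(data)

df = df.replace('', np.nan)
df.dropna(subset=['C', 'D', 'E', 'F'], how='all', inplace=True)

# All remaining values are whole numbers, so the cast back is safe
df[['C', 'D', 'E', 'F']] = df[['C', 'D', 'E', 'F']].astype(int)
print(df['C'].tolist())   # [20, 19]
```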

