Remove rows with missing values in pandas

Question

I have a pandas dataframe in which one of the columns has a few missing values.

The data frame consists of hundreds of rows, but in column 4, five of the values are ?.

I want to remove the rows in which values are ? in this column.

I have tried using something like

df = df[np.isfinite(df[:,4])]

Are they actually ? (the string)? Do you want to remove the row if it contains any column as such?
– Jon Clements, Commented Sep 24, 2016 at 13:32
Does the DataFrame.dropna() method achieve what you want to do?
– Angus Williams, Commented Sep 24, 2016 at 13:36
df[df.iloc[:,4].astype(str) != "?"]. That is, if column 4 means index 4. Otherwise, you may want to use index 3 for column 4.
– Abdou, Commented Sep 24, 2016 at 13:44

Romain · Accepted Answer · 2016-09-25 13:50:26Z


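To remove the rows in which column 4 equals ?, select only the rows whose value in that column is not equal to ?. Note that iloc[:, 4] is positional and zero-based, so it picks the fifth column (col4 in the test data below).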

# Test data
import pandas as pd

df = pd.DataFrame({
        'col0': [0, 1, 2, 3, 4],
        'col1': [0, 1, 2, 3, 4],
        'col2': [0, 1, 2, 3, 4],
        'col3': [0, 1, 2, 3, 4],
        'col4': [0, 1, 2, '?', '?']})

df.loc[df.iloc[:, 4] != '?']

   col0  col1  col2  col3 col4
0     0     0     0     0    0
1     1     1     1     1    1
2     2     2     2     2    2
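
As a minimal sketch of the exact-match filter above: the expression returns a new DataFrame and leaves df untouched, so the result has to be assigned if you want to keep it. The names mask and filtered are only illustrative, and the reset_index call is optional. Note that the column keeps its object dtype after filtering, because it was created with mixed values.

```python
import pandas as pd

# Same test data as in the answer
df = pd.DataFrame({
    'col0': [0, 1, 2, 3, 4],
    'col1': [0, 1, 2, 3, 4],
    'col2': [0, 1, 2, 3, 4],
    'col3': [0, 1, 2, 3, 4],
    'col4': [0, 1, 2, '?', '?'],
})

# Boolean mask: True for the rows to keep (positional column 4 is col4)
mask = df.iloc[:, 4] != '?'

# Indexing returns a new frame; df itself is not modified
filtered = df.loc[mask].reset_index(drop=True)

print(len(df), len(filtered))
print(filtered['col4'].dtype)  # the column is still object dtype
```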

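If you instead want to drop the rows whose value in that column contains ? anywhere in the string, it's a bit trickier: str.contains treats its pattern as a regular expression, so the ? has to be escaped; na=False makes non-string entries count as non-matches so that the result is a valid boolean mask; and ~ negates the mask to keep the rows that do not match.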

df.loc[~df.iloc[:, 4].str.contains(r'\?', na=False)]

   col0  col1  col2  col3 col4
0     0     0     0     0    0
1     1     1     1     1    1
2     2     2     2     2    2
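
To illustrate how "equals ?" and "contains ?" can differ, here is a small sketch on made-up data (the values '2?' and ' ?' are hypothetical, not from the question). It also shows regex=False, a str.contains option that treats the pattern as a literal string and so avoids the escaping.

```python
import pandas as pd

# Hypothetical column with messier markers than the question's data
df = pd.DataFrame({'col4': [0, '1', '2?', '?', ' ?']})

# Exact match: only the value that is precisely '?' is dropped
exact = df.loc[df['col4'] != '?']

# Substring match with an escaped regex pattern
by_regex = df.loc[~df['col4'].str.contains(r'\?', na=False)]

# Substring match with a literal pattern, no escaping needed
by_literal = df.loc[~df['col4'].str.contains('?', regex=False, na=False)]

print(exact['col4'].tolist())
print(by_regex['col4'].tolist())
print(by_literal['col4'].tolist())
```

The integer 0 is not a string, so str.contains yields a missing value for it; na=False turns that into False and the row is kept.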

Edit

If the column is supposed to contain only numbers, you can also use the following method: convert it with pd.to_numeric, passing errors='coerce' so that values that cannot be converted become NaN, then drop those rows with dropna.

# Assign by column label so the column is replaced and gets a numeric dtype
df[df.columns[4]] = pd.to_numeric(df.iloc[:, 4], errors='coerce')
# Or if you want to apply the transformation to the entire DataFrame
# df = df.apply(pd.to_numeric, errors='coerce')
df.dropna(inplace=True)

   col0  col1  col2  col3  col4
0     0     0     0     0   0.0
1     1     1     1     1   1.0
2     2     2     2     2   2.0
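
One thing to watch for with this approach: dropna() with no arguments drops a row if any column is NaN. If other columns may legitimately have missing values, restricting it with the subset parameter may be closer to what the question asks. The sketch below assumes a hypothetical frame in which col0 also has a gap and col4 was loaded as strings.

```python
import pandas as pd

# Hypothetical data: col0 has an unrelated missing value,
# col4 holds numbers as strings plus '?' markers
df = pd.DataFrame({
    'col0': [0, 1, None, 3, 4],
    'col4': ['0', '1', '2', '?', '?'],
})

# '?' cannot be parsed, so errors='coerce' turns it into NaN
df['col4'] = pd.to_numeric(df['col4'], errors='coerce')

# Drop rows only where col4 is NaN; the gap in col0 is left alone
cleaned = df.dropna(subset=['col4'])

print(cleaned)
print(cleaned['col4'].dtype)
```

This also addresses the dtype concern raised in the comments: after the conversion, col4 is numeric rather than a column of strings.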

edited Sep 25, 2016 at 13:50

answered Sep 24, 2016 at 21:16

Romain


2 Comments

Jamgreen Over a year ago

Won't column 4 have all its numbers as string values after this because it had string values when it was loaded?

Romain Over a year ago

@Jamgreen Yes, I have just added an Edit to use this approach.
