
I have a pandas dataframe in which one of the columns has a few missing values.

The data frame consists of hundreds of rows, but in column 4, five of the values are ?.

I want to remove the rows where the value in this column is ?.

I have tried something like:

df = df[np.isfinite(df[:,4])]
  • Are they actually ? (the string)? Do you want to remove the row if it contains any column as such? Commented Sep 24, 2016 at 13:32
  • Does the DataFrame.dropna() method achieve what you want to do? Commented Sep 24, 2016 at 13:36
  • df[df.iloc[:,4].astype(str) != "?"]. That is, if column 4 means index 4. Otherwise, you may want to use index 3 for column 4. Commented Sep 24, 2016 at 13:44

1 Answer

To remove the rows in which column 4 (the column at position 4, col4 in the test data) is equal to ?, select the rows whose value is not equal to ?.

# Test data
import pandas as pd

df = pd.DataFrame({
        'col0': [0, 1, 2, 3, 4],
        'col1': [0, 1, 2, 3, 4],
        'col2': [0, 1, 2, 3, 4],
        'col3': [0, 1, 2, 3, 4],
        'col4': [0, 1, 2, '?', '?']})

df.loc[df.iloc[:, 4] != '?']

   col0  col1  col2  col3 col4
0     0     0     0     0    0
1     1     1     1     1    1
2     2     2     2     2    2
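As a self-contained sketch of the equality filter, the snippet below rebuilds the same test frame and also tries the astype(str) variant suggested in the comments. The names mask, filtered and filtered_str are only illustrative, and the sketch assumes the missing values really are the string ?.

```python
import pandas as pd

# Same test data as in the answer
df = pd.DataFrame({
    'col0': [0, 1, 2, 3, 4],
    'col1': [0, 1, 2, 3, 4],
    'col2': [0, 1, 2, 3, 4],
    'col3': [0, 1, 2, 3, 4],
    'col4': [0, 1, 2, '?', '?'],
})

# Boolean mask: True for rows whose value at column position 4 is not '?'
mask = df.iloc[:, 4] != '?'
filtered = df.loc[mask]

# Variant from the comments: cast to str first, then compare
filtered_str = df[df.iloc[:, 4].astype(str) != '?']

print(filtered)
```

Neither form modifies df; assign the result back (df = df.loc[mask]) if you want to keep only the filtered rows.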

If you want to eliminate the rows in which column 4 contains ? anywhere in the value, it's a bit trickier: str.contains treats its pattern as a regular expression, so you have to escape the ? character; you have to pass na=False so that non-string values produce False instead of NaN in the boolean mask; and finally you negate the mask with ~.

df.loc[~df.iloc[:, 4].str.contains(r'\?', na=False)]

   col0  col1  col2  col3 col4
0     0     0     0     0    0
1     1     1     1     1    1
2     2     2     2     2    2
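If escaping feels error-prone, one alternative is to turn regex matching off with regex=False, so that ? is treated as a literal character. The sketch below uses a small hypothetical frame (the 'n/a?' value is made up for illustration) and casts the column to str first so that every value is searchable.

```python
import pandas as pd

# Hypothetical data: one value merely contains '?', one is exactly '?'
df = pd.DataFrame({
    'col0': [0, 1, 2, 3, 4],
    'col4': [0, 1, 'n/a?', '?', 4],
})

# Cast to str so numeric entries are searched as text too
col = df['col4'].astype(str)

# regex=False makes '?' a literal character, so no escaping is needed
has_q = col.str.contains('?', regex=False, na=False)

# Keep only the rows that do not contain '?'
cleaned = df.loc[~has_q]

print(cleaned)
```

Here rows 2 and 3 are dropped, because both 'n/a?' and '?' contain the character.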

Edit

If the column is supposed to contain only numbers, you can also use the following method: convert it to numeric with errors='coerce', which produces NaN for every value that cannot be converted, then drop those rows with dropna.

df.iloc[:, 4] = pd.to_numeric(df.iloc[:, 4], errors='coerce')
# Or if you want to apply the transformation to the entire DataFrame
# df = df.apply(pd.to_numeric, errors='coerce')    
df.dropna(inplace=True)

   col0  col1  col2  col3  col4
0     0     0     0     0   0.0
1     1     1     1     1   1.0
2     2     2     2     2   2.0
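A minimal sketch of the same idea without inplace=True is shown below. It assigns the converted column by name, which should leave it with a float dtype, and uses dropna(subset=...) so that NaN values in other columns do not cause rows to be dropped. The variable name out is only illustrative.

```python
import pandas as pd

# Same test data as in the answer
df = pd.DataFrame({
    'col0': [0, 1, 2, 3, 4],
    'col1': [0, 1, 2, 3, 4],
    'col2': [0, 1, 2, 3, 4],
    'col3': [0, 1, 2, 3, 4],
    'col4': [0, 1, 2, '?', '?'],
})

# Work on a copy so the original frame stays untouched
out = df.copy()

# '?' cannot be parsed as a number, so it becomes NaN
out['col4'] = pd.to_numeric(out['col4'], errors='coerce')

# Drop only the rows where col4 is NaN
out = out.dropna(subset=['col4'])

print(out)
print(out['col4'].dtype)
```

After the conversion the remaining values in col4 are numbers rather than strings, which is the concern raised in the comments.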

2 Comments

Won't column 4 have all its numbers as string values after this because it had string values when it was loaded?
@Jamgreen Yes, I have just added an Edit to use this approach.
