I have a pandas DataFrame df1 with the following content:

Serial N         year         current
   B              10            14
   B              10            16
   B              11            10
   B              11            
   B              11            15
   C              12            11
   C                            9
   C              12            13
   C              12             

I would like to make a DataFrame that is based on df1 but that has any row containing an empty value removed. For example:

Serial N         year         current
   B              10            14
   B              10            16
   B              11            10
   B              11            15
   C              12            11
   C              12            13  

I tried something like this

df1=df[~np.isnan(df["year"]) or ~np.isnan(df["current"])]

But I received the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What could be the problem?

3 Answers

You can just call dropna to achieve this:

df1 = df.dropna()

As for why your attempt failed: the or operator needs a single True/False value, but each side of your expression is a Series of booleans, so it is ambiguous whether the test should pass when one element or all elements meet the criterion. For element-wise logic, use the bitwise operators &, | and ~ for and, or and not respectively. Additionally, with multiple conditions you need to wrap each condition in parentheses because of operator precedence.

In [4]:
df.dropna()

Out[4]:
  Serial N  year  current
0        B    10       14
1        B    10       16
2        B    11       10
4        B    11       15
5        C    12       11
7        C    12       13
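
To make the point about or versus the bitwise operators concrete, here is a minimal sketch. It assumes the empty cells in the question are NaN; the sample frame and the names raised, mask and filtered are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; NaN stands in for the empty cells (assumption).
df = pd.DataFrame({
    "Serial N": ["B", "B", "B", "B", "B", "C", "C", "C", "C"],
    "year": [10, 10, 11, 11, 11, 12, np.nan, 12, 12],
    "current": [14, 16, 10, np.nan, 15, 11, 9, 13, np.nan],
})

# Python's `or` asks a whole Series for one truth value, which raises ValueError.
try:
    df[df["year"].notna() or df["current"].notna()]
    raised = False
except ValueError:
    raised = True

# Element-wise version: each condition in parentheses, combined with &
# so that a row is kept only when both values are present.
mask = (~np.isnan(df["year"])) & (~np.isnan(df["current"]))
filtered = df[mask]
```

For this frame, filtered holds the same rows as df.dropna(), which is why dropna is usually the shorter way to write it.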
1 Comment

Or df1 = df.dropna()

If you really have empty cells (empty strings) instead of NaN values:

In [122]: df
Out[122]:
  Serial_N  year current
0        B  10.0    14.0
1        B  10.0    16.0
2        B  11.0    10.0
3        B  11.0
4        B  11.0    15.0
5        C  12.0    11.0
6        C           9.0
7        C  12.0    13.0
8        C  12.0

In [123]: df.replace('', np.nan).dropna()
Out[123]:
  Serial_N  year current
0        B  10.0    14.0
1        B  10.0    16.0
2        B  11.0    10.0
4        B  11.0    15.0
5        C  12.0    11.0
7        C  12.0    13.0
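
As a self-contained sketch of the same idea, assuming the empty cells are stored as empty strings (the small frame and the names unchanged and cleaned are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which missing values are empty strings, not NaN.
df = pd.DataFrame({
    "Serial_N": ["B", "B", "C", "C"],
    "year": [10.0, 11.0, "", 12.0],
    "current": [14.0, "", 9.0, 13.0],
})

# An empty string is not considered missing, so dropna() alone keeps every row.
unchanged = df.dropna()

# Turn '' into NaN first, then drop the rows that contain NaN.
cleaned = df.replace("", np.nan).dropna()
```

Here unchanged still has all four rows, while cleaned keeps only the rows with index 0 and 3.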

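The error goes away if you use a bitwise operator instead of or. Note that to drop every row where either column is missing, the two conditions have to be combined with & rather than | (with |, a row is kept as long as one of the two values is present):

df1 = df[(~np.isnan(df["year"])) & (~np.isnan(df["current"]))]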

Using dropna(), as suggested by EdChum, is likely the cleanest and neatest solution here. You can read more about it, and about working with missing data in general, in the pandas documentation.
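
If only some columns should be checked, as in the original attempt on "year" and "current", dropna accepts a subset argument. A small sketch, with a made-up frame and illustrative names by_subset and by_mask:

```python
import numpy as np
import pandas as pd

# Hypothetical frame where "Serial N" can be missing as well.
df = pd.DataFrame({
    "Serial N": ["B", "B", None, "C"],
    "year": [10, np.nan, 11, 12],
    "current": [14, 16, 10, np.nan],
})

# Only "year" and "current" are inspected; a missing "Serial N" is tolerated.
by_subset = df.dropna(subset=["year", "current"])

# The same selection written as an explicit boolean mask.
by_mask = df[df["year"].notna() & df["current"].notna()]
```

Both expressions keep the rows with index 0 and 2, whereas a plain df.dropna() would keep only row 0.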
