
I am trying to get the rows with null values from a PySpark dataframe. In pandas, I can achieve this using isnull() on the dataframe:

df = df[df.isnull().any(axis=1)]

But in the case of PySpark, when I run the command below, it throws an AttributeError:

df.filter(df.isNull())

AttributeError: 'DataFrame' object has no attribute 'isNull'.

How can I get the rows with null values without checking each column individually?

  • see stackoverflow.com/questions/37262762/… Commented Nov 26, 2018 at 18:33
  • @Mikhail I think the linked question is not an answer here, because this question asks about all columns at the same time, without checking each one individually. Commented Dec 11, 2018 at 9:26

2 Answers


You can filter the rows with where, reduce and a generator expression over the columns. For example, given the following dataframe:

df = sc.parallelize([
    (0.4, 0.3),
    (None, 0.11),
    (9.7, None), 
    (None, None)
]).toDF(["A", "B"])

df.show()
+----+----+
|   A|   B|
+----+----+
| 0.4| 0.3|
|null|0.11|
| 9.7|null|
|null|null|
+----+----+

Filtering the rows with some null value could be achieved with:

import pyspark.sql.functions as f
from functools import reduce

df.where(reduce(lambda x, y: x | y, (f.col(x).isNull() for x in df.columns))).show()

Which gives:

+----+----+
|   A|   B|
+----+----+
|null|0.11|
| 9.7|null|
|null|null|
+----+----+

In the condition you choose how the per-column checks are combined: | (or) keeps rows where any column is null, & (and) keeps rows where all columns are null, and so on.
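
For example, a minimal sketch of the "all columns null" variant, reusing the df defined above and simply swapping | for &:

import pyspark.sql.functions as f
from functools import reduce

# Keep only the rows where every column is null
df.where(reduce(lambda x, y: x & y, (f.col(c).isNull() for c in df.columns))).show()

With the sample data above, only the (null, null) row passes this filter.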


1 Comment

Good answer, I was wondering how I could construct the boolean expression programmatically
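
Regarding constructing the boolean expression programmatically: a rough sketch of wrapping the same logic in a reusable helper (the name rows_with_any_null is made up for illustration, not part of any API):

import pyspark.sql.functions as f
from functools import reduce

def rows_with_any_null(df, columns=None):
    # OR together one isNull() check per column into a single boolean Column
    columns = columns or df.columns
    condition = reduce(lambda a, b: a | b, (f.col(c).isNull() for c in columns))
    return df.where(condition)

rows_with_any_null(df).show()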

This is how you can do it in Scala:

import org.apache.spark.sql.functions._
import spark.implicits._  // brings .toDF() into scope (pre-imported in some shells/notebooks)

case class Test(id: Int, weight: Option[Int], age: Int, gender: Option[String])

val df1 = Seq(Test(1, Some(100), 23, Some("Male")), Test(2, None, 25, None), Test(3, None, 33, Some("Female"))).toDF()

// OR the per-column isNull checks into one condition, then filter on it.
// display() is a Databricks notebook helper; use .show() outside Databricks.
display(df1.filter(df1.columns.map(c => col(c).isNull).reduce((a, b) => a || b)))
