
I am running over several CSV files and doing some checks, and for one file I am getting a NullPointerException. I suspect that there are some empty rows.

So I am running the following check, but for some reason it returns no rows at all:

import pyspark.sql.functions as sf
from pyspark.sql.types import BooleanType

# True only when every column in the row is None, i.e. the row is fully empty
check_empty = lambda row: not any(k is not None for k in row)
check_empty_udf = sf.udf(check_empty, BooleanType())
df.filter(check_empty_udf(sf.struct([col for col in df.columns]))).show()

Am I missing something within the filter function, or is it simply not possible to extract empty rows from DataFrames this way?

2 Answers


You could use df.dropna() to drop the rows containing nulls and then compare the counts.

Something like

df_clean = df.dropna()
num_empty_rows = df.count() - df_clean.count()
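
If you also want to inspect the content of those rows rather than just count them, one possible sketch (assuming "empty" means every column is null, and noting that dropna() with no arguments drops rows with any null, while dropna(how='all') drops only fully empty ones):

from functools import reduce
import pyspark.sql.functions as sf

# Predicate that is true only when every column in the row is null
all_null = reduce(lambda a, b: a & b, [sf.col(c).isNull() for c in df.columns])

# Show the fully empty rows themselves
df.filter(all_null).show()

# Count only the fully empty rows (dropna(how='all') drops exactly these)
num_fully_empty = df.count() - df.dropna(how='all').count()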

2 Comments

Thanks Andrew, but I would like to check the content of those rows so I have a clearer idea of what's happening.
The weird thing is that I got zero. Also, the same piece of code works fine on the DataFrame produced by the dropna transformation, while it throws the exception on the one without dropna.

You could use a built-in option of the CSV reader to deal with such scenarios.

val df = spark.read
  .format("csv")
  .option("header", "true")
  .option("mode", "DROPMALFORMED") // Drop empty/malformed rows
  .load("hdfs:///path/file.csv")
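
Since the question uses PySpark, the same read looks roughly like this (a sketch using the same path as above; adjust it to your actual file):

df = (spark.read
      .format("csv")
      .option("header", "true")
      .option("mode", "DROPMALFORMED")  # Drop empty/malformed rows at read time
      .load("hdfs:///path/file.csv"))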

Check this reference - https://docs.databricks.com/spark/latest/data-sources/read-csv.html#reading-files

