I have a DataFrame with n columns and I would like to count the number of missing values in each column.
I use the following snippet to do this, but the output isn't what I expect:
for (e <- df.columns) {
  var c: Int = df.filter(df(e).isNull || df(e) === "" || df(e).isNaN ||
    df(e) === "-" || df(e) === "NA").count()
  println(e + ":" + c)
}
Output:
column1:
column2:
column3:
How can I get the correct count of missing values for each column, based on the logic in the snippet?
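
For reference, here is a minimal sketch of one way the same logic could be expressed. It is not a definitive fix, and it rests on a few assumptions. `count()` on a DataFrame returns a `Long`, so the `Int` annotation in the snippet would have to change. The sketch also assumes that the string markers ("", "-", "NA") only make sense for string columns and that `isNaN` only makes sense for float/double columns, so each check is guarded by the column's data type. This is slightly narrower than the snippet, which applies every check to every column. The names `MissingCounts`, `isMissing` and `missingCounts` are made up for illustration, and the DataFrame is assumed to have at least one column.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{count, when}
import org.apache.spark.sql.types.{DoubleType, FloatType, StringType}

object MissingCounts {

  // Hypothetical helper: builds the "is missing" predicate for one column,
  // choosing the checks according to the column's data type (an assumption).
  def isMissing(df: DataFrame, name: String): Column = {
    val c = df(name)
    df.schema(name).dataType match {
      case StringType             => c.isNull || c.isin("", "-", "NA")
      case DoubleType | FloatType => c.isNull || c.isNaN
      case _                      => c.isNull
    }
  }

  // Counts missing values for every column in a single aggregation,
  // instead of running one filter + count job per column.
  // Assumes df has at least one column.
  def missingCounts(df: DataFrame): Seq[(String, Long)] = {
    // when(...) without otherwise(...) yields null for non-matching rows,
    // and count(...) skips nulls, so only the "missing" rows are counted.
    val aggs = df.columns.map(n => count(when(isMissing(df, n), 1)).alias(n))
    val row  = df.agg(aggs.head, aggs.tail: _*).head()
    df.columns.zipWithIndex.map { case (n, i) => n -> row.getLong(i) }.toSeq
  }
}
```

A possible usage, assuming a local SparkSession and a small made-up DataFrame:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("missing-counts").getOrCreate()
import spark.implicits._

// Made-up sample data: column1 has two missing markers, column2 has one NaN.
val df = Seq(("a", 1.0), ("", Double.NaN), ("NA", 2.0)).toDF("column1", "column2")

MissingCounts.missingCounts(df).foreach { case (name, n) => println(s"$name:$n") }
// prints:
// column1:2
// column2:1
```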