I have a CSV file with "" (empty value), "N/A", and "-" all in the same file. I want all of them to be read into the dataframe as nulls. I know there is an option in spark-csv, "nullValue", which lets me treat a single string as null, but that is not sufficient here since I have three different markers.
There is an open issue on spark-csv tracking this, https://github.com/databricks/spark-csv/issues/333
I was wondering about the most elegant way to work around the problem.
Use replaceAll on the raw lines to make your null markers uniform first, then parse the result as CSV with a single nullValue; see the sketch below.
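A minimal sketch of that idea, assuming Spark 2.2+ (where DataFrameReader.csv accepts a Dataset[String]) and a hypothetical path /path/to/data.csv; with the older spark-csv package you would normalize the lines the same way on an RDD and write them back out before parsing:

```scala
import org.apache.spark.sql.SparkSession

object CsvNullNormalizer {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-null-normalizer")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Read the file as plain text first (path is hypothetical).
    val rawLines = spark.read.textFile("/path/to/data.csv")

    // Rewrite every standalone "N/A" or "-" field to an empty field,
    // so a single nullValue setting covers all three markers.
    // Note: this simple regex does not handle quoted fields that
    // legitimately contain "N/A" or "-".
    val normalized = rawLines.map(_.replaceAll("(^|,)(N/A|-)(?=,|$)", "$1"))

    // Parse the normalized lines as CSV; empty fields become null.
    val df = spark.read
      .option("header", "true")
      .option("nullValue", "")
      .csv(normalized)

    df.show()
    spark.stop()
  }
}
```

The advantage of normalizing before parsing is that you only deal with one null representation downstream; the alternative is to read everything as strings and replace "N/A" and "-" per column afterwards, which touches every column individually.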