Consider a CSV file with the following data:
Id,Job,year
1,,2000
CSV Reader code:
import org.apache.spark.sql.Row

// Read every column as a string, then prepend a 1-based line number to each row.
val inputDFRdd = spark.read.format("com.databricks.spark.csv")
  .option("mode", "FAILFAST")
  .option("delimiter", ",")
  .option("header", "false")
  .option("inferSchema", "false")
  .option("escape", "\"")
  .load()
  .rdd
  .zipWithIndex()
  .map(line => Row.fromSeq(Seq(line._2 + 1) ++ line._1.toSeq))
When I use the code above to read the original incoming file, the data frame reads the empty field as an empty string. When I use the same code to read a part file, the data frame reads the empty field as null.
I am looking for a way to have the empty field read as an empty string from the part file as well.
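
One possible workaround, sketched here only as an illustration, is to normalize null fields back to empty strings in the same map step that prepends the line number. The object name `EmptyStringFix` and the helpers `nullsToEmpty` and `withLineNumber` are hypothetical names introduced for this sketch, not part of Spark. The sketch assumes every null field should become an empty string, so it cannot tell a genuinely missing value apart from an empty one.

```scala
// Hypothetical helpers (not part of Spark or the original code).
object EmptyStringFix {
  // Replace each null field with an empty string; leave other values untouched.
  def nullsToEmpty(values: Seq[Any]): Seq[Any] =
    values.map(v => if (v == null) "" else v)

  // Mirror the original map step: prepend a 1-based line number to the fields,
  // after normalizing nulls to empty strings.
  def withLineNumber(fields: Seq[Any], index: Long): Seq[Any] =
    Seq[Any](index + 1) ++ nullsToEmpty(fields)
}
```

In the reader code, the last step would then become `.map(line => Row.fromSeq(EmptyStringFix.withLineNumber(line._1.toSeq, line._2)))`. If the data is kept as a DataFrame instead of an RDD, calling `na.fill("")` on it should have a similar effect, since it replaces nulls in string columns.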