spark.read reading empty string as null when data is read from a part file

Question

Let's consider a CSV file with the following data:

Id,Job,year

1,,2000

CSV Reader code:

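import org.apache.spark.sql.Row

var inputDFRdd = spark.emptyDataFrame.rdd
inputDFRdd = spark.read.format("com.databricks.spark.csv")
        .option("mode", "FAILFAST")
        .option("delimiter", ",")
        .option("header", "false")
        .option("inferSchema", "false")
        .option("escape", "\"")
        .load(inputPath) // inputPath: placeholder, the file path is not given in the question
        .rdd.zipWithIndex()
        // prepend a 1-based row number to each row's fields
        .map { case (row, idx) => Row.fromSeq((idx + 1) +: row.toSeq) }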

When the above code reads the incoming file, the DataFrame reads the empty string as an empty string, but when the same code reads a part file, the DataFrame reads the empty string as null.

I'm looking for a way to read an empty string as an empty string from the part file.

Mohana B C · Accepted Answer · 2021-09-16 19:28:05Z


By default, an empty string is read as null when reading a CSV file.

You can change that behavior with the nullValue option:

.option("nullValue", "null") // Only fields whose value is the string 'null' will be read as null.
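A minimal sketch, not part of the original answer, that plugs the answer's nullValue option into the question's reader. The object name ReadWithNullValue, the helper readNumbered, the temporary sample file and the local[1] master are illustrative assumptions. It uses the built-in "csv" format name; in Spark 2.x and later, "com.databricks.spark.csv" resolves to the same built-in source. It prints the rows instead of claiming what the Job column will contain, so you can check the result against your own Spark version and part files.

```scala
import java.nio.file.{Files, Path}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}

object ReadWithNullValue {
  // Reads a CSV file with the question's options plus the answer's nullValue
  // option, then prepends a 1-based row number to every row.
  def readNumbered(spark: SparkSession, path: String): RDD[Row] =
    spark.read
      .format("csv")                 // built-in CSV source
      .option("mode", "FAILFAST")
      .option("delimiter", ",")
      .option("header", "false")
      .option("inferSchema", "false")
      .option("escape", "\"")
      .option("nullValue", "null")   // the answer's suggestion
      .load(path)
      .rdd
      .zipWithIndex()
      .map { case (row, idx) => Row.fromSeq((idx + 1) +: row.toSeq) }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")            // assumption: local run for illustration
      .appName("nullValue-demo")
      .getOrCreate()
    // Hypothetical sample file containing the question's data.
    val tmp: Path = Files.createTempFile("sample", ".csv")
    Files.write(tmp, "Id,Job,year\n1,,2000\n".getBytes("UTF-8"))
    readNumbered(spark, tmp.toString).collect().foreach(println)
    spark.stop()
  }
}
```

To test against the file that misbehaves, pass a part file path to readNumbered instead of the temporary sample.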

