Consider a CSV file with the following data:

Id,Job,year

1,,2000

CSV Reader code:

import org.apache.spark.sql.Row

var inputDFRdd = spark.emptyDataFrame.rdd
inputDFRdd = spark.read.format("com.databricks.spark.csv")
        .option("mode", "FAILFAST")
        .option("delimiter", ",")
        .option("header", "false")
        .option("inferSchema", "false")
        .option("escape", "\"")
        .load() // input path omitted here
        .rdd.zipWithIndex()
        // prepend a 1-based row number to each row
        .map(line => Row.fromSeq(Seq(line._2 + 1) ++ line._1.toSeq))

When I use the code above to read the incoming file, the DataFrame reads the empty string as an empty string. When I use the same code to read a part file, the DataFrame reads the empty string as null.

I am looking for a way to read an empty string as an empty string from the part file.

1 Answer

By default, an empty string is read as null when Spark reads a CSV file.

You can change that behavior by setting the nullValue option:

.option("nullValue", "null") // Only the string with value 'null' will be inferred as null.
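
For context, here is a minimal sketch of how that option might sit alongside the reader options from the question. It assumes Spark 2.0 or later, where the built-in "csv" format is available. The object name CsvRead, the helper readCsv, and its path parameter are hypothetical names used only for illustration; the question does not show an input path.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CsvRead {
  // Reader options from the question, plus the nullValue setting from this answer.
  val csvOptions: Map[String, String] = Map(
    "mode" -> "FAILFAST",
    "delimiter" -> ",",
    "header" -> "false",
    "inferSchema" -> "false",
    "escape" -> "\"",
    "nullValue" -> "null" // intent: only the literal string "null" is read as null
  )

  // `path` is a placeholder supplied by the caller (assumption, not from the question).
  def readCsv(spark: SparkSession, path: String): DataFrame =
    spark.read.format("csv").options(csvOptions).load(path)
}
```

How empty fields are treated can differ between Spark versions, so it is worth checking the result on a small sample of the part file. Spark 2.4 and later also document a separate emptyValue option for CSV, which may be relevant if nullValue alone does not give the expected result.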