
I'm using the code below to write a DataFrame to a CSV file.

df.coalesce(1).write \
    .format("com.databricks.spark.csv") \
    .option("header", "true") \
    .option("nullValue", " ") \
    .save("/home/user/test_table/")

When I execute it, I get the following error:

java.lang.UnsupportedOperationException: CSV data source does not support null data type.

Could anyone please help?

  • Could you please update the question with the result of df.printSchema()? Commented Feb 7, 2017 at 13:20
  • How are your null values stored? When I used Python's None type as the null object and did a save, it worked fine: df = sqlContext.createDataFrame([(1.0, "Hi I heard about Spark"), (1.0, "Spark is awesome"), (0.0, None), (0.0, "And I don't know why...")], ["label", "sentence"]); df.printSchema(); df.coalesce(1).write.format("com.databricks.spark.csv").option("header", "true").option("nullValue", " ").save(drive + "/test.csv") Commented Feb 7, 2017 at 19:45
  • Were you able to find an answer? Commented Feb 8, 2021 at 9:39

1 Answer


I had the same problem (though not using that command with the nullValue option), and I solved it with the fillna method.

I also realised that fillna was not working on the _corrupt_record column, so I dropped that column since I didn't need it.

# Drop the column that fillna could not handle
df = df.drop('_corrupt_record')
# Replace remaining nulls with empty strings so the CSV writer can serialise every value
df = df.fillna("")
df.write.option('header', 'true').format('csv').save('file_csv')
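The underlying issue is that the CSV format has no native representation for null, so null values have to be replaced with something writable (here an empty string, which is what fillna("") does) before the write. A minimal sketch of that same principle using only Python's standard-library csv module, with hypothetical sample rows, would look like this:

```python
import csv
import io

# Hypothetical rows containing None, mirroring a DataFrame with null values.
rows = [
    {"label": 1.0, "sentence": "Hi I heard about Spark"},
    {"label": 0.0, "sentence": None},
]

# CSV cannot represent null, so replace None with "" (the same idea
# as df.fillna("")) before handing the rows to the writer.
cleaned = [
    {key: ("" if value is None else value) for key, value in row.items()}
    for row in rows
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["label", "sentence"])
writer.writeheader()
writer.writerows(cleaned)
print(buf.getvalue())
```

This writes the second row as `0.0,` with an empty field where the None was, instead of failing or emitting the literal string "None".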
