
I'm using the code below to write a DataFrame to a CSV file.

df.coalesce(1).write \
    .format("com.databricks.spark.csv") \
    .option("header", "true") \
    .option("nullValue", " ") \
    .save("/home/user/test_table/")

When I execute it, I get the following error:

java.lang.UnsupportedOperationException: CSV data source does not support null data type.

Could anyone please help?

  • Could you please update the question with the result of df.printSchema()? Commented Feb 7, 2017 at 13:20
  • How are your null values stored? When I used Python's None type as the null object and did a save, it worked fine: df = sqlContext.createDataFrame([(1.0, "Hi I heard about Spark"), (1.0, "Spark is awesome"), (0.0, None), (0.0, "And I don't know why...")], ["label", "sentence"]); df.printSchema(); df.coalesce(1).write.format("com.databricks.spark.csv").option("header", "true").option("nullValue", " ").save(drive + "/test.csv") Commented Feb 7, 2017 at 19:45
  • Were you able to find an answer? Commented Feb 8, 2021 at 9:39

1 Answer


I had the same problem (though not using that command with the nullValue option), and I solved it with the fillna method.

I also realised that fillna was not working on the _corrupt_record column, so I dropped that column since I didn't need it.

# Drop the column that fillna could not handle
df = df.drop('_corrupt_record')
# Replace remaining nulls with empty strings so the CSV writer can serialise every value
df = df.fillna("")
df.write.option('header', 'true').format('csv').save('file_csv')
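The underlying issue is that the CSV format has no native representation for null, so null values have to be replaced with something writable (here an empty string, which is what fillna("") does) before the write. A minimal sketch of that same principle using only Python's standard-library csv module, with hypothetical sample rows, would look like this:

```python
import csv
import io

# Hypothetical rows containing None, mirroring a DataFrame with null values.
rows = [
    {"label": 1.0, "sentence": "Hi I heard about Spark"},
    {"label": 0.0, "sentence": None},
]

# CSV cannot represent null, so replace None with "" (the same idea
# as df.fillna("")) before handing the rows to the writer.
cleaned = [
    {key: ("" if value is None else value) for key, value in row.items()}
    for row in rows
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["label", "sentence"])
writer.writeheader()
writer.writerows(cleaned)
print(buf.getvalue())
```

This writes the second row as `0.0,` with an empty field where the None was, instead of failing or emitting the literal string "None".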
