
I am writing a CSV file to a data lake from a DataFrame that has null values. Spark SQL explicitly writes the string null for those values, and I want them written as empty fields instead (no "null", no other placeholder string).

When I write the CSV file from Databricks, it looks like this:

ColA,ColB,ColC 
null,ABC,123     
ffgg,DEF,345    
null,XYZ,789

I tried replacing the nulls with '' using df.na.fill(''), but then the file gets written like this:

ColA,ColB,ColC    
'',ABC,123     
ffgg,DEF,345    
'',XYZ,789

Instead, I want my CSV file to look like this. How do I achieve this with Spark SQL? I am using Databricks. Any help in this regard is highly appreciated.

ColA,ColB,ColC    
,ABC,123     
ffgg,DEF,345    
,XYZ,789

Thanks!


1 Answer


I think we need to use .saveAsTextFile for this case instead of the csv writer.

Example:

df.show()
//+----+----+----+
//|col1|col2|col3|
//+----+----+----+
//|null| ABC| 123|
//|  dd| ABC| 123|
//+----+----+----+

//extract header from dataframe
val header = spark.sparkContext.parallelize(Seq(df.columns.mkString(",")))

//convert each row to a CSV line, writing an empty field for each null, then prepend the header and save
val data = df.rdd.map(row => (0 until row.length).map(i => if (row.isNullAt(i)) "" else row.get(i).toString).mkString(","))
header.union(data).coalesce(1).saveAsTextFile("<path>")

//content of file
//col1,col2,col3
//,ABC,123
//dd,ABC,123
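The key idea, blanking a field only when it is genuinely null rather than text-replacing the string "null" (which could corrupt real data containing those letters), can be sketched outside Spark in plain Python; the tuples here are hypothetical stand-ins for Spark Rows:

```python
# Hypothetical rows: None plays the role of a Spark null
rows = [(None, "ABC", 123), ("dd", "ABC", 123)]
header = ["col1", "col2", "col3"]

def to_csv_line(row):
    # emit an empty field for None instead of the literal text "null"
    return ",".join("" if v is None else str(v) for v in row)

lines = [",".join(header)] + [to_csv_line(r) for r in rows]
print("\n".join(lines))
# col1,col2,col3
# ,ABC,123
# dd,ABC,123
```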

If the first field in your data is never null, then you can use the csv writer's option instead:

df.write.option("nullValue", null).mode("overwrite").csv("<path>")
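For comparison, the target format itself is ordinary CSV with bare empty fields; Python's built-in csv writer produces exactly that for None values, as a small stdlib-only sketch shows:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["ColA", "ColB", "ColC"])
writer.writerow([None, "ABC", 123])   # None becomes an empty field
writer.writerow(["ffgg", "DEF", 345])
print(buf.getvalue())
# ColA,ColB,ColC
# ,ABC,123
# ffgg,DEF,345
```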

1 Comment

Thanks much. This helped a lot
