
I am writing a CSV file to a data lake from a DataFrame that has null values. Spark SQL explicitly writes the string null for those values, and I want them written as empty fields instead (no "null", no other placeholder string).

When I write the CSV file from Databricks, it looks like this:

ColA,ColB,ColC 
null,ABC,123     
ffgg,DEF,345    
null,XYZ,789

I tried replacing the nulls with '' using df.na.fill(''), but then the file gets written like this:

ColA,ColB,ColC    
'',ABC,123     
ffgg,DEF,345    
'',XYZ,789

Instead, I want my CSV file to look like this. How do I achieve this with Spark SQL? I am using Databricks. Any help in this regard is highly appreciated.

ColA,ColB,ColC    
,ABC,123     
ffgg,DEF,345    
,XYZ,789

Thanks!


1 Answer


I think we need to use .saveAsTextFile for this case instead of the csv writer.

Example:

df.show()
//+----+----+----+
//|col1|col2|col3|
//+----+----+----+
//|null| ABC| 123|
//|  dd| ABC| 123|
//+----+----+----+

//extract header from dataframe
val header = spark.sparkContext.parallelize(Seq(df.columns.mkString(",")))

//convert each row to a CSV line, writing an empty field for each null, then prepend the header and save
val data = df.rdd.map(row => (0 until row.length).map(i => if (row.isNullAt(i)) "" else row.get(i).toString).mkString(","))
header.union(data).coalesce(1).saveAsTextFile("<path>")

//content of file
//col1,col2,col3
//,ABC,123
//dd,ABC,123
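The key idea, blanking a field only when it is genuinely null rather than text-replacing the string "null" (which could corrupt real data containing those letters), can be sketched outside Spark in plain Python; the tuples here are hypothetical stand-ins for Spark Rows:

```python
# Hypothetical rows: None plays the role of a Spark null
rows = [(None, "ABC", 123), ("dd", "ABC", 123)]
header = ["col1", "col2", "col3"]

def to_csv_line(row):
    # emit an empty field for None instead of the literal text "null"
    return ",".join("" if v is None else str(v) for v in row)

lines = [",".join(header)] + [to_csv_line(r) for r in rows]
print("\n".join(lines))
# col1,col2,col3
# ,ABC,123
# dd,ABC,123
```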

If the first field in your data is never null, then you can use the csv writer's option instead:

df.write.option("nullValue", null).mode("overwrite").csv("<path>")
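For comparison, the target format itself is ordinary CSV with bare empty fields; Python's built-in csv writer produces exactly that for None values, as a small stdlib-only sketch shows:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["ColA", "ColB", "ColC"])
writer.writerow([None, "ABC", 123])   # None becomes an empty field
writer.writerow(["ffgg", "DEF", 345])
print(buf.getvalue())
# ColA,ColB,ColC
# ,ABC,123
# ffgg,DEF,345
```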

1 Comment

Thanks much. This helped a lot
