
Is there a way to replace null values in a column with an empty string when writing a Spark DataFrame to a file?

Sample data:

+----------------+------------------+
|   UNIQUE_MEM_ID|              DATE|
+----------------+------------------+
|            1156|              null|
|            3787|        2016-07-05|
|            1156|              null|
|            5064|              null|
|            5832|              null|
|            3787|              null|
|            5506|              null|
|            7538|              null|
|            7436|              null|
|            5091|              null|
|            8673|              null|
|            2631|              null|
|            8561|              null|
|            3516|              null|
|            1156|              null|
|            5832|              null|
|            2631|        2016-07-07|
+----------------+------------------+
1 Comment

I think @Shu's answer will be quicker than mine; you can crosscheck.

2 Answers


Check this out. You can use when and otherwise.

    from pyspark.sql import functions as F

    df.show()

    #InputDF
    # +-------------+----------+
    # |UNIQUE_MEM_ID|      DATE|
    # +-------------+----------+
    # |         1156|      null|
    # |         3787|2016-07-05|
    # |         1156|      null|
    # +-------------+----------+


    df.withColumn("DATE", F.when(F.col("DATE").isNull(), '').otherwise(F.col("DATE"))).show()

    #OUTPUTDF
    # +-------------+----------+
    # |UNIQUE_MEM_ID|      DATE|
    # +-------------+----------+
    # |         1156|          |
    # |         3787|2016-07-05|
    # |         1156|          |
    # +-------------+----------+

To apply the above logic to all the columns of the dataframe, you can iterate through the columns with a list comprehension and fill in an empty string wherever a column value is null, as shown below.

    df.select(*[F.when(F.col(column).isNull(), '').otherwise(F.col(column)).alias(column) for column in df.columns]).show()
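Since the question asks about writing to a file, here is a minimal end-to-end sketch of this approach (the sample rows are taken from the question's data, and the output path /tmp/output is just a placeholder):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Small DataFrame mimicking the question's sample data
    df = spark.createDataFrame(
        [("1156", None), ("3787", "2016-07-05"), ("5064", None)],
        ["UNIQUE_MEM_ID", "DATE"],
    )

    # Replace nulls in every column with an empty string, then write as CSV
    cleaned = df.select(
        *[F.when(F.col(c).isNull(), "").otherwise(F.col(c)).alias(c) for c in df.columns]
    )
    cleaned.write.mode("overwrite").csv("/tmp/output", header=True)  # placeholder path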

3 Comments

This works, but can we scale it to the entire dataframe without specifying each individual column?
Hi, what is F?
@wawawa, you can import the PySpark SQL functions module and alias it as F: from pyspark.sql import functions as F

Use either the .na.fill() or .fillna() function for this case.

  • If you have all string columns, then df.na.fill('') will replace all nulls with '' on all columns.
  • For int columns, df.na.fill('').na.fill(0) replaces nulls with 0.
  • Another way is to create a dict of columns and replacement values: df.fillna({'col1':'replacement_value',...,'col(n)':'replacement_value(n)'})

Example:

from pyspark.sql.functions import *

df.show()
#+-------------+----------+
#|UNIQUE_MEM_ID|      DATE|
#+-------------+----------+
#|         1156|      null|
#|         3787|      null|
#|         2631|2016-07-07|
#+-------------+----------+

df.na.fill('').show()
df.fillna({'DATE':''}).show()
#+-------------+----------+
#|UNIQUE_MEM_ID|      DATE|
#+-------------+----------+
#|         1156|          |
#|         3787|          |
#|         2631|2016-07-07|
#+-------------+----------+
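And since the goal is a file, a short sketch combining the fill with the write step (the output path is a placeholder):

df.na.fill('').write.mode("overwrite").csv("/tmp/output", header=True)
# or, per column, with the dict variant:
df.fillna({'DATE': ''}).write.mode("overwrite").csv("/tmp/output", header=True)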

2 Comments

Same question @Shu, how can we scale this to all df columns?
@ben, if you have all string columns, then df.na.fill('') will replace all nulls with '' on all columns; for int columns, df.na.fill('').na.fill(0) replaces nulls with 0.
