Replace null values in Spark DataFrame

Question

I saw a solution here, but when I tried it, it didn't work for me.

First I import a cars.csv file:

val df = sqlContext.read
              .format("com.databricks.spark.csv")
              .option("header", "true")
              .load("/usr/local/spark/cars.csv")

The resulting DataFrame looks like this:

+----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+
|2012|Tesla|    S|          No comment|     |
|1997| Ford| E350|Go get one now th...|     |
|2015|Chevy| Volt|                null| null|
+----+-----+-----+--------------------+-----+

Then I do this:

df.na.fill("e",Seq("blank"))

But the null values didn't change.

Can anyone help me?

The statement df.na.fill("e",Seq("blank")) returns a new DataFrame, so df itself is not modified. Are you assigning the result to a new DataFrame?
– Rohan Aletty, Commented Oct 27, 2015 at 19:27

eliasah · Accepted Answer · 2017-03-07 14:36:23Z

32

This is basically very simple. You'll need to create a new DataFrame. I'm using the DataFrame df that you have defined earlier.

val newDf = df.na.fill("e",Seq("blank"))

DataFrames are immutable structures. Each time you perform a transformation whose result you need to keep, you'll need to assign the transformed DataFrame to a new value.
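
A minimal sketch of this behaviour, assuming Spark 2.x or later (so it uses SparkSession rather than the sqlContext from the question) and a small in-memory DataFrame standing in for cars.csv. The object name, the sample rows and the local master setting are illustrative assumptions, not part of the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object FillReturnsNewDataFrame {
  // Hypothetical stand-in for cars.csv; Option[String] produces a nullable string column.
  def sampleCars(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(String, String, Option[String])](
      ("2012", "Tesla", Some("")),
      ("2015", "Chevy", None)
    ).toDF("year", "make", "blank")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")          // assumption: local run for illustration
      .appName("na-fill-sketch")
      .getOrCreate()

    val df = sampleCars(spark)

    // The result is discarded here, so df still contains the null.
    df.na.fill("e", Seq("blank"))

    // Keeping the returned DataFrame is what makes the replacement visible.
    val newDf = df.na.fill("e", Seq("blank"))

    df.show()     // the Chevy row still has null in "blank"
    newDf.show()  // the Chevy row now has "e" in "blank"

    spark.stop()
  }
}
```

Note that the empty string in the Tesla row is not null, so fill leaves it untouched.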

edited Mar 7, 2017 at 14:36

answered Oct 27, 2015 at 20:18

eliasah


6 Comments

mathema Over a year ago

There is no error message. The null values come from a 0/0 division, but I cannot replace them with val newDf = outputDF.na.fill("0", Seq("blank"))

eliasah Over a year ago

@mathema would you mind asking a new question that describes your problem with a reproducible example? I can't figure out what's wrong from your current description, and I don't think your problem fits nicely into comments.

mathema Over a year ago

My DataFrame also has null values that come from dividing 0/0. The type of the field is a kind of string. I tried to replace the null values using val newDf = outputDF.na.fill("0", Seq("blank")) and displayed the result with newDf.show(), but it doesn't work. DataFrame example: i.imgur.com/qrWZXg8.png

eliasah Over a year ago

This doesn't answer the question that I asked, @mathema.

eliasah Over a year ago

Let us continue this discussion in chat.


Bhagwati Malav · Accepted Answer · 2017-05-13 13:39:15Z

3

You can achieve the same in Java this way:

Dataset<Row> filteredData = dataset.na().fill(0);
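
For comparison with the Scala code in the rest of this thread, here is a hedged sketch of the same idea. It assumes Spark 2.x or later; the object name and sample columns are hypothetical. Calling fill with a numeric value and no column list only touches numeric columns, while the Map overload lets each column get a replacement of its own type.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object FillAllColumnsSketch {
  // Hypothetical data with one nullable string, int and double column.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(Option[String], Option[Int], Option[Double])](
      (Some("Tesla"), Some(2012), None),
      (None, None, Some(1.5))
    ).toDF("make", "year", "ratio")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")          // assumption: local run for illustration
      .appName("na-fill-all-sketch")
      .getOrCreate()

    val df = sample(spark)

    // Numeric fill without a column list: "year" and "ratio" are filled,
    // the string column "make" keeps its null.
    val numericFilled = df.na.fill(0)

    // Per-column replacements, each matching its column's type.
    val perColumn = df.na.fill(Map("make" -> "unknown", "year" -> 0, "ratio" -> 0.0))

    numericFilled.show()
    perColumn.show()

    spark.stop()
  }
}
```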

answered May 13, 2017 at 13:39

Bhagwati Malav


Y. Yazarel · Accepted Answer · 2020-08-16 07:33:38Z

0

If the column were of string type,

val newdf= df.na.fill("e",Seq("blank"))

would work.

Since it is a float type (as the image shows), the string fill is ignored for that column, and you need to use a numeric value instead:

val newdf= df.na.fill(0.0, Seq("blank"))
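
The type dependence can be checked with a small sketch like the one below. It assumes Spark 2.x or later and uses a hypothetical two-column DataFrame whose "blank" column is a nullable double; the object and column names are illustrative only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object TypedFillSketch {
  // Hypothetical data: Option[Double] produces a nullable double column.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(String, Option[Double])](
      ("a", Some(1.5)),
      ("b", None)
    ).toDF("name", "blank")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")          // assumption: local run for illustration
      .appName("typed-fill-sketch")
      .getOrCreate()

    val df = sample(spark)
    df.printSchema()  // check the column type before choosing a fill value

    // A string replacement is ignored for a non-string column: the null stays.
    val stillNull = df.na.fill("e", Seq("blank"))

    // A numeric replacement matches the double column and replaces the null.
    val filled = df.na.fill(0.0, Seq("blank"))

    stillNull.show()
    filled.show()

    spark.stop()
  }
}
```

Running printSchema() first is a cheap way to find out which overload of fill applies to the column in question.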

answered Aug 16, 2020 at 7:33

Y. Yazarel
