I saw a solution here, but when I tried it, it didn't work for me.

First, I import a cars.csv file:

val df = sqlContext.read
              .format("com.databricks.spark.csv")
              .option("header", "true")
              .load("/usr/local/spark/cars.csv")

The resulting DataFrame looks like the following:

+----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+
|2012|Tesla|    S|          No comment|     |
|1997| Ford| E350|Go get one now th...|     |
|2015|Chevy| Volt|                null| null|

Then I do this:

df.na.fill("e",Seq("blank"))

But the null values didn't change.

Can anyone help me?

  • The statement df.na.fill("e",Seq("blank")) returns a new DataFrame, so df itself will not be modified. Are you assigning the result to a new DataFrame? Commented Oct 27, 2015 at 19:27

3 Answers

This is basically very simple. You'll need to create a new DataFrame. I'm using the DataFrame df that you have defined earlier.

val newDf = df.na.fill("e",Seq("blank"))

DataFrames are immutable structures. Each time you perform a transformation whose result you need to keep, you'll need to assign the transformed DataFrame to a new value.
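To make the immutability point concrete, here is a minimal sketch. It assumes Spark 2.x or later with a local SparkSession (the question itself uses the older sqlContext); the object name, the helper nullCounts, and the tiny in-memory dataset are made up for illustration and stand in for cars.csv.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, not part of the original question.
object FillReturnsNewDataFrame {

  // Returns (nulls left in the original df, nulls left in the filled copy).
  def nullCounts(spark: SparkSession): (Long, Long) = {
    import spark.implicits._

    // Small stand-in for cars.csv: one row has a null in "blank".
    val df = Seq(
      ("2012", "Tesla", Option("")),
      ("2015", "Chevy", Option.empty[String])
    ).toDF("year", "make", "blank")

    // The result of this call is discarded, so nothing observable changes.
    df.na.fill("e", Seq("blank"))

    // Keeping the returned DataFrame is what makes the fill visible.
    val newDf = df.na.fill("e", Seq("blank"))

    (df.filter($"blank".isNull).count(), newDf.filter($"blank".isNull).count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("na-fill-sketch")
      .getOrCreate()
    val (inOriginal, inFilled) = nullCounts(spark)
    println(s"nulls in df: $inOriginal, nulls in newDf: $inFilled")
    spark.stop()
  }
}
```

The original df still contains its null after both calls; only newDf has it replaced with "e".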

6 Comments

There is no error message. The null values come from a 0/0 division, but I cannot replace them with val newDf = outputDF.na.fill("0", Seq("blank"))
@mathema do you mind asking a new question that describes your problem with a reproducible example? I can't figure out what's wrong from your current description, and I don't think your problem fits nicely into comments.
My DataFrame also has null values that come from dividing 0/0. The field's type is some kind of string. I tried to replace the null values using val newDf = outputDF.na.fill("0", Seq("blank")) and displaying the result with newDf.show(), but it doesn't work. DataFrame example: i.imgur.com/qrWZXg8.png
This doesn't answer the question that I have asked @mathema

You can achieve the same in Java this way (note that fill(0) only replaces nulls in numeric columns):

Dataset<Row> filteredData = dataset.na().fill(0);

If the column were of string type,

val newdf = df.na.fill("e", Seq("blank"))

would work.

Since it's a float type (as the image shows), you need to use

val newdf = df.na.fill(0.0, Seq("blank"))
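The type dependence can be checked with a small sketch. It assumes Spark 2.x or later with a local SparkSession; the object name, the helper nullCounts, and the two-row dataset are hypothetical and only mimic a numeric "blank" column like the one in the commenter's image.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object for illustration only.
object FillMatchesColumnType {

  // Returns (nulls left after a string fill, nulls left after a double fill).
  def nullCounts(spark: SparkSession): (Long, Long) = {
    import spark.implicits._

    // "blank" is a double column here, with one null value.
    val df = Seq(
      ("a", Option(1.5)),
      ("b", Option.empty[Double])
    ).toDF("id", "blank")

    // A string replacement value is ignored for non-string columns.
    val stringFill = df.na.fill("0", Seq("blank"))

    // A numeric replacement value matches the column type and is applied.
    val doubleFill = df.na.fill(0.0, Seq("blank"))

    (stringFill.filter($"blank".isNull).count(),
     doubleFill.filter($"blank".isNull).count())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("na-fill-type-sketch")
      .getOrCreate()
    val (afterString, afterDouble) = nullCounts(spark)
    println(s"nulls after string fill: $afterString, after double fill: $afterDouble")
    spark.stop()
  }
}
```

Calling df.printSchema() on the real DataFrame is a quick way to see which overload of fill applies to a given column.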
