
Assume I have the following data:

+--------------------+-----+--------------------+
|              values|count|             values2|
+--------------------+-----+--------------------+
|              aaaaaa|  249|                null|
|              bbbbbb|  166|                  b2|
|              cccccc| 1680|           something|
+--------------------+-----+--------------------+

If there is a null value in the values2 column, how can I assign the value from the values column to it? The result should be:

+--------------------+-----+--------------------+
|              values|count|             values2|
+--------------------+-----+--------------------+
|              aaaaaa|  249|              aaaaaa|
|              bbbbbb|  166|                  b2|
|              cccccc| 1680|           something|
+--------------------+-----+--------------------+

I thought of something like the following, but it doesn't work:

df.na.fill({"values2":df['values']}).show()
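As far as I can tell, this fails because na.fill / fillna only accepts literal replacement values (int, float, string, bool, or a dict mapping column names to literals), not Column expressions. A minimal sketch of what it does accept (the "missing" string is just an illustrative placeholder):

# fillna only takes literals, so this fills nulls in values2 with a
# fixed string rather than with the row's own `values` entry
df.na.fill({"values2": "missing"}).show()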

I found this workaround, but there should be something more straightforward:

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def change_null_values(a, b):
    # keep b when it is not null/empty, otherwise fall back to a
    if b:
        return b
    else:
        return a

udf_change_null = udf(change_null_values, StringType())

df.withColumn("values2", udf_change_null("values", "values2")).show()

3 Answers


You can use coalesce: https://spark.apache.org/docs/1.6.2/api/python/pyspark.sql.html#pyspark.sql.functions.coalesce

from pyspark.sql.functions import coalesce

df.withColumn('values2', coalesce(df.values2, df.values)).show()
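
A minimal self-contained sketch of the same idea, assuming Spark 2.x+ with a local SparkSession and rebuilding the sample data from the question:

from pyspark.sql import SparkSession
from pyspark.sql.functions import coalesce

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("aaaaaa", 249, None), ("bbbbbb", 166, "b2"), ("cccccc", 1680, "something")],
    ["values", "count", "values2"],
)

# coalesce returns the first non-null expression per row, so nulls in
# values2 fall back to the values column
df.withColumn("values2", coalesce(df.values2, df.values)).show()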

1 Comment

Can we supply our own value, like 0 or something?
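
For what it's worth, coalesce accepts any column expressions, so a literal fallback works if you wrap it in lit; a minimal sketch (lit("0") is used to match values2's string type):

from pyspark.sql.functions import coalesce, lit

# replace nulls in values2 with a fixed literal instead of another column
df.withColumn("values2", coalesce(df.values2, lit("0"))).show()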

Following up on @shadow_dev's method:

    from pyspark.sql.functions import col, when

    df.withColumn("values2",
                  when(col("values2").isNull(), col("values"))
                  .otherwise(col("values2")))

Dmytro Popovych's solution is still the cleanest.

If you need fancier when/otherwise logic:

    # illustrative only: values1 and values3 stand in for whatever columns you have
    df.withColumn("values2",
                  when(col("values2").isNull() | col("values3").isNull(), col("values1"))
                  .when(col("values1") == col("values2"), 1)
                  .otherwise(0))



You can use the Column methods .isNull() and .isNotNull():

from pyspark.sql.functions import col

df.where(col("dt_mvmt").isNull())     # rows where dt_mvmt is null
df.where(col("dt_mvmt").isNotNull())  # rows where dt_mvmt is not null

This comes from another answer; I just don't have enough reputation to add a comment.
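
In the context of this question, a small sketch of how isNull can at least be used as a check on the question's df, counting how many values2 rows still need filling:

from pyspark.sql.functions import col

# number of rows where values2 is still null (0 once the fill has been applied)
df.where(col("values2").isNull()).count()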

1 Comment

This doesn't provide a complete solution to the issue; it only hints toward one.
