I am trying to replace all strings in a column that start with 'DEL_' with a NULL value.

I have tried this:

customer_details = customer_details.withColumn("phone_number", F.regexp_replace("phone_number", "DEL_.*", ""))

This works as expected, and the new column now looks like this:

+--------------+
|  phone_number|
+--------------+
|00971585059437|
|00971559274811|
|00971559274811|
|              |
|00918472847271|
|              |
+--------------+

However, if I change the code to:

customer_details = customer_details.withColumn("phone_number", F.regexp_replace("phone_number", "DEL_.*", None))

This now replaces every value in the column with null, not just the ones that match:

+------------+
|phone_number|
+------------+
|        null|
|        null|
|        null|
|        null|
|        null|
|        null|
+------------+
  • Regex replacement works only with string data; null is not a string. Commented Jul 23, 2020 at 10:59

1 Answer

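Try this:

scala

import org.apache.spark.sql.functions.{col, when}

// Null out values that start with "DEL_"; keep everything else unchanged.
df.withColumn("phone_number", when(col("phone_number").rlike("^DEL_.*"), null)
          .otherwise(col("phone_number"))
      )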

python

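from pyspark.sql.functions import col, when

# Null out values that start with "DEL_"; keep everything else unchanged.
df.withColumn("phone_number", when(col("phone_number").rlike("^DEL_.*"), None)
          .otherwise(col("phone_number"))
      )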

Update

Query:

Can you explain why my original solution doesn't work? customer_details.withColumn("phone_number", F.regexp_replace("phone_number", "DEL_.*", None))

Ans: Ternary expressions (functions taking 3 arguments), such as regexp_replace, are null-safe by default. That means if Spark finds that any of the arguments is null, it returns null without any actual processing (e.g. the pattern matching for regexp_replace). Passing None as the replacement makes that argument null for every row, so every row comes back null whether or not it matches the pattern. You may want to look at this piece of the Spark repo:

  override def eval(input: InternalRow): Any = {
    val exprs = children
    val value1 = exprs(0).eval(input)
    if (value1 != null) {
      val value2 = exprs(1).eval(input)
      if (value2 != null) {
        val value3 = exprs(2).eval(input)
        if (value3 != null) {
          return nullSafeEval(value1, value2, value3)
        }
      }
    }
    null
  }
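
The same short-circuit can be modelled in plain Python, without a Spark session. This is only a rough sketch of the behaviour described above, not Spark's implementation: null_safe_regexp_replace and null_if_matches are hypothetical helper names, Python's re module stands in for Spark's regex engine, and the sample phone values (including the DEL_ one) are made up for illustration.

```python
import re
from typing import Optional


def null_safe_regexp_replace(
    subject: Optional[str], pattern: Optional[str], replacement: Optional[str]
) -> Optional[str]:
    """Toy model of a null-safe three-argument expression (not Spark code)."""
    # Any None argument short-circuits to None before the regex is consulted,
    # mirroring the nested null checks in the eval method quoted above.
    if subject is None or pattern is None or replacement is None:
        return None
    return re.sub(pattern, replacement, subject)


def null_if_matches(subject: Optional[str], pattern: str) -> Optional[str]:
    """Toy model of when(col.rlike(pattern), None).otherwise(col)."""
    if subject is not None and re.search(pattern, subject):
        return None
    return subject


# Hypothetical sample values, not taken from the question's data.
phones = ["00971585059437", "DEL_00971559274811", "00918472847271"]

# A None replacement nulls every row, matching or not.
print([null_safe_regexp_replace(p, "DEL_.*", None) for p in phones])
# An empty-string replacement only blanks the matching rows.
print([null_safe_regexp_replace(p, "DEL_.*", "") for p in phones])
# The conditional version nulls only the matching rows.
print([null_if_matches(p, "^DEL_.*") for p in phones])
```

The first call never reaches re.sub because the replacement is None, which is why the original attempt wiped the whole column, while the conditional version decides per row whether to return None.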

1 Comment

This works, thanks. Can you explain why my original solution doesn't work?
