
I have this column inside myTable:

myColumn
[red, green]
[green, green, red]

I need to modify it so that red is replaced with 1 and green with 2:

myColumn
[1, 2]
[2, 2, 1]

In short, is there a way to apply a case clause to each element of the array, row-wise?

The closest I've gotten so far:

select replace(replace(to_json(myColumn), 'red', 1), 'green', 2)

On the other hand, if the column held plain strings, I could simply use:

select (
  case
    when myColumn='red' then 1
    when myColumn='green' then 2
  end
) from myTable;
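
For a reproducible setup (a sketch with assumed names: the view is called tmp, which is what the first answer below refers to), the sample data can be created in PySpark like this:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample data from the question, registered as the temporary view "tmp"
df = spark.createDataFrame(
    [(["red", "green"],), (["green", "green", "red"],)],
    ["myColumn"],
)
df.createOrReplaceTempView("tmp")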

4 Answers


Assuming the DataFrame has been registered as a temporary view named tmp, the following SQL statement gets the result.

sql = """
    select
        collect_list(
            case col
                when 'red' then 1
                when 'green' then 2
            end)
        myColumn
    from
        (select mid,explode(myColumn) col
        from
            (select monotonically_increasing_id() mid,myColumn
            from tmp)
        )
    group by mid
"""
df = spark.sql(sql)
df.show(truncate=False)


1 Comment

This would not preserve the order I am afraid.
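
A possible order-preserving variant (a sketch, not part of the original answer): posexplode keeps each element's position, so the rebuilt array can be sorted back into its original order. Column and view names are taken from the answer above.

sql = """
    select
        -- sort the collected (pos, mapped) pairs by position, then keep only the value
        transform(
            sort_array(collect_list(struct(pos, mapped))),
            x -> x.mapped) as myColumn
    from
        (select
            base.mid,
            t.pos,
            case t.col when 'red' then 1 when 'green' then 2 end as mapped
        from
            (select monotonically_increasing_id() as mid, myColumn from tmp) base
            lateral view posexplode(base.myColumn) t as pos, col)
    group by mid
"""
spark.sql(sql).show(truncate=False)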

In pure Spark SQL, you could convert your array into a string with concat_ws, make the substitutions with regexp_replace and then recreate the array with split.

select split(
    regexp_replace(
        regexp_replace(
            concat_ws(',', myColumn)
        , 'red', '1')
    , 'green', '2')
, ',') myColumn from df
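
One caveat worth noting (an aside, not part of the answer): split returns an array<string>, so the result contains '1' and '2' as strings rather than integers. If numeric values are needed, the elements can be cast afterwards, for example (view name df taken from the answer):

sql = """
    select
        transform(
            split(
                regexp_replace(
                    regexp_replace(concat_ws(',', myColumn), 'red', '1'),
                'green', '2'),
            ','),
            x -> cast(x as int)) as myColumn
    from df
"""
spark.sql(sql).show(truncate=False)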



You could perform a simple transform (Spark 3 onwards):

select transform(myColumn, value ->
  case value
    when 'red' then 1
    when 'green' then 2
  end) as myColumn
from myTable
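
The same expression works through the DataFrame API via expr (a sketch; the DataFrame name df and the column name are assumed):

from pyspark.sql import functions as F

df.withColumn(
    "myColumn",
    F.expr("transform(myColumn, value -> "
           "case value when 'red' then 1 when 'green' then 2 end)"),
).show(truncate=False)

On Spark 3.1 and later, the same thing can also be written with F.transform and F.when directly, without an SQL string.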



Let's create some sample data and a map containing the substitutions you want to make:

val df = Seq((1, Seq("red", "green")),
             (2, Seq("green", "green", "red")))
         .toDF("id", "myColumn")
val values = Map("red" -> "1", "green" -> "2")

The most straightforward way would be to define a UDF that does exactly what you want:

// Spark hands array columns to Scala UDFs as a Seq, so declare the parameter as Seq[String]
val replace = udf((x: Seq[String]) =>
    x.map(value => values.getOrElse(value, value)))
df.withColumn("myColumn", replace('myColumn)).show
+---+---------+
| id| myColumn|
+---+---------+
|  1|   [1, 2]|
|  2|[2, 2, 1]|
+---+---------+

Without UDFs, you could turn the array into a string with concat_ws, using separators that do not occur in your array, and then use string functions to make the edits:

val sep = ","
val replace = values
    .foldLeft(col("myColumn")){ case (column, (key, value)) =>
        regexp_replace(column, sep + key + sep, sep + value + sep) 
    }
df.withColumn("myColumn", concat(lit(sep), concat_ws(sep+sep, 'myColumn), lit(sep)))
  .withColumn("myColumn", regexp_replace(replace, "(^,)|(,$)", ""))
  .withColumn("myColumn", split('myColumn, sep+sep))
  .show

4 Comments

Is it possible to achieve the same purely by using Spark SQL?
By Spark SQL, do you mean the Spark SQL API or the actual SQL language?
The actual SQL language, my bad for being vague
No problem. I leave this here because it might help others. It has the advantage of being configurable solely with the values map. I added another answer with a similar approach in SQL.
