PostgreSQL's ARRAY_TO_STRING() function allows you to run

SELECT array_to_string(ARRAY[1, 2, 3, NULL, 5], ',', '*');

and gives you

array_to_string
-----------------
1,2,3,*,5
(1 row)

Can we do the same using Spark SQL?

What I really need is for a JSON structure to stay as a string. Thanks!

1 Answer

The closest thing you can get without writing a UDF is concat_ws:

from pyspark.sql import functions as F

rdd = sc.parallelize(["""{"foo": 1.0, "bar": [1, 2, 3, null, 5]}"""])

spark.read.json(rdd).withColumn("bar", F.concat_ws(",", "bar")).show()
# +-------+---+
# |    bar|foo|
# +-------+---+
# |1,2,3,5|1.0|
# +-------+---+  
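Since the question asks about Spark SQL specifically, the same concat_ws call can also be written as a SQL expression. A small sketch, assuming the parsed data is registered under a temporary view (the name t is just a placeholder for this example):

spark.read.json(rdd).createOrReplaceTempView("t")

spark.sql("SELECT concat_ws(',', bar) AS bar, foo FROM t").show()
# same result as the DataFrame version above; nulls are still dropped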

but as you can see, it ignores nulls either way. With a UDF you can:

@F.udf
def array_to_string(xs, sep, nafill):
    return sep.join(str(x) if x is not None else str(nafill) for x in xs)

spark.read.json(rdd).withColumn("bar", array_to_string("bar", F.lit(","), F.lit("*"))).show()
# +---------+---+
# |      bar|foo|
# +---------+---+
# |1,2,3,*,5|1.0|
# +---------+---+
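If you are on Spark 2.4 or later, the built-in array_join takes an optional null replacement, so the UDF is not strictly needed. A minimal sketch, with an explicit cast to array<string> because the JSON reader infers bar as an array of bigints:

(spark.read.json(rdd)
    .withColumn("bar", F.array_join(F.col("bar").cast("array<string>"), ",", "*"))
    .show())
# should print 1,2,3,*,5 in bar, matching the UDF output above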

but if:

What I really need is for a JSON structure to stay as a string

then don't parse it at all. For example, if you use the JSON reader with an explicit string schema:

from pyspark.sql.types import *

(spark.read
    .schema(StructType([StructField("foo", StringType()), StructField("bar", StringType())]))
    .json(rdd)
    .show())
# +---+--------------+
# |foo|           bar|
# +---+--------------+
# |1.0|[1,2,3,null,5]|
# +---+--------------+
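And if the column has already been parsed into an array, one option is to serialize it back with to_json (a sketch only; to_json support for plain array columns arrived in later Spark releases):

spark.read.json(rdd).withColumn("bar", F.to_json("bar")).show(truncate=False)
# bar comes back as a JSON-formatted string such as [1,2,3,null,5] rather than an array column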