I have a DataFrame:

+--------------------------------------+------------------------------------------------------------+
|item                                  |item_codes                                                  |
+--------------------------------------+------------------------------------------------------------+
|loose fit long sleeve swim shirt women|["2237741011","1046622","1040660","7147440011","7141123011"]|
+--------------------------------------+------------------------------------------------------------+

Its schema looks like this:

root
 |-- item: string (nullable = true)
 |-- item_codes: string (nullable = true)

How can I convert the item_codes column from a string to Array[String] in Scala?

2 Answers

You can remove quotes/square brackets using regexp_replace, followed by a split to generate the ArrayType column:

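// Imports needed when running outside spark-shell.
import org.apache.spark.sql.functions.{regexp_replace, split}
import spark.implicits._

val df = Seq(
  ("abc", "[\"2237741011\",\"1046622\",\"1040660\",\"7147440011\",\"7141123011\"]")
).toDF("item", "item_codes")

// Strip each double quote together with an adjacent "[" or "]", then split on commas.
df.
  withColumn("item_codes", split(regexp_replace($"item_codes", """\[?\"\]?""", ""), "\\,")).
  show(false)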
// +----+------------------------------------------------------+
// |item|item_codes                                            |
// +----+------------------------------------------------------+
// |abc |[2237741011, 1046622, 1040660, 7147440011, 7141123011]|
// +----+------------------------------------------------------+

You can use the split function after doing some "preprocessing":

val col_names = Seq("item", "item_codes")

val data = Seq(("loose fit long sleeve swim shirt women", """["2237741011","1046622","1040660","7147440011","7141123011"]"""))

val df = spark.createDataFrame(data).toDF(col_names: _*)

// Chop off the first 2 and last 2 characters (the leading [" and trailing "]), then split at ","
df.withColumn("item_codes", split(expr("substring(item_codes, 3, length(item_codes)-4)"), """","""")).printSchema

If your format can change, you might be more flexible using a regexp, as leo suggests: chop off everything that is not a digit or a comma, and split at the comma.
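
A rough sketch of that regexp variant is shown here. It assumes the codes contain only digits, and it creates its own local SparkSession; the `local[*]` master and the app name are illustrative choices, not something from the question. In spark-shell, `spark` already exists and `getOrCreate()` simply returns it.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_replace, split}

// Assumption: a local session for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("item-codes-sketch").getOrCreate()

val data = Seq(("loose fit long sleeve swim shirt women",
  """["2237741011","1046622","1040660","7147440011","7141123011"]"""))
val df = spark.createDataFrame(data).toDF("item", "item_codes")

// Remove every character that is not a digit or a comma, then split on commas.
val parsed = df.withColumn(
  "item_codes",
  split(regexp_replace(col("item_codes"), "[^0-9,]", ""), ",")
)

parsed.printSchema()  // item_codes should now be reported as an array of strings
```

One thing to watch for: an empty list such as [] is reduced to an empty string, and splitting that gives an array with a single empty element rather than an empty array, so you may want to handle that case separately.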
