
I have a DataFrame like this:

userId    someString      varA     varB
   1      "example1"    0,2,5     1,2,9
   2      "example2"    1,20,5   9,null,6

I want to convert the values in varA and varB into arrays of strings:

userId    someString      varA     varB
   1      "example1"    [0,2,5]   [1,2,9]
   2      "example2"    [1,20,5]  [9,null,6]
    If varA and varB are strings, then simply use str.split(","). Commented Feb 15, 2019 at 9:34

1 Answer


It's fairly simple: you can use the Spark SQL split function, which takes a column and a delimiter pattern (a regular expression).

import org.apache.spark.sql.functions.split
import spark.implicits._ // needed for $"..." (assumes a SparkSession named spark)
df.withColumn("varA", split($"varA", ",")).withColumn("varB", split($"varB", ",")).show()

Output

+------+----------+----------+------------+
|userId|someString|      varA|        varB|
+------+----------+----------+------------+
|     1|  example1| [0, 2, 5]|   [1, 2, 9]|
|     2|  example2|[1, 20, 5]|[9, null, 6]|
+------+----------+----------+------------+
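
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes a local SparkSession and that varA and varB are plain comma-separated strings, as in the question; the object name SplitColumnsSketch and the helper splitColumns are hypothetical names used only for illustration. Note that the result is array<string>, so the "null" in varB stays the literal text "null" rather than becoming a SQL NULL.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, split}

object SplitColumnsSketch {
  // Hypothetical helper: replace each named string column with an
  // array<string> column obtained by splitting on commas.
  def splitColumns(df: DataFrame, names: Seq[String]): DataFrame =
    names.foldLeft(df)((acc, name) => acc.withColumn(name, split(col(name), ",")))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for experimenting.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("split-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the question.
    val df = Seq(
      (1, "example1", "0,2,5", "1,2,9"),
      (2, "example2", "1,20,5", "9,null,6")
    ).toDF("userId", "someString", "varA", "varB")

    val result = splitColumns(df, Seq("varA", "varB"))
    result.printSchema() // varA and varB should now be reported as array types
    result.show()

    spark.stop()
  }
}
```

Folding over the column names avoids repeating withColumn for every column; with only two columns, chaining the calls as in the answer works just as well.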

1 Comment

@RadhwenKHADHRI Happy to help :)
