I have a DataFrame with columns of type (String, List[String]). I want to split the List[String] and put each value from the list in its own column. For example:

String 1, [1, 2, 3, 4]    =>   String 1, 1, 2, 3, 4

Input (String, List[String]):

Hey, [wooa, mmmm, ehhh]
Hey1, [woooe, rrrr, ough, shhhhh]

Output (String, String, String, String,..., String)

Hey, wooa, mmmm, ehhh
Hey1, woooe, rrrr, ough, shhhhh

I am trying the following code:

df.withColumn("temp",split(col("fieldList"), ","))
  .select(col("*") +: (0 until 9).map(i => col("temp").getItem(i).as(s"col$i")):_*)

When I execute it, I get this error:

User class threw exception: org.apache.spark.sql.AnalysisException: cannot resolve 'split(fieldList, ',')' due to data type mismatch: argument 1 requires string type, however, 'fieldList' is of array type.;;

Any idea how to convert the List to a String? I have tried .mkString(), but I am missing something.

Thanks

  • It would be cool if you provided the expected input and output. Commented May 8, 2020 at 8:27
  • Why do you need to split if you already have a list of strings? Commented May 8, 2020 at 9:16
  • I need to take each element of the list and put it in its own column. Commented May 8, 2020 at 9:28

1 Answer

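The column is already an array, so split is not needed. Find the length of the longest array, then select each array element by index into its own column: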

scala> val df = Seq(("Hey",Seq("wooa","mmmm","ehhh")),("Hey1",Seq("woooe", "rrrr", "ough", "shhhhh"))).toDF("aa","bb")
df: org.apache.spark.sql.DataFrame = [aa: string, bb: array<string>]

scala> val max = df.withColumn("length",size($"bb")).orderBy($"length".desc).select($"length").head.getAs[Int](0)
max: Int = 4

scala> df.withColumn("length",size($"bb")).orderBy($"length".desc).select(col("*") +: (0 until max).map(i => col("bb")(i).as(s"col$i")):_*).show(false)
+----+---------------------------+------+-----+----+----+------+
|aa  |bb                         |length|col0 |col1|col2|col3  |
+----+---------------------------+------+-----+----+----+------+
|Hey1|[woooe, rrrr, ough, shhhhh]|4     |woooe|rrrr|ough|shhhhh|
|Hey |[wooa, mmmm, ehhh]         |3     |wooa |mmmm|ehhh|null  |
+----+---------------------------+------+-----+----+----+------+

scala> df.withColumn("length",size($"bb")).orderBy($"length".desc).select(col("*") +: (0 until max).map(i => col("bb")(i).as(s"col$i")):_*).drop("length","bb").show(false)
+----+-----+----+----+------+
|aa  |col0 |col1|col2|col3  |
+----+-----+----+----+------+
|Hey1|woooe|rrrr|ough|shhhhh|
|Hey |wooa |mmmm|ehhh|null  |
+----+-----+----+----+------+
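
The session above sorts the whole DataFrame just to read the largest size. As a variation, here is a minimal sketch that gets the same number with a single aggregation and builds only the columns that are needed. The names `maxLen`, `itemCols` and `wide` are illustrative and not from the original answer. The local SparkSession is only there to make the snippet self-contained. The sketch assumes ANSI mode is off (the Spark 3.x default), so indexing past the end of a shorter array yields null rather than an error.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, max, size}

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("array-to-columns").getOrCreate()
import spark.implicits._

val df = Seq(
  ("Hey", Seq("wooa", "mmmm", "ehhh")),
  ("Hey1", Seq("woooe", "rrrr", "ough", "shhhhh"))
).toDF("aa", "bb")

// Length of the longest array, computed with one aggregation instead of a sort.
val maxLen = df.agg(max(size(col("bb")))).head.getInt(0)

// One column per array position. Assumption: ANSI mode is off, so positions
// past the end of a shorter array come back as null.
val itemCols = (0 until maxLen).map(i => col("bb").getItem(i).as(s"col$i"))

// Select only the key column and the new columns, so nothing has to be dropped afterwards.
val wide = df.select(col("aa") +: itemCols: _*)
wide.show(false)
```

Because `wide` selects only `aa` and the generated columns, the helper `length` column and the final `drop` are not needed.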


2 Comments

Hi @Srinivas, one last question. How do I save this result in a DataFrame? I tried val dataframe = df.withColumn("length",size($"bb")).orderBy($"length".desc).select(col("*") +: (0 until max).map(i => col("bb")(i).as(s"col$i")):_*).drop("length","bb").show(false) but I cannot work with it as a DataFrame.
Remove the final show(false). show only prints and returns Unit, so without it the expression returns a DataFrame that you can assign. :)
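
To illustrate the point in the comment above, here is a small sketch with a one-row DataFrame. The names `result` and `rowCount` are illustrative, and the local SparkSession is included only to keep the snippet self-contained.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("keep-dataframe").getOrCreate()
import spark.implicits._

val df = Seq(("Hey", Seq("wooa", "mmmm", "ehhh"))).toDF("aa", "bb")

// Ending the chain at drop(...) leaves a DataFrame that can be reused.
val result: DataFrame = df
  .select(col("*") +: (0 until 3).map(i => col("bb")(i).as(s"col$i")): _*)
  .drop("bb")

// show(...) only prints and returns Unit, so call it as a separate statement.
result.show(false)

// The value is still a DataFrame, so further operations work.
val rowCount = result.count()
```

Had the chain ended in `.show(false)`, the `DataFrame` type annotation on `result` would not compile, which is a quick way to catch this mistake.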
