
This is my Dataframe:

+----------+--------------------+--------------------+
|    NewsId|             newsArr|            transArr|
+----------+--------------------+--------------------+
|        26|[Republicans, Sto...|[[R, IH0, P, AH1,...|
|        29|[ISIS, Claims, Re...|[[AY1, S, AH0], [...|
|       474|[Concert, for, Tr...|[[K, AA1, N, S, E...|
|       964|[How, a, Fractiou...|[[HH, AW1], [AH0]...|
|      1677|[Review:, ‘Kong:,...|[[n/a], [n/a], [S...|
|      1697|[The, Rice-Size, ...|[[DH, AH0], [n/a]...|
|      1806|[Populists, Appea...|[[P, AA1, Y, AH0,...|
|      1950|[Uber, Board, Sta...|[[Y, UW1, B, ER0]...|
|      2040|[Health, Bill’s, ...|[[HH, EH1, L, TH]...|
|      2214|[Unmasking, the, ...|[[n/a], [DH, AH0]...|
+----------+--------------------+--------------------+

I want to make the "transArr" column cells into strings like this:

+----------+--------------------+--------------+
|    NewsId|             newsArr|      transArr|
+----------+--------------------+--------------+
|        26|[Republicans, Sto...|R IH0 P AH1...|
|        29|[ISIS, Claims, Re...|AY1 S AH0...  |
|       474|[Concert, for, Tr...|K AA1 N S E...|
|       964|[How, a, Fractiou...|HH AW1 AH0... |
|      1677|[Review:, ‘Kong:,...|n/a n/a S...  |
|      1697|[The, Rice-Size, ...|DH AH0 n/a... |
|      1806|[Populists, Appea...|P AA1 Y AH0...|
|      1950|[Uber, Board, Sta...|Y UW1 B ER0...|
|      2040|[Health, Bill’s, ...|HH EH1 L TH...|
|      2214|[Unmasking, the, ...|n/a DH AH0... |
+----------+--------------------+--------------+

Is there a relatively easy solution to this?

2 Answers


Use concat_ws and flatten; check the code below.

scala> df.printSchema
root
 |-- data: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: string (containsNull = true)

scala> df
  .withColumn("flatten", concat_ws(" ", flatten($"data")))
  .show(false)

+------------+-------+
|data        |flatten|
+------------+-------+
|[[abc, cdf]]|abc cdf|
+------------+-------+
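Applied to the DataFrame in the question, the same pattern would look roughly like the sketch below (column names are taken from the output shown in the question, and `df` is assumed to hold that DataFrame):

```scala
import spark.implicits._
import org.apache.spark.sql.functions.{concat_ws, flatten}

// transArr is an array of arrays of strings: flatten it into a single
// array, then join all phonemes with a space to get one string per row.
val result = df.withColumn("transArr", concat_ws(" ", flatten($"transArr")))
result.show()
```

Reusing the column name in withColumn replaces transArr in place, matching the desired output table above.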

2 Comments

I got the same result without flatten; try using only concat_ws and wrapping the array column in col.
Sure, I have posted my sample.

Using concat_ws:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, concat_ws}
import spark.implicits._
val df: DataFrame = Seq(
  ("a1", Array("2", "3", "5")),
  ("b2", Array("1", "6", "23")),
  ("b1", Array("df", "l2", "14")),
  ("c1", Array("te", "3pa", "gw"))
).toDF("key", "values")
df.show()
val newDF = df.withColumn("values", concat_ws(" ", col("values")))
newDF.show()
newDF.printSchema()

Output:

+---+-------------+
|key|       values|
+---+-------------+
| a1|    [2, 3, 5]|
| b2|   [1, 6, 23]|
| b1| [df, l2, 14]|
| c1|[te, 3pa, gw]|
+---+-------------+

+---+---------+
|key|   values|
+---+---------+
| a1|    2 3 5|
| b2|   1 6 23|
| b1| df l2 14|
| c1|te 3pa gw|
+---+---------+

root
 |-- key: string (nullable = true)
 |-- values: string (nullable = false)

2 Comments

Your values column is of type Array[String], not Array[Array[String]].
@Srinivas Yes, you are right. I missed one pair of brackets; in that case, using flatten is required.
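The distinction raised in the comments can be sketched with plain Scala collections (no Spark needed; the values here are made up for illustration). flatten is only required when the column holds nested arrays, as the question's transArr does:

```scala
// A flat array, as in this answer's sample: one join is enough.
// This mirrors concat_ws(" ", col("values")).
val flat = Seq("2", "3", "5")
val joinedFlat = flat.mkString(" ")

// A nested array, as in the question's transArr column:
// flatten first, then join, mirroring concat_ws(" ", flatten(col)).
val nested = Seq(Seq("R", "IH0", "P"), Seq("AH1", "N"))
val joinedNested = nested.flatten.mkString(" ")

println(joinedFlat)   // 2 3 5
println(joinedNested) // R IH0 P AH1 N
```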
