4

I have a Dataframe with a Column of Array Type For example :

val df = List(("a", Array(1d,2d,3d)), ("b", Array(4d,5d,6d))).toDF("ID", "DATA")
df: org.apache.spark.sql.DataFrame = [ID: string, DATA: array<double>]

scala> df.show
+---+---------------+
| ID|           DATA|
+---+---------------+
|  a|[1.0, 2.0, 3.0]|
|  b|[4.0, 5.0, 6.0]|
+---+---------------+

I wish to explode the array and have index like

+---+------------------+
| ID|  DATA_INDEX| DATA|
+---+------------------+
|  a|1           | 1.0 |
|  a|2           | 2.0 |
|  a|3           | 3.0 |
|  b|1           | 4.0 |
|  b|2           | 5.0 |
|  b|3           | 6.0 |
+---+------------+-----+

I wish be able to do that with scala, and Sparlyr or SparkR I'm using spark 1.6

0

2 Answers 2

8

There is a posexplode function available in spark functions

   import org.apache.spark.sql.functions._

   df.select("ID", posexplode($"DATA))

PS: This is only available after 2.1.0 versions

Sign up to request clarification or add additional context in comments.

Comments

4

With Spark 1.6, you can register you dataframe as a temporary table and then run Hive QL over it to get the desired result.

df.registerTempTable("tab")

sqlContext.sql("""
    select 
       ID, exploded.DATA_INDEX + 1 as DATA_INDEX, exploded.DATA  
    from 
       tab 
    lateral view posexplode(tab.DATA) exploded as DATA_INDEX, DATA 
""").show

+---+----------+----+
| ID|DATA_INDEX|DATA|
+---+----------+----+
|  a|         1| 1.0|
|  a|         2| 2.0|
|  a|         3| 3.0|
|  b|         1| 4.0|
|  b|         2| 5.0|
|  b|         3| 6.0|
+---+----------+----+

1 Comment

Does not work in spark 1.6.3: "failure: ``union'' expected but identifier view found: lateral view posexplode(intakeDf.sentences) exploded as sentId, sent"

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.