I am pretty new to Apache Spark SQL and am trying to achieve the following. I have two arrays in a DataFrame which I want to convert to an intermediate DataFrame and then to JSON.

array [a,b,c,d,e] and  array [1,2,3,4,5]

Need them to be

a 1
b 2
c 3

I tried the explode option, but only one array gets exploded.

Thanks for the assistance..

1 Answer

To join two DataFrames in Spark you need a common column that exists in both, and since you don't have one you need to create it. Since version 1.6.0, Spark provides the monotonically_increasing_id() function for this. Note that the generated IDs are only guaranteed to be monotonically increasing and unique, not consecutive, so this positional pairing is reliable mainly for small, identically partitioned DataFrames. The following code illustrates this case:

    import org.apache.spark.sql.functions._
    import spark.implicits._

    // Assign a generated row id to each DataFrame so they can be joined positionally
    val df = Seq("a", "b", "c", "d", "e")
      .toDF("val1")
      .withColumn("id", monotonically_increasing_id)

    val df2 = Seq(1, 2, 3, 4, 5)
      .toDF("val2")
      .withColumn("id", monotonically_increasing_id)

    // Join on the generated id and keep only the value columns
    df.join(df2, "id").select($"val1", $"val2").show(false)

Output:

+----+----+
|val1|val2|
+----+----+
|a   |1   |
|b   |2   |
|c   |3   |
|d   |4   |
|e   |5   |
+----+----+
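Since the end goal is JSON, one option is Spark's `Dataset.toJSON`, which turns each row into a JSON string. A minimal sketch, continuing from the join above (row order in the output is not guaranteed):

    // Convert each joined row to a JSON string, e.g. {"val1":"a","val2":1}
    df.join(df2, "id")
      .select($"val1", $"val2")
      .toJSON
      .show(false)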

Good luck
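If the two arrays actually sit in the same row of one DataFrame (as the question suggests), another approach worth trying, assuming Spark 2.4+ for `arrays_zip`, is to zip the arrays first and explode once, which pairs the elements positionally. The column names `letters` and `numbers` here are illustrative:

    import org.apache.spark.sql.functions._
    import spark.implicits._

    val src = Seq((Seq("a", "b", "c", "d", "e"), Seq(1, 2, 3, 4, 5)))
      .toDF("letters", "numbers")

    // arrays_zip pairs elements by position into an array of structs,
    // whose fields are named after the input columns; explode flattens it
    src.select(explode(arrays_zip($"letters", $"numbers")).as("pair"))
      .select($"pair.letters".as("val1"), $"pair.numbers".as("val2"))
      .show(false)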
