I am pretty new to Apache Spark SQL and am trying to achieve the following. I have two arrays in a DataFrame which I want to convert to an intermediate DataFrame and then to JSON.

array [a,b,c,d,e] and  array [1,2,3,4,5]

Need them to be

a 1
b 2
c 3

I tried the explode option, but only one array gets exploded.

Thanks for the assistance..

1 Answer

To join two DataFrames in Spark you need a common column that exists in both, and since you don't have one you need to create it. Since version 1.6.0, Spark provides the monotonically_increasing_id() function for this. Note that the generated IDs are only guaranteed to be monotonically increasing and unique, not consecutive, so this positional pairing is reliable mainly for small, identically partitioned DataFrames. The following code illustrates this case:

    import org.apache.spark.sql.functions._
    import spark.implicits._

    // Assign a generated row id to each DataFrame so they can be joined positionally
    val df = Seq("a", "b", "c", "d", "e")
      .toDF("val1")
      .withColumn("id", monotonically_increasing_id)

    val df2 = Seq(1, 2, 3, 4, 5)
      .toDF("val2")
      .withColumn("id", monotonically_increasing_id)

    // Join on the generated id and keep only the value columns
    df.join(df2, "id").select($"val1", $"val2").show(false)

Output:

+----+----+
|val1|val2|
+----+----+
|a   |1   |
|b   |2   |
|c   |3   |
|d   |4   |
|e   |5   |
+----+----+
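Since the end goal is JSON, one option is Spark's `Dataset.toJSON`, which turns each row into a JSON string. A minimal sketch, continuing from the join above (row order in the output is not guaranteed):

    // Convert each joined row to a JSON string, e.g. {"val1":"a","val2":1}
    df.join(df2, "id")
      .select($"val1", $"val2")
      .toJSON
      .show(false)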

Good luck
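If the two arrays actually sit in the same row of one DataFrame (as the question suggests), another approach worth trying, assuming Spark 2.4+ for `arrays_zip`, is to zip the arrays first and explode once, which pairs the elements positionally. The column names `letters` and `numbers` here are illustrative:

    import org.apache.spark.sql.functions._
    import spark.implicits._

    val src = Seq((Seq("a", "b", "c", "d", "e"), Seq(1, 2, 3, 4, 5)))
      .toDF("letters", "numbers")

    // arrays_zip pairs elements by position into an array of structs,
    // whose fields are named after the input columns; explode flattens it
    src.select(explode(arrays_zip($"letters", $"numbers")).as("pair"))
      .select($"pair.letters".as("val1"), $"pair.numbers".as("val2"))
      .show(false)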
