I have the dataframe as follows :
ColA ColB ColC
1 [2,3,4] [5,6,7]
I need to convert it to the below
ColA ColB ColC
1 2 5
1 3 6
1 4 7
Can someone please help with the Code in SCALA?
You can zip the two array columns by means of a UDF and explode the zipped column as follows:
val df = Seq(
(1, Seq(2, 3, 4), Seq(5, 6, 7))
).toDF("ColA", "ColB", "ColC")
def zip = udf(
(x: Seq[Int], y: Seq[Int]) => x zip y
)
val df2 = df.select($"ColA", zip($"ColB", $"ColC").as("BzipC")).
withColumn("BzipC", explode($"BzipC"))
val df3 = df2.select($"ColA", $"BzipC._1".as("ColB"), $"BzipC._2".as("ColC"))
df3.show
+----+----+----+
|ColA|ColB|ColC|
+----+----+----+
| 1| 2| 5|
| 1| 3| 6|
| 1| 4| 7|
+----+----+----+
The idea I am presenting here is a bit complex which requires you to use map to combine the two arrays of ColB and ColC. Then use the explode function to explode the combined array. and finally extract the exploded combined array to different columns.
import org.apache.spark.sql.functions._
val tempDF = df.map(row => {
val colB = row(1).asInstanceOf[mutable.WrappedArray[Int]]
val colC = row(2).asInstanceOf[mutable.WrappedArray[Int]]
var array = Array.empty[(Int, Int)]
for(loop <- 0 to colB.size-1){
array = array :+ (colB(loop), colC(loop))
}
(row(0).asInstanceOf[Int], array)
})
.toDF("ColA", "ColB")
.withColumn("ColD", explode($"ColB"))
tempDF.withColumn("ColB", $"ColD._1").withColumn("ColC", $"ColD._2").drop("ColD").show(false)
this would give you result as
+----+----+----+
|ColA|ColB|ColC|
+----+----+----+
|1 |2 |5 |
|1 |3 |6 |
|1 |4 |7 |
+----+----+----+
You can also use a combination of posexplode and lateral view from HiveQL
sqlContext.sql("""
select 1 as colA, array(2,3,4) as colB, array(5,6,7) as colC
""").registerTempTable("test")
sqlContext.sql("""
select
colA , b as colB, c as colC
from
test
lateral view
posexplode(colB) columnB as seqB, b
lateral view
posexplode(colC) columnC as seqC, c
where
seqB = seqC
""" ).show
+----+----+----+
|colA|colB|colC|
+----+----+----+
| 1| 2| 5|
| 1| 3| 6|
| 1| 4| 7|
+----+----+----+
Credits: https://stackoverflow.com/a/40614822/7224597 ;)