
I have a Spark DataFrame that looks like this:

id   DataArray
a    array(3,2,1)
b    array(4,2,1)     
c    array(8,6,1)
d    array(8,2,4)

I want to transform this dataframe into:

id  col1  col2  col3
a    3     2     1
b    4     2     1
c    8     6     1 
d    8     2     4

What function should I use?

2 Answers


Use select with apply (indexing into the array column):

import org.apache.spark.sql.functions.col

df.select(
  col("id") +: (0 until 3).map(i => col("DataArray")(i).alias(s"col${i + 1}")): _*
)
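For reference, here is a minimal end-to-end sketch of this approach, assuming a local SparkSession and the sample data from the question:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("split-array-column")
  .getOrCreate()
import spark.implicits._

// Recreate the sample data from the question.
val df = Seq(
  ("a", Seq(3, 2, 1)),
  ("b", Seq(4, 2, 1)),
  ("c", Seq(8, 6, 1)),
  ("d", Seq(8, 2, 4))
).toDF("id", "DataArray")

// Index into the array column; aliasing with i + 1 yields col1..col3.
val result = df.select(
  col("id") +: (0 until 3).map(i => col("DataArray")(i).alias(s"col${i + 1}")): _*
)
result.show()
```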

1 Comment

I am not able to resolve `import org.apache.spark.sql.col`; may I know which version of Spark you are using? Do we need any additional packages? `<scala> import org.apache.spark.sql.col <console>:23: error: object col is not a member of package org.apache.spark.sql` (Note: the correct import is `org.apache.spark.sql.functions.col`.)

You can use foldLeft to add each column from DataArray.

Make a list of the column names you want to add:

val columns = List("col1", "col2", "col3")

import org.apache.spark.sql.functions.col

columns.zipWithIndex.foldLeft(df) {
  (memoDF, column) => {
    memoDF.withColumn(column._1, col("DataArray")(column._2))
  }
}.drop("DataArray")
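If the number of array elements is not fixed at three, one way to generalize this (a sketch, assuming every row's DataArray has the same length) is to read the length from the first row with the built-in size function and generate the column list from it:

```scala
import org.apache.spark.sql.functions.{col, size}

// Assumes all rows have equal-length arrays; take the length from the first row.
val n = df.select(size(col("DataArray"))).first().getInt(0)

// Generate col1..colN dynamically instead of hard-coding the list.
val columns = (1 to n).map(i => s"col$i").toList

val result = columns.zipWithIndex.foldLeft(df) { (acc, c) =>
  acc.withColumn(c._1, col("DataArray")(c._2))
}.drop("DataArray")
```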

Hope this helps!

