I am using Spark 2.0 and have a use case where I need to convert the type of a column from string to Array[Long].

Suppose I have a DataFrame with this schema:

root
 |-- unique_id: string (nullable = true)
 |-- column2: string (nullable = true)

DF:

+----------+---------+
|unique_id | column2 |
+----------+---------+
|  1       |  123    |
|  2       |  125    |
+----------+---------+

Now I want to add a new column named "column3" of type Array[Long] holding the values from "column2", like:

root
 |-- unique_id: string (nullable = true)
 |-- column2: long (nullable = true)
 |-- column3: array (nullable = true)
 |    |-- element: long (containsNull = true)

New DF:

+----------+---------+---------+
|unique_id | column2 | column3 |
+----------+---------+---------+
|  1       |  123    | [123]   | 
|  2       |  125    | [125]   |
+----------+---------+---------+

Is there a way to achieve this?

1 Answer

You can simply use withColumn and the array function as

df.withColumn("column3", array(df("column2")))
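
Note that column2 is still a string at this point, so the array elements would be strings too. A quick way to check (a minimal sketch; printSchema just prints the inferred types):

df.withColumn("column3", array(df("column2"))).printSchema()
// column3 shows up as array<string> here because column2 has not been
// converted to Long yet; the udf below takes care of that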

I also see that you are trying to change column2 from string to Long. A simple udf function should do the trick, so the final solution would be

// udf that parses the string value into a Long
def changeToLong = udf((str: String) => str.toLong)


val finalDF = df
  .withColumn("column2", changeToLong(col("column2")))
  .withColumn("column3", array(col("column2")))

You need to import the functions library too:

import org.apache.spark.sql.functions._
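
For completeness, here is a minimal end-to-end sketch (assuming a SparkSession named spark is already in scope; the sample data mirrors the question):

import org.apache.spark.sql.functions._
import spark.implicits._

// Both columns start out as strings, matching the question's schema
val df = Seq(("1", "123"), ("2", "125")).toDF("unique_id", "column2")

// Parse the string into a Long, then wrap it in a single-element array
def changeToLong = udf((str: String) => str.toLong)

val finalDF = df
  .withColumn("column2", changeToLong(col("column2")))
  .withColumn("column3", array(col("column2")))

finalDF.printSchema()
finalDF.show()

As an aside, the built-in cast should achieve the same string-to-Long conversion without a udf, e.g. df.withColumn("column2", col("column2").cast("long")).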

3 Comments

Hi @Ramesh, did you check this on the df? I got this error: "error: not found: value array"
You need to import org.apache.spark.sql.functions._
Great to hear that @Ram, and thanks for accepting and the upvotes :)
