
I have a CSV file with a datetime column: "2011-05-02T04:52:09+00:00".

I am using Scala; the file is loaded into a Spark DataFrame, and I can use joda-time to parse the date:

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val df = new SQLContext(sc).load("com.databricks.spark.csv", Map("path" -> "data.csv", "header" -> "true")) 
val d = org.joda.time.format.DateTimeFormat.forPattern("yyyy-MM-dd'T'HH:mm:ssZZ")

I would like to create new columns based on the datetime field for time-series analysis.

In a DataFrame, how do I create a column based on the value of another column?

I notice DataFrame has the following method: df.withColumn("dt", column). Is there a way to create a column based on the value of an existing column?

Thanks


1 Answer

import java.sql.Date
import org.apache.spark.sql.types.DateType
import org.apache.spark.sql.functions._
import org.joda.time.DateTime
import org.joda.time.format.DateTimeFormat

// MM = month and HH = hour-of-day (mm would be minutes, kk clock-hour 1-24);
// ZZ parses an offset written with a colon, such as "+00:00"
val d = DateTimeFormat.forPattern("yyyy-MM-dd'T'HH:mm:ssZZ")
// DateType columns are backed by java.sql.Date, not java.util.Date
val dtFunc: (String => Date) = (arg1: String) => new Date(DateTime.parse(arg1, d).getMillis)
val x = df.withColumn("dt", callUDF(dtFunc, DateType, col("dt_string")))

callUDF and col are included in functions, as the imports show.

The dt_string in col("dt_string") is the name of the original column in your df that you want to transform from.
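A side note on the format pattern, since it is easy to get wrong: the sample timestamp is plain ISO-8601, so on Java 8+ the standard java.time API parses it without any custom pattern at all. A minimal, Spark-free sketch:

```scala
import java.time.OffsetDateTime

// "2011-05-02T04:52:09+00:00" is ISO-8601 with an offset, so the default
// parser handles it; a hand-written pattern would need MM (month) and
// HH (hour-of-day), not mm (minutes) or kk (clock-hour 1-24).
val ts = OffsetDateTime.parse("2011-05-02T04:52:09+00:00")
ts.getYear       // => 2011
ts.getMonthValue // => 5
ts.getHour       // => 4
```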

Alternatively, you could replace the last statement with:

val dtFunc2 = udf(dtFunc)
val x = df.withColumn("dt", dtFunc2(col("dt_string")))
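If your Spark version ships the built-in date functions (1.5+), you can skip the UDF entirely; this sketch parses the string with unix_timestamp and then derives typical time-series feature columns (the new column names here are only illustrative):

```scala
import org.apache.spark.sql.functions._

// unix_timestamp takes a SimpleDateFormat pattern (MM = month, HH = hour,
// XXX = offset with a colon such as "+00:00"); cast the seconds to a timestamp
val withDt = df.withColumn(
  "dt", unix_timestamp(col("dt_string"), "yyyy-MM-dd'T'HH:mm:ssXXX").cast("timestamp"))

// derive feature columns from the parsed timestamp with built-in functions
val features = withDt
  .withColumn("year", year(col("dt")))
  .withColumn("month", month(col("dt")))
  .withColumn("day", dayofmonth(col("dt")))
  .withColumn("hour", hour(col("dt")))
```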

1 Comment

Hi, thanks for the post. I'm actually doing what you suggested, but got the following error: "scala.MatchError: java.util.Date (of class scala.reflect.internal.Types$TypeRef$$anon$6) "
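That MatchError usually means the UDF's return type does not line up with the declared DateType: Spark's DateType is backed by java.sql.Date, while joda's toDate produces java.util.Date. A minimal sketch of a conversion that yields the expected type (shown with java.time for brevity; with joda it would be new Date(dt.getMillis)):

```scala
import java.sql.Date
import java.time.OffsetDateTime

// go through epoch millis to obtain a java.sql.Date, which DateType expects
val toSqlDate: String => Date =
  s => new Date(OffsetDateTime.parse(s).toInstant.toEpochMilli)

val parsed = toSqlDate("2011-05-02T04:52:09+00:00")
```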
