Using the unix_timestamp method to create a timestamp in Spark

Question

I have a CSV file with many columns, two of which are Month and Year. Month holds the values 1 to 12 and Year holds values such as 2013. I need to create a new column, say 'timestamp', in the format mm/yyyy. I tried the snippet below, but it returns null for every row.

scala> val df = spark.read.format("csv").option("header", "true").load("/user/bala/*.csv")
df: org.apache.spark.sql.DataFrame = [_c0: string, Month: string ... 28 more fields]

scala> val df = spark.read.format("csv").option("header", "true").load("/user/bala/AWI/*.csv")
df: org.apache.spark.sql.DataFrame = [_c0: string, Month: string ... 28 more fields]

scala> import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.functions.udf

scala> def makeDT(Month: String, Year: String) = s"$Month $Year"
makeDT: (Month: String, Year: String)String

scala> val makeDt = udf(makeDT(_:String,_:String))
makeDt: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function2>,StringType,Some(List(StringType, StringType)))

scala> df.select($"Month", $"Year", unix_timestamp(makeDt($"Month", $"Year"), "mm/yyyy")).show(2)
+-----+----+-----------------------------------------+
|Month|Year|unix_timestamp(UDF(Month, Year), mm/yyyy)|
+-----+----+-----------------------------------------+
|    1|2013|                                     null|
|    1|2013|                                     null|
+-----+----+-----------------------------------------+
only showing top 2 rows

Can someone point out where I am going wrong?

KiranM · Accepted Answer · 2016-09-30 06:21:03Z


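Your UDF joins the two values with a space ("1 2013"), which does not match the pattern "mm/yyyy", and lowercase mm means minutes, not month (month is M or MM). A string that cannot be parsed with the given pattern comes back as null. It is simplest to build a full date with day, month & year. You can redefine your makeDT:

scala> def makeDT(Month: String, Year: String) = s"01/$Month/$Year 00:00:00"

Then re-create the makeDt UDF from it and use it along these lines (I didn't test it). unix_timestamp returns seconds, and casting a number to timestamp also treats it as seconds, so no multiplication by 1000 is needed:

unix_timestamp(makeDt($"Month", $"Year"), "dd/M/yyyy HH:mm:ss").cast("timestamp")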
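
The same idea can probably be done without a UDF, using only built-in column functions. The following is a minimal sketch, not a tested drop-in: it assumes Month and Year are string columns as in the question, the object name MonthYearTimestamp, the helper withMonthYear and the intermediate column ts are made up for illustration, and the two-row DataFrame is a hypothetical stand-in for the CSV.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat_ws, date_format, lit, unix_timestamp}

object MonthYearTimestamp {

  // Hypothetical helper: adds a real timestamp column "ts" (first day of the
  // month) and a string column "timestamp" rendered as MM/yyyy.
  def withMonthYear(df: DataFrame): DataFrame = {
    // Build a string such as "01/1/2013" so the value matches the pattern exactly.
    val dateString = concat_ws("/", lit("01"), col("Month"), col("Year"))
    df
      // unix_timestamp yields seconds since the epoch; the cast reads seconds too.
      .withColumn("ts", unix_timestamp(dateString, "dd/M/yyyy").cast("timestamp"))
      // Uppercase MM is month; lowercase mm would be minutes.
      .withColumn("timestamp", date_format(col("ts"), "MM/yyyy"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("month-year-timestamp")
      .getOrCreate()
    import spark.implicits._

    // Stand-in data (assumption): the real input would come from the CSV files.
    val df = Seq(("1", "2013"), ("12", "2013")).toDF("Month", "Year")

    withMonthYear(df).show(false)
    spark.stop()
  }
}
```

Keeping ts as a proper timestamp and deriving the MM/yyyy string from it means the column can still be sorted or filtered as a date, while the string column is only for display.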
