I have a date in the format Tue Dec 31 07:14:22 +0000 2013 stored as a string, and I need to convert it to a Date object so that the timestamp field can be indexed, using Scala Spark.
If you could add some sample data, that would be great. – Nikunj Kakadiya, commented Nov 9, 2021 at 9:49
2 Answers
You can do this by splitting the string column on spaces into an array column, then building a new string column in a date format that Spark supports and converting it with to_date. Note that this yields only the date part; the time and UTC offset are discarded.
//Creating sample data
import org.apache.spark.sql.functions._
import spark.implicits._ //needed for toDF and the $"..." column syntax (already imported in spark-shell and most notebooks)
val df = Seq("Tue Dec 31 07:14:22 +0000 2013", "Thu Dec 09 09:14:42 +0000 2017").toDF("DateString")
//Creating a new array column by splitting the string column on spaces
val df1 = df.withColumn("DateArray", split($"DateString", " "))
//Picking the year (index 5), month (index 1) and day (index 2) from the array and joining them, e.g. "2013-Dec-31"
val df2 = df1
  .withColumn("DateTime", concat($"DateArray".getItem(5), lit("-"), $"DateArray".getItem(1), lit("-"), $"DateArray".getItem(2)))
  .withColumn("Date", to_date($"DateTime", "yyyy-MMM-dd"))
//display is available in Databricks notebooks; elsewhere use df2.show(false)
display(df2)
You can then drop any columns you don't need in the output.
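If the time of day also matters (the question mentions a timestamp field), one possible variation is to let Spark parse the string directly rather than rebuilding it from array items. The sketch below is not part of the answer above: it strips the leading weekday name and then calls to_timestamp with the pattern "MMM dd HH:mm:ss Z yyyy". The weekday is removed because, as far as I know, Spark 3.x does not accept the E pattern letter when parsing. The column names Trimmed and Timestamp, the local SparkSession, and pinning the session time zone to UTC are all assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a local session for illustration; in spark-shell or a notebook `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("parse-date-string").getOrCreate()
import spark.implicits._

// Assumption: pin the session time zone so the displayed timestamp does not depend on the machine.
spark.conf.set("spark.sql.session.timeZone", "UTC")

val df = Seq("Tue Dec 31 07:14:22 +0000 2013").toDF("DateString")

val parsed = df
  // Remove the leading weekday name and the whitespace after it ("Tue ").
  .withColumn("Trimmed", regexp_replace($"DateString", "^\\S+\\s+", ""))
  // Parse month name, day, time, UTC offset and year in one step.
  .withColumn("Timestamp", to_timestamp($"Trimmed", "MMM dd HH:mm:ss Z yyyy"))
  // Derive a plain date from the timestamp when only the day is needed.
  .withColumn("Date", to_date($"Timestamp"))
  .drop("Trimmed")

parsed.show(false)
```

This keeps the offset information while parsing, so strings with offsets other than +0000 should be normalized to the session time zone instead of being silently truncated to the date.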
