I need to extract the timestamp from the value column.

I tried using getItem, but it does not return anything:

val data = df.withColumn("splitted", split($"value", "/"))
      .select($"splitted".getItem(6).alias("region"), $"splitted".getItem(7).alias("service"), col("value"))
      .withColumn("service_type", regexp_extract($"service", """.*(Inbound|Outbound|Outound).*""", 1))
      .withColumn("region_type", concat(
        when(col("region").isNotNull, col("region")).otherwise(lit("null")), lit(" "),
        when(col("service").isNotNull, col("service_type")).otherwise(lit("null"))))
      .withColumn("splitt", split($"value", "\t"))
      .select($"splitt".getItem(1).alias("datetime"))

I need to extract the timestamp 2019-05-14 04:02:03 into a new column "datetime" from the string below:

{"value":"2019-05-14T09:02:06.486Z index:: host:: 2019-05-14 04:02:03,307 INFO  - \tTue May 14 04:02:03 CDT 2019\tID:<490744.1557824523305.0>\tsv\tAFTER_LOOKUP_QUERY_PARTNER_CHANNEL\t[messageData(DispatchID: 06708235871 Region: EMEA SubRegion: EU OperationType: <OperationType>STATUSUPDATE</OperationType> Operation: StatusUpdate)]\tms \t"}

1 Answer


You can use the regexp_extract function to extract only the timestamp from the string, as shown below:

df.withColumn("dateTime", 
      regexp_extract($"value", """\d{4}-[01]\d-[0-3]\d [0-2]\d:[0-5]\d:[0-5]\d""", 0)
).show(false)

Output:

+-------------------+
|dateTime           |
+-------------------+
|2019-05-14 04:02:03|
+-------------------+
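
To see why this pattern picks out the second timestamp rather than the first, here is a minimal sketch in plain Scala, with no Spark involved. The object name TimestampRegexDemo and the helper firstTimestamp are made up for illustration, and the sample line is a shortened copy of the value from the question. The pattern requires a space between the date and the time, so the leading ISO-style 2019-05-14T09:02:06.486Z is skipped because it has a T there.

```scala
import scala.util.matching.Regex

// Hypothetical demo object, not part of Spark or of the original answer.
object TimestampRegexDemo {
  // Same pattern as passed to regexp_extract above:
  //   \d{4}            four-digit year
  //   -[01]\d          month, first digit 0 or 1
  //   -[0-3]\d         day, first digit 0 to 3
  //   (a space)        separator between date and time
  //   [0-2]\d          hour, first digit 0 to 2
  //   :[0-5]\d:[0-5]\d minutes and seconds
  val TimestampPattern: Regex =
    """\d{4}-[01]\d-[0-3]\d [0-2]\d:[0-5]\d:[0-5]\d""".r

  // Returns the first substring that matches the pattern, if any.
  def firstTimestamp(line: String): Option[String] =
    TimestampPattern.findFirstIn(line)

  def main(args: Array[String]): Unit = {
    // Shortened sample based on the value shown in the question.
    val sample =
      "2019-05-14T09:02:06.486Z index:: host:: 2019-05-14 04:02:03,307 INFO  - \tTue May 14 04:02:03 CDT 2019"
    println(firstTimestamp(sample)) // prints Some(2019-05-14 04:02:03)
  }
}
```

In the Spark call, the last argument 0 asks regexp_extract for the whole match (group 0) rather than a parenthesised capture group. If a row has no match, regexp_extract returns an empty string, whereas this sketch returns None.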

1 Comment

Can you please share an explanation, so I can understand it better for future reference?
