I'm new to Scala and Spark and don't know how to explode the "path" field and find the max and min of the "event_dttm" field in one pass. I have this data:

val weblog = sc.parallelize(Seq(
  ("39f0412b4c91", "staticnavi.com", Seq("panel", "cm.html"), 1424954530, "SO.01"),
  ("39f0412b4c91", "staticnavi.com", Seq("panel", "cm.html"), 1424964830, "SO.01"),
  ("39f0412b4c91", "staticnavi.com", Seq("panel", "cm.html"), 1424978445, "SO.01")
)).toDF("id", "domain", "path", "event_dttm", "load_src")

I need to get the following result:

id           | domain         | newPath | max_time   | min_time   | load_src
39f0412b4c91 | staticnavi.com | panel   | 1424978445 | 1424954530 | SO.01
39f0412b4c91 | staticnavi.com | cm.html | 1424978445 | 1424954530 | SO.01

I think this can be done with a row function, but I don't know how.

1 Answer

You are looking for explode(), followed by a groupBy aggregation:

import org.apache.spark.sql.functions.{explode, min, max}

val result = weblog.withColumn("path", explode($"path"))
  .groupBy("id","domain","path","load_src")
  .agg(min($"event_dttm").as("min_time"),
       max($"event_dttm").as("max_time"))

result.show()
+------------+--------------+-------+--------+----------+----------+
|          id|        domain|   path|load_src|  min_time|  max_time|
+------------+--------------+-------+--------+----------+----------+
|39f0412b4c91|staticnavi.com|  panel|   SO.01|1424954530|1424978445|
|39f0412b4c91|staticnavi.com|cm.html|   SO.01|1424954530|1424978445|
+------------+--------------+-------+--------+----------+----------+

3 Comments

Thanks! Works fine. Is there another way w/o using explode?
Yes, using the RDD API, but that's going to be more elaborate and potentially slower.
I've found a solution with flatMap: val result = weblog.flatMap { case Row(id: String, domain: String, path: String, event_dttm: Long, load_src: String, ymd: String) => { path.split("/").map(x => (id, domain.concat("#").concat(x), BigInt(event_dttm), load_src, ymd)) }}
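
For the schema shown in the question (where "path" is an array and there is no "ymd" column), the RDD route mentioned above could look roughly like the sketch below. It is only an illustration under a few assumptions: Spark 2.x or later with a locally created SparkSession (in spark-shell, `spark` already exists), "event_dttm" inferred as an Int column, and the name `minMaxByPath` is made up for this example. Each row is flattened into one record per path element with flatMap, and reduceByKey then keeps the running min and max per key, so the explode and the aggregation happen without calling explode().

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("explode-via-rdd")
  .getOrCreate()
import spark.implicits._

// Same sample data as in the question.
val weblog = Seq(
  ("39f0412b4c91", "staticnavi.com", Seq("panel", "cm.html"), 1424954530, "SO.01"),
  ("39f0412b4c91", "staticnavi.com", Seq("panel", "cm.html"), 1424964830, "SO.01"),
  ("39f0412b4c91", "staticnavi.com", Seq("panel", "cm.html"), 1424978445, "SO.01")
).toDF("id", "domain", "path", "event_dttm", "load_src")

// Hypothetical name: one ((id, domain, pathElement, load_src), (min, max)) pair per group.
val minMaxByPath = weblog.rdd
  .flatMap { row =>
    val ts = row.getInt(3) // event_dttm
    // Emit one keyed record per element of the "path" array.
    row.getSeq[String](2).map { p =>
      ((row.getString(0), row.getString(1), p, row.getString(4)), (ts, ts))
    }
  }
  // Combine partial (min, max) pairs that share the same key.
  .reduceByKey { case ((min1, max1), (min2, max2)) =>
    (math.min(min1, min2), math.max(max1, max2))
  }

minMaxByPath.collect().foreach(println)
```

The row order of the collected result is not guaranteed. As noted above, the DataFrame version with explode() is shorter and lets Spark optimize the query, so this variant is mainly useful when the RDD API is already needed for other reasons.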
