I'm new to Scala and Spark and don't know how to explode the "path" field and find the max and min of the "event_dttm" field in one pass. I have this data:
// toDF needs the SQL implicits in scope (spark-shell imports them automatically)
val weblog = sc.parallelize(Seq(
  ("39f0412b4c91", "staticnavi.com", Seq("panel", "cm.html"), 1424954530, "SO.01"),
  ("39f0412b4c91", "staticnavi.com", Seq("panel", "cm.html"), 1424964830, "SO.01"),
  ("39f0412b4c91", "staticnavi.com", Seq("panel", "cm.html"), 1424978445, "SO.01")
)).toDF("id", "domain", "path", "event_dttm", "load_src")
I need to get the following result:
id           | domain         | newPath | max_time   | min_time   | load_src
39f0412b4c91 | staticnavi.com | panel   | 1424978445 | 1424954530 | SO.01
39f0412b4c91 | staticnavi.com | cm.html | 1424978445 | 1424954530 | SO.01
I think it's possible to do this with a row function, but I don't know how.
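
For reference, here is a minimal sketch of the transformation I'm after, written with the built-in `explode`, `max` and `min` column functions rather than a row function. It assumes Spark 2.x or later with a local `SparkSession`; the object name `ExplodeMinMax` and the helper `explodeMinMax` are hypothetical names used only for illustration, and I'm not sure this is the most efficient way to do it.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, max, min}

// Hypothetical helper object, not part of any library.
object ExplodeMinMax {

  // Emits one row per element of "path", then aggregates event_dttm per group.
  def explodeMinMax(df: DataFrame): DataFrame =
    df.withColumn("newPath", explode(col("path")))
      .groupBy("id", "domain", "newPath", "load_src")
      .agg(
        max("event_dttm").as("max_time"),
        min("event_dttm").as("min_time")
      )
      .select("id", "domain", "newPath", "max_time", "min_time", "load_src")

  def main(args: Array[String]): Unit = {
    // Assumption: local mode; in spark-shell a `spark` session already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-min-max")
      .getOrCreate()
    import spark.implicits._

    val weblog = Seq(
      ("39f0412b4c91", "staticnavi.com", Seq("panel", "cm.html"), 1424954530, "SO.01"),
      ("39f0412b4c91", "staticnavi.com", Seq("panel", "cm.html"), 1424964830, "SO.01"),
      ("39f0412b4c91", "staticnavi.com", Seq("panel", "cm.html"), 1424978445, "SO.01")
    ).toDF("id", "domain", "path", "event_dttm", "load_src")

    // Row order after groupBy is not guaranteed.
    explodeMinMax(weblog).show(false)

    spark.stop()
  }
}
```

With the sample data above this should produce one row per path element ("panel" and "cm.html"), each carrying the max and min "event_dttm" of its group. The explode and the aggregation are expressed as a single chained query, so the input is scanned only once.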