9

now has JSON data as follows

{"Id":11,"data":[{"package":"com.browser1","activetime":60000},{"package":"com.browser6","activetime":1205000},{"package":"com.browser7","activetime":1205000}]}
{"Id":12,"data":[{"package":"com.browser1","activetime":60000},{"package":"com.browser6","activetime":1205000}]} 
......

This JSON is the activation time of app, the purpose of which is to analyze the total activation time of each app

I use sparK SQL to parse JSON

scala

val sqlContext = sc.sqlContext
val behavior = sqlContext.read.json("behavior-json.log")
behavior.cache()
behavior.createOrReplaceTempView("behavior")
val appActiveTime = sqlContext.sql ("SELECT data FROM behavior") // SQL query
appActiveTime.show (100100) // print dataFrame
appActiveTime.rdd.foreach(println) // print RDD

But the printed dataFrame is like this

.

+----------------------------------------------------------------------+

| data|

+----------------------------------------------------------------------+

| [[60000, com.browser1], [12870000, com.browser]]|

| [[60000, com.browser1], [120000, com.browser]]|

| [[60000, com.browser1], [120000, com.browser]]|

| [[60000, com.browser1], [1207000, com.browser]]|

| [[120000, com.browser]]|

| [[60000, com.browser1], [1204000, com.browser5]]|

| [[60000, com.browser1], [12075000, com.browser]]|

| [[60000, com.browser1], [120000, com.browser]]|

| [[60000, com.browser1], [1204000, com.browser]]|

| [[60000, com.browser1], [120000, com.browser]]|

| [[60000, com.browser1], [1201000, com.browser]]|

| [[1200400, com.browser5]]|

| [[60000, com.browser1], [1200400, com.browser]]|

|[[60000, com.browser1], [1205000, com.browser6], [1205000, com.browser7]]|

.

RDD is like this

.

[WrappedArray ([60000, com.browser1], [60000, com.browser1])]

[WrappedArray ([120000, com.browser])]

[WrappedArray ([60000, com.browser1], [1204000, com.browser5])]

[WrappedArray ([12075000, com.browser], [12075000, com.browser])]

.

And I want to turn the data into

.

Com.browser1 60000

Com.browser1 60000

Com.browser 12075000

Com.browser 12075000

...

.

I want to turn the array elements of each line in RDD into one row. Of course, it can be another structure that is easy to analyze.

Because I only learn spark and Scala a lot, I have try it for a long time but fail, so I hope you can guide me.

3 Answers 3

19

From your given json data you can view the schema of your dataframe with printSchema and use it

appActiveTime.printSchema()
root
 |-- data: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- activetime: long (nullable = true)
 |    |    |-- package: string (nullable = true)

Since you have array you need to explode the data and select the struct field as below

import org.apache.spark.sql.functions._
appActiveTime.withColumn("data", explode($"data"))
       .select("data.*")
       .show(false)

Output:

+----------+------------+
|activetime|     package|
+----------+------------+
|     60000|com.browser1|
|   1205000|com.browser6|
|   1205000|com.browser7|
|     60000|com.browser1|
|   1205000|com.browser6|
+----------+------------+

Hope this helps!

Sign up to request clarification or add additional context in comments.

2 Comments

thank you very munch ! I got it. I will learn more about spark sql.
@koiralo this explode seems to take so much of time on large data.. any other way to do that
0

with @Shankar Koirala 's help , I learned how to use ' explode' to handle joson array.

  val df = sqlContext.sql("SELECT data FROM behavior")
appActiveTime.select(explode(df("data"))).toDF("data")
  .select("data.package","data.activetime")
  .show(false)

Comments

0

For Apache spark Java we will require to do something like below:

Dataset<Row> dataDF = spark.read()
                .option("header", "true")
                .json("/file_path");
dataDF.createOrReplaceTempView("behavior");
String sqlQuery = "SELECT data from behavior";
Dataset<Row> jsonData = spark.sql(sqlQuery);
snapshot.withColumn("data", explode(jsonData.col("data"))).select("data.*").show();

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.