
I have a schema as shown below. How can I parse the nested objects?

root
 |-- apps: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- appName: string (nullable = true)
 |    |    |-- appPackage: string (nullable = true)
 |    |    |-- Ratings: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- date: string (nullable = true)
 |    |    |    |    |-- rating: long (nullable = true)
 |-- id: string (nullable = true)
  • What have you tried so far? Commented Apr 29, 2015 at 15:59
  • I was trying to treat each json object as a String and parse it using JSONDecoder parser. Commented Apr 29, 2015 at 16:36

5 Answers


Assuming you read in a JSON file and print the schema you showed us, like this:

DataFrame df = sqlContext.read().json("/path/to/file").toDF();
df.registerTempTable("df");
df.printSchema();

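Then you can select the nested objects. Note that "element" in the printed schema only labels the array's item type and is not a selectable field, so explode the apps array first and then address the struct's fields:

import static org.apache.spark.sql.functions.explode;

// One output row per entry of the apps array.
DataFrame app = df.select(explode(df.col("apps")).as("app"));
app.registerTempTable("app");
app.printSchema();
app.show();
// Fields of the exploded struct are reached through the "app" column.
DataFrame appName = app.select("app.appName");
appName.registerTempTable("appName");
appName.printSchema();
appName.show();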

3 Comments

Just to add: the code above does not need registerTempTable to work. You only need registerTempTable when you want to run a Spark SQL query. Also, registerTempTable has been deprecated since Spark 2.0 and replaced by createOrReplaceTempView.
This assumes that you know the schema. What if you are not sure about the schema of the nested object? How do you even create the schema of the nested object at all? I asked a similar question here: stackoverflow.com/questions/43438774/…
I am having the same problem, and this code does not work for me. When I try select("app.element.appName") (or the analogous fields for my case), I get the error org.apache.spark.sql.AnalysisException: No such struct field element in.... The element field is not present in the original JSON but is created to represent a JSON array, yet for some reason Spark does not find it.

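Try this (the example uses the people table from the linked post; dot notation reaches into nested struct fields):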

val nameAndAddress = sqlContext.sql("""
    SELECT name, address.city, address.state
    FROM people
""")
nameAndAddress.collect.foreach(println)

Source: https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html


I am using PySpark, but the logic should be similar. I found this way of parsing my nested JSON useful:

df.select(df.apps.appName.alias("apps_Name"), \
          df.apps.appPackage.alias("apps_Package"), \
          df.apps.Ratings.date.alias("apps_Ratings_date")) \
   .show()

The code could obviously be shortened with an f-string.
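To make the shape of those selected columns concrete, here is a plain-Python sketch (no Spark involved) that mimics selecting a field through an array of structs: each row yields a list with one value per app, and going through Ratings yields a list of lists. The sample row and the helper project are made up for illustration; only the field names come from the question's schema.

```python
# Hypothetical sample row shaped like the question's schema (values are invented).
row = {
    "id": "u1",
    "apps": [
        {"appName": "Maps", "appPackage": "com.example.maps",
         "Ratings": [{"date": "2015-04-01", "rating": 4},
                     {"date": "2015-04-02", "rating": 5}]},
        {"appName": "Mail", "appPackage": "com.example.mail",
         "Ratings": [{"date": "2015-04-03", "rating": 3}]},
    ],
}

def project(value, field):
    """Pick `field` from a struct (dict); map over arrays (lists), keeping the nesting."""
    if value is None:
        return None
    if isinstance(value, list):
        return [project(item, field) for item in value]
    return value.get(field)

# Analogue of df.apps.appName: one name per app in the row.
apps_name = project(row["apps"], "appName")
# Analogue of df.apps.Ratings.date: one list of dates per app.
apps_ratings_date = project(project(row["apps"], "Ratings"), "date")

print(apps_name)
print(apps_ratings_date)
```

So the resulting columns are still arrays; to get one row per app or per rating, the arrays have to be flattened separately.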


Have you tried doing it straight from a SQL query? For example:

SELECT apps.Ratings FROM yourTableName

(Leave out "element": it only labels the array's item type in the printed schema and is not a real field.) This will probably return an array, and you can then access the elements inside more easily. Also, I use an online JSON viewer when I have to deal with large JSON structures and the schema is too complex.
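As a rough illustration of what "accessing the elements inside" amounts to, the following plain-Python sketch (standard library only, not Spark) parses one JSON record and flattens the nested arrays into one flat row per rating. The sample document and the function name explode_ratings are assumptions made for this example; only the field names come from the question's schema.

```python
import json

# Hypothetical input record; values are invented for illustration.
sample = """
{"id": "u1",
 "apps": [
   {"appName": "Maps", "appPackage": "com.example.maps",
    "Ratings": [{"date": "2015-04-01", "rating": 4},
                {"date": "2015-04-02", "rating": 5}]},
   {"appName": "Mail", "appPackage": "com.example.mail",
    "Ratings": [{"date": "2015-04-03", "rating": 3}]},
   {"appName": "Notes", "appPackage": "com.example.notes", "Ratings": null}
 ]}
"""

def explode_ratings(record):
    """Yield one flat dict per (app, rating) pair; missing or null arrays yield nothing."""
    for app in record.get("apps") or []:
        if app is None:
            continue
        for rating in app.get("Ratings") or []:
            if rating is None:
                continue
            yield {
                "id": record.get("id"),
                "appName": app.get("appName"),
                "date": rating.get("date"),
                "rating": rating.get("rating"),
            }

rows = list(explode_ratings(json.loads(sample)))
for r in rows:
    print(r)
```

Each nested array level becomes one loop here; an app whose Ratings array is null simply contributes no rows.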

val df = spark.read.format("json").load("/path/to/file")
df.createOrReplaceTempView("df")
// Explode the apps array so each app struct can be filtered on its own fields.
spark.sql("SELECT app.Ratings FROM df LATERAL VIEW explode(apps) t AS app WHERE app.appName LIKE '%app_name%'").show()
