
I have a JSON file with the following schema:

root
 |-- demo: boolean (nullable = true)
 |-- person: struct (nullable = true)
 |    |-- dateOfBirth: string (nullable = true)
 |    |-- email: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- emergencyContacts: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- name: string (nullable = true)
 |    |    |    |-- phone: string (nullable = true)
 |    |    |    |-- relationship: string (nullable = true)
 |    |-- id: long (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- phones: struct (nullable = true)
 |    |    |-- home: string (nullable = true)
 |    |    |-- mobile: string (nullable = true)
 |    |-- registered: boolean (nullable = true)
 |-- product: string (nullable = true)
 |-- releaseDate: string (nullable = true)
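The printed schema can also be written out as an explicit StructType. This sketch mirrors the tree above field by field. The object name `CjsonSchema` is hypothetical. Passing it to the reader should let Spark skip schema inference.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper object: an explicit version of the schema printed above.
object CjsonSchema {
  // One element of person.emergencyContacts.
  val contactType: StructType = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("phone", StringType, nullable = true),
    StructField("relationship", StringType, nullable = true)
  ))

  // The person struct, including its nested arrays and the phones struct.
  val personType: StructType = StructType(Seq(
    StructField("dateOfBirth", StringType, nullable = true),
    StructField("email", ArrayType(StringType, containsNull = true), nullable = true),
    StructField("emergencyContacts", ArrayType(contactType, containsNull = true), nullable = true),
    StructField("id", LongType, nullable = true),
    StructField("name", StringType, nullable = true),
    StructField("phones", StructType(Seq(
      StructField("home", StringType, nullable = true),
      StructField("mobile", StringType, nullable = true)
    )), nullable = true),
    StructField("registered", BooleanType, nullable = true)
  ))

  // Top level of each JSON record.
  val root: StructType = StructType(Seq(
    StructField("demo", BooleanType, nullable = true),
    StructField("person", personType, nullable = true),
    StructField("product", StringType, nullable = true),
    StructField("releaseDate", StringType, nullable = true)
  ))
}
```

With the question's `sqlContext`, `sqlContext.read.schema(CjsonSchema.root).json("file:///home/training211/test/cjson1.json")` should print the same tree under printSchema(). `CjsonSchema.root.treeString` prints that tree without reading the file.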

I want to parse the emergencyContacts array to get the names of the contacts.

I have got as far as the person struct using:

val df = sqlContext.read.json("file:///home/training211/test/cjson1.json").toDF();
df.registerTempTable("df");
df.printSchema();
val person = df.select("person");
person.registerTempTable("person");
person.printSchema();
person.show();

If I try to go further, it always fails with: org.apache.spark.sql.AnalysisException: cannot resolve 'persons.emergencyContacts' given input columns: [person];
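The input-column list in that error is the clue. The `person` DataFrame has exactly one column, `person`, so a reference that starts with `persons.` cannot resolve. The sketch below resolves the same path through both the DataFrame API and SQL. It assumes Spark 2.2 or later, so it uses a SparkSession instead of the question's `sqlContext`. It also substitutes a hypothetical one-record sample for cjson1.json. The object and method names are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.DataType

object PersonPathSketch {
  // Hypothetical sample record with placeholder values; only the fields used here.
  val sampleJson: String =
    """{"person":{"id":1,"emergencyContacts":[{"name":"Contact A","phone":"555-0100","relationship":"friend"},{"name":"Contact B","phone":"555-0101","relationship":"sibling"}]}}"""

  /** Returns the data types that the DataFrame path and the SQL path resolve to. */
  def resolvedTypes(): (DataType, DataType) = {
    val spark = SparkSession.builder().master("local[1]").appName("person-path").getOrCreate()
    import spark.implicits._
    try {
      val df = spark.read.json(Seq(sampleJson).toDS())
      df.createOrReplaceTempView("df")

      // Same shape as the question's `person` DataFrame: a single column named "person".
      val person = df.select("person")
      println(person.columns.mkString(", ")) // person

      // Paths must begin with the real column name, "person" (singular).
      val viaApi = person.select($"person.emergencyContacts")
      val viaSql = spark.sql("SELECT person.emergencyContacts FROM df")
      (viaApi.schema.head.dataType, viaSql.schema.head.dataType)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = println(resolvedTypes())
}
```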

I also tried:

val arrayFlatten = df.select($"person.emergencyContacts".getItem(0)) 

which gives me:

+---------------------------+
|person.emergencyContacts[0]|
+---------------------------+
|       [Jane Doe,888-555...|
+---------------------------+

This returns only the first contact, as a whole struct, which is not the result I want.

Any help is appreciated.

  • What do you get when you try df.select($"person.emergencyContacts")? Can you update your question? Commented Nov 4, 2016 at 10:38
  • Done! Any help will be appreciated :) Commented Nov 7, 2016 at 4:32

1 Answer


Can you try the following?

df.select($"person.emergencyContacts").show

If you want just the phone numbers, you can do something like this. On an array of structs, it returns an array holding each element's phone:

df.select($"person.emergencyContacts.phone").show
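The same dotted path also answers the original goal of getting the contact names. On an array of structs, `.name` extracts that field from every element, which gives one array<string> per record. By contrast, `getItem(0)` (as in the question) keeps a single element. The sketch below assumes Spark 2.2 or later and uses hypothetical sample data. The object name is illustrative.

```scala
import org.apache.spark.sql.SparkSession

object ContactNamesSketch {
  // Hypothetical sample record with placeholder names and numbers.
  val sampleJson: String =
    """{"person":{"id":1,"emergencyContacts":[{"name":"Contact A","phone":"555-0100","relationship":"friend"},{"name":"Contact B","phone":"555-0101","relationship":"sibling"}]}}"""

  /** Returns (all contact names, name of the first contact only) for the sample record. */
  def names(): (Seq[String], String) = {
    val spark = SparkSession.builder().master("local[1]").appName("contact-names").getOrCreate()
    import spark.implicits._
    try {
      val df = spark.read.json(Seq(sampleJson).toDS())

      // array<struct> + ".name" gives array<string>: every contact's name.
      val allNames = df.select($"person.emergencyContacts.name").head().getSeq[String](0)

      // getItem(0) keeps only the first struct; getField then picks one of its fields.
      val firstName = df
        .select($"person.emergencyContacts".getItem(0).getField("name"))
        .head()
        .getString(0)

      (allNames.toSeq, firstName)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = println(names())
}
```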

Or you can iterate over the emergencyContacts array to get the phone and name details; look up Scala array iteration.
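To iterate in plain Scala, one option is to collect the array column to the driver and walk it. Each struct element arrives as a Row. This only makes sense for small results, because collect() pulls everything into driver memory. The sketch below assumes Spark 2.2 or later and uses hypothetical sample data. The object and method names are illustrative.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object IterateContactsSketch {
  // Hypothetical sample record with placeholder names and numbers.
  val sampleJson: String =
    """{"person":{"id":1,"emergencyContacts":[{"name":"Contact A","phone":"555-0100","relationship":"friend"},{"name":"Contact B","phone":"555-0101","relationship":"sibling"}]}}"""

  /** Collects the contacts arrays and iterates over them on the driver. */
  def namePhonePairs(): Seq[(String, String)] = {
    val spark = SparkSession.builder().master("local[1]").appName("iterate-contacts").getOrCreate()
    import spark.implicits._
    try {
      val df = spark.read.json(Seq(sampleJson).toDS())
      // One Row per record; column 0 holds that record's array of contact structs.
      val rows = df.select($"person.emergencyContacts").collect().toSeq

      rows.flatMap { row =>
        // getSeq yields null when the array itself is null, so guard with Option.
        val contacts: Seq[Row] = Option(row.getSeq[Row](0)).map(_.toSeq).getOrElse(Seq.empty)
        contacts.map(c => (c.getAs[String]("name"), c.getAs[String]("phone")))
      }
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = namePhonePairs().foreach(println)
}
```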


8 Comments

I'm getting this error with the solution provided: org.apache.spark.sql.AnalysisException: Can only star expand struct data types. Attribute: ArrayBuffer(person, emergencyContacts);
@KaranKaushal: Remove the * at the end; I've updated the answer.
@KaranKaushal: Also, look into this blog.antlypls.com/blog/2016/01/30/…
Your updated answer gives me the whole emergencyContacts array. Is it possible to flatten it further and get details such as the names and phones separately?
@KaranKaushal: Updated the answer.
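The follow-up asks how to flatten the array into one row per contact, with name and phone as separate columns. A common approach is `explode` from `org.apache.spark.sql.functions`. The answer above does not use it, so treat it as a swapped-in technique. Each exploded element is a struct, so `contact.*` star-expands cleanly, unlike the array itself (compare the "Can only star expand struct data types" error). Note that explode emits no rows for a record whose array is null or empty. The sketch below assumes Spark 2.2 or later and uses hypothetical sample data. The object name is illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

object ExplodeContactsSketch {
  // Hypothetical sample record with placeholder names and numbers.
  val sampleJson: String =
    """{"person":{"id":1,"emergencyContacts":[{"name":"Contact A","phone":"555-0100","relationship":"friend"},{"name":"Contact B","phone":"555-0101","relationship":"sibling"}]}}"""

  /** Returns one (personId, name, phone, relationship) tuple per emergency contact. */
  def flatten(): Seq[(Long, String, String, String)] = {
    val spark = SparkSession.builder().master("local[1]").appName("explode-contacts").getOrCreate()
    import spark.implicits._
    try {
      val df = spark.read.json(Seq(sampleJson).toDS())

      val contacts = df
        // One output row per array element; person.id is repeated on each row.
        .select($"person.id".as("personId"), explode($"person.emergencyContacts").as("contact"))
        // "contact" is a struct, so star expansion yields name, phone, relationship columns.
        .select($"personId", $"contact.*")

      contacts.show(truncate = false)
      // Tuple encoders map columns by position: personId, name, phone, relationship.
      contacts.as[(Long, String, String, String)].collect().toSeq
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = flatten().foreach(println)
}
```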
