I am new to Spark and Scala, and I am trying to learn Spark for one of my learning projects. I have a JSON file which looks like this:

[
  {
    "year": 2012,
    "month": 8,
    "title": "Batman"
  },
  {
    "year": 2012,
    "month": 8,
    "title": "Hero"
  },
  {
    "year": 2012,
    "month": 7,
    "title": "Robot"
  }
]

I started by reading this JSON into a Spark DataFrame, so I tried the following:

spark.read
  .option("multiline", true)
  .option("mode", "PERMISSIVE")
  .option("inferSchema", true)
  .json(filePath)

It reads the JSON, but it converts the fields into separate Spark columns. My requirement is to read each JSON object as a single value in one column, one object per row.

I want to read it into a Spark DataFrame where I expect output like the following:

+----------------------------------------+
|json                                    |
+----------------------------------------+
|{"year":2012,"month":8,"title":"Batman"}|
|{"year":2012,"month":8,"title":"Hero"}  |
|{"year":2012,"month":7,"title":"Robot"} |
|{"year":2011,"month":7,"title":"Git"}   |
+----------------------------------------+

1 Answer

Use toJSON, which turns each row of the parsed DataFrame back into a JSON string:

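val df = spark.read
  // The file is a single JSON array spread over several lines.
  .option("multiline", true)
  // PERMISSIVE (the default mode) does not fail on malformed records.
  .option("mode", "PERMISSIVE")
  .option("inferSchema", true)
  .json(filePath).toJSON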

Now

df.show(false)

+----------------------------------------+
|value                                   |
+----------------------------------------+
|{"month":8,"title":"Batman","year":2012}|
|{"month":8,"title":"Hero","year":2012}  |
|{"month":7,"title":"Robot","year":2012} |
+----------------------------------------+
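
The question asks for a column named json, while toJSON yields a Dataset[String] whose single column is called value. A minimal sketch of renaming it with toDF follows. The local SparkSession, the temp file, and the name jsonDf are assumptions made only so the snippet runs on its own; none of them come from the original post.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("json-rows")
  .getOrCreate()

// Hypothetical sample file holding the same array as in the question.
val path = Files.createTempFile("movies", ".json")
Files.write(path,
  """[
    |  {"year": 2012, "month": 8, "title": "Batman"},
    |  {"year": 2012, "month": 8, "title": "Hero"},
    |  {"year": 2012, "month": 7, "title": "Robot"}
    |]""".stripMargin.getBytes("UTF-8"))

// toJSON returns a Dataset[String] with one column named "value";
// toDF("json") converts it to a DataFrame and renames that column.
val jsonDf: DataFrame = spark.read
  .option("multiline", true)
  .json(path.toString)
  .toJSON
  .toDF("json")

jsonDf.show(false)
```

The keys inside each string follow the order of the inferred schema, which is why the output above lists month before title and year rather than the order used in the file.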

3 Comments

Thanks for the quick response. My purpose is not to print it but to read it in such a way that each JSON object becomes a row. I guess reading first and then converting back to JSON adds unnecessary cost. Is there a better way to achieve this?
Kishore's answer gives you each JSON object in a separate row. Or, when you say each object in a separate row, do you mean output like the following?
|month| title|year|
|    8|Batman|2012|
I want it the way Kishore has shown, but if you look at the code, it first parses the file (.json) and then converts it back to JSON (.toJSON), which I think is a costly way of doing it. So I want to know whether this is the most efficient way or not.
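
On the cost question raised in these comments, one alternative that may be worth measuring skips schema inference: read the file as a single string, parse it as an array of strings, and explode the array. This is only a sketch under assumptions, not something proposed in the thread. It assumes Spark 3.x, where from_json is expected to accept ArrayType(StringType) and return each array element as its raw JSON text. The local session, the temp file, and the name jsonDf are hypothetical. Whether it is actually faster than .json(...).toJSON depends on the data and would need benchmarking.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types.{ArrayType, StringType}

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("json-rows-wholetext")
  .getOrCreate()

// Hypothetical sample file holding the same array as in the question.
val path = Files.createTempFile("movies", ".json")
Files.write(path,
  """[
    |  {"year": 2012, "month": 8, "title": "Batman"},
    |  {"year": 2012, "month": 8, "title": "Hero"},
    |  {"year": 2012, "month": 7, "title": "Robot"}
    |]""".stripMargin.getBytes("UTF-8"))

// wholetext = true loads each file as ONE row in a column named "value".
val raw = spark.read.option("wholetext", true).text(path.toString)

// Parse the top-level array as array<string> (assumed to keep each object
// as a JSON string), then explode it into one row per object.
val jsonDf = raw.select(
  explode(from_json(col("value"), ArrayType(StringType))).as("json")
)

jsonDf.show(false)
```

Because wholetext puts an entire file into one row, this approach only suits files that fit comfortably in a single executor's memory.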
