3

I'm trying to read a Json file which is like :

[ 
{"IFAM":"EQR","KTM":1430006400000,"COL":21,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} 
,{"MLrate":"31","Nrout":"0","up":null,"Crate":"2"} 
,{"MLrate":"30","Nrout":"5","up":null,"Crate":"2"} 
,{"MLrate":"34","Nrout":"0","up":null,"Crate":"4"} 
,{"MLrate":"33","Nrout":"0","up":null,"Crate":"2"} 
,{"MLrate":"30","Nrout":"8","up":null,"Crate":"2"} 
]} 
,{"IFAM":"EQR","KTM":1430006400000,"COL":22,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} 
,{"MLrate":"30","Nrout":"0","up":null,"Crate":"0"} 
,{"MLrate":"35","Nrout":"1","up":null,"Crate":"5"} 
,{"MLrate":"30","Nrout":"6","up":null,"Crate":"2"} 
,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} 
,{"MLrate":"38","Nrout":"8","up":null,"Crate":"1"} 
]} 
,...
] 

I've tried the command:

    val df = sqlContext.read.json("namefile") 
    df.show() 

But this does not work : my columns are not recognized...

2
  • is your single json data is spread into multiple lines or single line ? Commented Jul 29, 2015 at 13:55
  • single line but when I open it with notePad this is easy to read and on several lines. Commented Jul 29, 2015 at 14:01

2 Answers 2

6

If you want to use read.json you need a single JSON document per line. If your file contains a valid JSON array with documents it simply won't work as expected. For example if we take your example data input file should be formatted like this:

{"IFAM":"EQR","KTM":1430006400000,"COL":21,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}, {"MLrate":"31","Nrout":"0","up":null,"Crate":"2"}, {"MLrate":"30","Nrout":"5","up":null,"Crate":"2"} ,{"MLrate":"34","Nrout":"0","up":null,"Crate":"4"} ,{"MLrate":"33","Nrout":"0","up":null,"Crate":"2"} ,{"MLrate":"30","Nrout":"8","up":null,"Crate":"2"} ]}
{"IFAM":"EQR","KTM":1430006400000,"COL":22,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"0"} ,{"MLrate":"35","Nrout":"1","up":null,"Crate":"5"} ,{"MLrate":"30","Nrout":"6","up":null,"Crate":"2"} ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} ,{"MLrate":"38","Nrout":"8","up":null,"Crate":"1"} ]}

If you use read.json on above structure you'll see it is parsed as expected:

scala> sqlContext.read.json("namefile").printSchema
root
 |-- COL: long (nullable = true)
 |-- DATA: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Crate: string (nullable = true)
 |    |    |-- MLrate: string (nullable = true)
 |    |    |-- Nrout: string (nullable = true)
 |    |    |-- up: string (nullable = true)
 |-- IFAM: string (nullable = true)
 |-- KTM: long (nullable = true)
Sign up to request clarification or add additional context in comments.

1 Comment

@zero323 I want o do something similar, but I just want to load the data from the json file, to an rdd and I dont want to go through the spark-sql just a simple transformation. How can I achieve this? Thanks!
0

If you don't want to format your JSON file (line by line) you could create a schema using StructType and MapType using SparkSQL functions

import org.apache.spark.sql.DataFrame 
import org.apache.spark.sql.functions._ 
import org.apache.spark.sql.types._

// Convenience function for turning JSON strings into DataFrames
def jsonToDataFrame(json: String, schema: StructType = null): 
DataFrame = {
    val reader = spark.read
    Option(schema).foreach(reader.schema)   
    reader.json(sc.parallelize(Array(json))) 
}

// Using a struct
val schema = new StructType().add("a", new StructType().add("b", IntegerType))

// call the function passing the sample JSON data and the schema as parameter
val json_df = jsonToDataFrame("""
   {
     "a": {
        "b": 1
     }
   } """, schema)

// now you can access your json fields
val b_value = json_df.select("a.b")
b_value.show()

See this reference documentation for more examples and details https://docs.databricks.com/spark/latest/spark-sql/complex-types.html#transform-complex-data-types-scala

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.