I have defined the schema for my DataFrame in a JSON file as follows:
{
  "table1": {
    "fields": [
      {"metadata": {}, "name": "first_name", "type": "string", "nullable": false},
      {"metadata": {}, "name": "last_name", "type": "string", "nullable": false},
      {"metadata": {}, "name": "subjects", "type": "array", "items": {"type": ["string", "string"]}, "nullable": false},
      {"metadata": {}, "name": "marks", "type": "array", "items": {"type": ["integer", "integer"]}, "nullable": false},
      {"metadata": {}, "name": "dept", "type": "string", "nullable": false}
    ]
  }
}
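(For comparison, I believe an array field in the schema JSON that Spark itself produces via StructType.jsonValue() looks like the line below, though I am not sure this is exactly what fromJson expects, which is part of my question:)

{"name": "subjects", "type": {"type": "array", "elementType": "string", "containsNull": true}, "nullable": false, "metadata": {}}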
Example JSON data:
{
"table1": [
{
"first_name":"john",
"last_name":"doe",
"subjects":["maths","science"],
"marks":[90,67],
"dept":"abc"
},
{
"first_name":"dan",
"last_name":"steyn",
"subjects":["maths","science"],
"marks":[90,67],
"dept":"abc"
},
{
"first_name":"rose",
"last_name":"wayne",
"subjects":["maths","science"],
"marks":[90,67],
"dept":"abc"
},
{
"first_name":"nat",
"last_name":"lee",
"subjects":["maths","science"],
"marks":[90,67],
"dept":"abc"
},
{
"first_name":"jim",
"last_name":"lim",
"subjects":["maths","science"],
"marks":[90,67],
"dept":"abc"
}
]
}
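For reference, here is a sketch of how I could let Spark infer the schema from this example data and print its JSON form, to see how it represents the array columns (assuming the data above is saved as data.json):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# data.json holds the example document above; multiLine is needed because
# the whole file is a single JSON object rather than one object per line
df = spark.read.option("multiLine", True).json("data.json")

# table1 is inferred as array<struct<...>>, and the schema's JSON form shows
# how Spark represents array types
df.printSchema()
print(df.schema.json())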
I want to create the equivalent Spark schema from this JSON file. Below is my code (reference: Create spark dataframe schema from json schema representation):
import json
from pyspark.sql.types import StructType

with open(schemaFile) as s:
    schema = json.load(s)["table1"]
source_schema = StructType.fromJson(schema)
The above code works fine if I don't have any array columns, but it throws the error below when my schema contains array columns:
"Could not parse datatype: array" ("Could not parse datatype: %s" json_value)
"items":{"type":["string", "string"]}. I think is better to post your actual data or just try to load the json in Spark and then export that schema that was created by Spark"items":{"type":["string", "string"]}is not a valid definition, what exactly are you trying to say here? Can you post some actual json data?