I want to reformat a JSON structure using a Spark job, into a structure containing an array of objects. My input file contains the lines:
{ "keyvals" : [[1,"a"], [2, "b"]] },
{ "keyvals" : [[3,"c"], [4, "d"]] }
and I want my process to output
{ "keyvals": [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}] },
{ "keyvals": [{"id": 3, "value": "c"}, {"id": 4, "value": "d"}] }
What's the best way to do that?
To look at the example input, you can run the following in the Scala spark-shell:
val jsonStrings = Seq("""{ "keyvals" : [[1,"a"], [2, "b"]] }""", """{ "keyvals" : [[3,"c"], [4, "d"]] }""")
val inputRDD = sc.parallelize(jsonStrings)
val df = spark.read.json(inputRDD)
// reformat goes here ?
df.write.json("myfile.json")
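For what it's worth, here is a minimal sketch of the per-row reshaping the question asks for, in plain Scala (the function name `reshape` is made up for illustration). It turns a sequence of [id, value] pairs into a sequence of objects with named fields; in Spark this logic could run inside a typed map over the Dataset, or be expressed in SQL with the transform() higher-order function on Spark 2.4+:

```scala
// Sketch of the core transformation each row needs:
// Seq((1L, "a"), (2L, "b"))  ->  Seq(Map("id" -> 1, "value" -> "a"), ...)
// In Spark, note that schema inference on the mixed-type input arrays
// may read both elements as strings, so a cast might be needed for id.
def reshape(pairs: Seq[(Long, String)]): Seq[Map[String, Any]] =
  pairs.map { case (id, v) => Map("id" -> id, "value" -> v) }
```

This only shows the shape of the mapping, not a full Spark pipeline; hooking it up to the DataFrame depends on the inferred schema of keyvals.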
thanks
to_json maybe? Please produce a minimal reproducible example.