Using Spark 2.3.2.
I am trying to take the values of some DataFrame columns and insert them into an existing JSON structure. Assume I have this DataFrame:
val testDF = Seq(("""{"foo": "bar", "meta":{"app1":{"p":"2", "o":"100"}, "app2":{"p":"5", "o":"200"}}}""", "10", "1337")).toDF("key", "p", "o")
// used as key for nested json structure
val app = "appX"
Basically, I would like to get from this value of the key column:
{
  "foo": "bar",
  "meta": {
    "app1": {
      "p": "2",
      "o": "100"
    },
    "app2": {
      "p": "5",
      "o": "200"
    }
  }
}
to this:
{
  "meta": {
    "app1": {
      "p": "2",
      "o": "100"
    },
    "app2": {
      "p": "5",
      "o": "200"
    },
    "appX": {
      "p": "10",
      "o": "1337"
    }
  }
}
based on the columns p and o of the DataFrame.
I have tried:
def process(inputDF: DataFrame, appName: String): DataFrame = {
  val res = inputDF
    .withColumn(appName, to_json(expr("(p, o)")))
    .withColumn("meta", struct(get_json_object('key, "$.meta")))
    .selectExpr(s"""struct(meta.*, ${appName} as ${appName}) as myStruct""")
    .select(to_json('myStruct).as("newMeta"))
  res.show(false)
  res
}
val resultDF = process(testDF, app)
val resultString = resultDF.select("newMeta").collectAsList().get(0).getString(0)
StringContext.treatEscapes(resultString) must be ("""{"meta":{"app1":{"p":"2","o":"100"},"app2":{"p":"5","o":"200"},"appX":{"p":"10","o":"1337"}}}""")
But this assertion does not match, because I
- can't get the content of appX onto the same level as the other two apps,
- do not know how to properly handle quotation marks, and
- do not know how to rename "col1" to "meta".
The test fails with:
Expected :"{"[meta":{"app1":{"p":"2","o":"100"},"app2":{"p":"5","o":"200"},"appX":{"p":"10","o":"1337"}}]}"
Actual :"{"[col1":"{"app1":{"p":"2","o":"100"},"app2":{"p":"5","o":"200"}}","appX":"{"p":"10","o":"1337"}"]}"
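
One direction that might avoid the nested string escaping is to parse the JSON into a typed column instead of passing JSON strings around, add the new entry, and serialize once at the end. Below is a minimal sketch of that idea, not a verified solution. It assumes that every app entry under meta is a flat object with string values, so meta can be modelled as a map of maps. The names AddAppToMeta, keySchema and addApp are made up for illustration. A UDF is used to add the entry because, as far as I know, map_concat is only available from Spark 2.4 on. The key order in the resulting JSON is not guaranteed.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, from_json, struct, to_json, udf}
import org.apache.spark.sql.types.{MapType, StringType, StructType}

object AddAppToMeta {
  // Assumption: "meta" maps an app name to a flat object of string properties.
  // Fields not listed in the schema (such as "foo") are dropped when parsing.
  val keySchema: StructType = new StructType()
    .add("meta", MapType(StringType, MapType(StringType, StringType)))

  def process(inputDF: DataFrame, appName: String): DataFrame = {
    // Adds (or overwrites) the entry for appName; a missing "meta" is treated as empty.
    val addApp = udf { (meta: Map[String, Map[String, String]], p: String, o: String) =>
      Option(meta).getOrElse(Map.empty[String, Map[String, String]]) +
        (appName -> Map("p" -> p, "o" -> o))
    }

    inputDF
      // Parse the JSON string into a struct with a typed "meta" map.
      .withColumn("parsed", from_json(col("key"), keySchema))
      // Build the extended map from the parsed meta and the p and o columns.
      .withColumn("meta", addApp(col("parsed.meta"), col("p"), col("o")))
      // Serialize once; the struct field name "meta" becomes the top-level JSON key.
      .select(to_json(struct(col("meta"))).as("newMeta"))
  }
}
```

With this, AddAppToMeta.process(testDF, app) would take the place of the process call above. Because the values stay typed until the final to_json, the result should not contain escaped quotation marks, and the top-level key should be "meta" rather than "col1".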