I have a dataframe with a "key" column and a column that holds a JSON string:

| key | jsonString |
|---|---|
| 111 | {"id" : "12345", "foo" : "stuff"} |
| 111 | {"id" : "23456", "bar" : "other stuff"} |
| 111 | {"id" : "34567", "baz" : "even other stuff"} |

For each "key" value, I want to combine all of its JSON strings into a JSON array and wrap that array in an object with a couple of other fields (the ultimate goal is to publish the result to a Kafka topic). So I would end up with this:

    {
      "type" : "combined",
      "values" : [
        {"id" : "12345", "foo" : "stuff"},
        {"id" : "23456", "bar" : "other stuff"},
        {"id" : "34567", "baz" : "even other stuff"}
      ]
    }

I tried concatenating it all into one big string, but that went poorly. Is there a way to do this that doesn't involve defining one giant schema covering all my JSON strings? The real JSON is obviously much more complicated, and each string could follow any of 4 possible schemas.
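
To make the target concrete, here is a minimal sketch of the transformation in plain Python, outside any dataframe library. It treats each JSON string as opaque text, so no schema is needed for the individual strings. The `rows` list and the `combine_by_key` helper are hypothetical names used only for illustration, and the sample data simply mirrors the table above.

```python
import json
from collections import defaultdict

# Sample rows mirroring the table above: (key, jsonString).
rows = [
    (111, '{"id" : "12345", "foo" : "stuff"}'),
    (111, '{"id" : "23456", "bar" : "other stuff"}'),
    (111, '{"id" : "34567", "baz" : "even other stuff"}'),
]


def combine_by_key(rows):
    """Group raw JSON strings by key and wrap each group in an envelope.

    Hypothetical helper: the input strings are never parsed, so their
    schemas do not matter. They are assumed to be valid JSON already.
    """
    grouped = defaultdict(list)
    for key, json_string in rows:
        grouped[key].append(json_string)

    # Join the raw strings with commas and wrap them in the outer object.
    return {
        key: '{"type" : "combined", "values" : [' + ", ".join(values) + "]}"
        for key, values in grouped.items()
    }


combined = combine_by_key(rows)

# Parsing here is only a sanity check that the assembled text is valid JSON.
parsed = json.loads(combined[111])
```

If the dataframe is a Spark DataFrame (an assumption, since the question does not name the library), the same idea would presumably map to grouping by "key", gathering the strings with `collect_list`, and joining them with `concat_ws` before adding the wrapper text. Note that `collect_list` does not guarantee element order.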