I have a Spark DataFrame as follows:
+---------+--------------+------------------+
| item_id | popular_tags | popularity_score |
+---------+--------------+------------------+
| id_1    | Samsung      | 0.4              |
| id_1    | long battery | 0.8              |
| id_2    | Apple        | 0.9              |
| id_2    | UI           | 0.9              |
+---------+--------------+------------------+
I want to group this DataFrame by item_id and write it out as a file in which each line is a JSON object:
{"id_1": {"Samsung": {"popularity_score": 0.4}, "long_battery": {"popularity_score": 0.8}}}
{"id_2": {"Apple": {"popularity_score": 0.9}, "UI": {"popularity_score": 0.9}}}
I tried using the to_json and collect_list functions, but I get a list, not a nested JSON object.
This is a large distributed DataFrame, so converting it to pandas or collecting it onto a single machine is not an option.
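For reference, here is a minimal sketch of one way the target shape could be produced without collecting to the driver. It goes through the DataFrame's underlying RDD rather than to_json/collect_list. The helper names build_json_line and write_grouped_json and the output_path argument are hypothetical, and it assumes each item has few enough tags to fit comfortably on one executor. Tags are used verbatim as keys, so "long battery" keeps its space unless it is replaced explicitly.

```python
import json


def build_json_line(item_id, tag_scores):
    """Build one output line: {item_id: {tag: {"popularity_score": score}}}."""
    nested = {tag: {"popularity_score": score} for tag, score in tag_scores}
    return json.dumps({item_id: nested})


def write_grouped_json(df, output_path):
    """Group rows by item_id on the executors and write one JSON object per line.

    Assumes `df` is a Spark DataFrame with the columns item_id, popular_tags
    and popularity_score; `output_path` is a hypothetical output directory.
    """
    (
        df.rdd
        # Key each row by item_id, keeping only the (tag, score) pair as the value.
        .map(lambda row: (row["item_id"],
                          (row["popular_tags"], row["popularity_score"])))
        # Bring all (tag, score) pairs of one item together on a single executor.
        .groupByKey()
        # Serialize each group to one JSON string.
        .map(lambda kv: build_json_line(kv[0], kv[1]))
        # Write the strings as text, one per line, across part files.
        .saveAsTextFile(output_path)
    )


# Local check of the per-item formatting; no Spark session is needed for this part.
print(build_json_line("id_1", [("Samsung", 0.4), ("long battery", 0.8)]))
# prints {"id_1": {"Samsung": {"popularity_score": 0.4}, "long battery": {"popularity_score": 0.8}}}
```

Note that saveAsTextFile writes a directory of part files rather than a single file; every line in every part file is one JSON object.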