I have some JSON that is in the following format:
{"items": ["234", "454", "434", "534"], "time": "1574290618029", "id": "A1", "user": "Bob"}
{"items": ["432", "123", "765"], "time": "1574200618021", "id": "B1", "user": "Tim"}
{"items": ["437"], "time": "1274600618121", "id": "B1", "user": "Joe"}
Each JSON file is read into a DataFrame with
spark.read.json(path)
and I loop through the files, unioning them into a single DataFrame.
df.show()
shows something like this:
|items| time| id| user|
|["234", "454", "434", "534"] | "1574290618029" | "A1" | "Bob"|
|["432", "123", "765"] | "1574200618021" | "B1" | "Tim"|
|["437"] | "1274600618121" | "B1" | "Joe"|
Doing df.select(df.id, df.time, df.user, explode(df.items)).show() (with explode imported from pyspark.sql.functions) gets me something very close, but not quite, what I'm looking for:
|id| time| user| col|
|A1| 1574290618029| Bob| 234|
|A1| 1574290618029| Bob| 454|
|A1| 1574290618029| Bob| 434|
|A1| 1574290618029| Bob| 534|
|B1| 1574200618021| Tim| 432|
|B1| 1574200618021| Tim| 123|
|B1| 1574200618021| Tim| 765|
|B1| 1274600618121| Joe| 437|
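For reference, the flattening that explode performs amounts to pairing each row's scalar fields with each element of its items array. A plain-Python sketch of the same logic (illustration only, not Spark code):

```python
# Sample records in the same shape as the JSON above.
records = [
    {"items": ["234", "454", "434", "534"], "time": "1574290618029", "id": "A1", "user": "Bob"},
    {"items": ["432", "123", "765"], "time": "1574200618021", "id": "B1", "user": "Tim"},
    {"items": ["437"], "time": "1274600618121", "id": "B1", "user": "Joe"},
]

# One output row per array element; the scalar columns are repeated.
exploded = [
    (r["id"], r["time"], r["user"], item)
    for r in records
    for item in r["items"]
]
```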
What I am actually needing is the data to be in a format like this:
|id| time| user| item_num| col|
|A1| 1574290618029| Bob| item1| 234|
|A1| 1574290618029| Bob| item2| 454|
|A1| 1574290618029| Bob| item3| 434|
|A1| 1574290618029| Bob| item4| 534|
|B1| 1574200618021| Tim| item1| 432|
|B1| 1574200618021| Tim| item2| 123|
|B1| 1574200618021| Tim| item3| 765|
|B1| 1574200618021| Tim| item4| NA|
|B1| 1274600618121| Joe| item1| 437|
|B1| 1274600618121| Joe| item2| NA|
|B1| 1274600618121| Joe| item3| NA|
|B1| 1274600618121| Joe| item4| NA|
Is there a simple way to accomplish this with explode that I'm not seeing? I'm very new to Spark, so please excuse me if there is an obvious answer.