I have a dataframe with two string columns and a third column that is an array of structs:
|-- music: string (nullable = true)
|-- artist: string (nullable = true)
|-- details: array (nullable = false)
| |-- element: struct (containsNull = true)
| | |-- Genre: string (nullable = true)
| | |-- Origin: string (nullable = true)
Here is some sample data:
music | artist | details
Music_1 | Artist_1 | [{"Genre": "Rock", "Origin": "USA"}]
Music_2 | Artist_3 | [{"Genre": "", "Origin": "USA"}]
Music_3 | Artist_1 | [{"Genre": "Rock", "Origin": "UK"}]
I am trying to do what I think is a simple operation: concatenate each key with its value using ' - ', and separate the pairs with ', '. This is the structure I want to get:
music | artist | details
Music_1 | Artist_1 | Genre - Rock, Origin - USA
Music_2 | Artist_3 | Genre - , Origin - USA
Music_3 | Artist_1 | Genre - Rock, Origin - UK
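To make the expected formatting unambiguous, here is a rough plain-Python sketch of what I want to happen to the details value of a single row. This is not Spark code, and format_details is just a hypothetical helper name I made up for illustration; it assumes each array element behaves like a dict of string keys and values.

```python
# Plain-Python illustration of the desired result for one row's `details`
# value (not Spark code). `format_details` is a hypothetical helper name.
def format_details(details):
    parts = []
    for element in details:  # each element is one struct in the array
        for key, value in element.items():
            # Keep the key even when the value is an empty string
            parts.append(f"{key} - {value}")
    return ", ".join(parts)


# Sample value taken from the Music_2 row above
print(format_details([{"Genre": "", "Origin": "USA"}]))
```

What I am looking for is the equivalent of this on the dataframe column.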
For that, I tried an approach that first separates the keys and values into different columns so that I can then concatenate them:
display(df.select(col("music"), col("artist"), posexplode("details").alias("key","value")))
But I got the following result:
music | artist | key | value
Music_1 | Artist_1 | 0 | [{"Genre": "Rock", "Origin": "USA"}]
Music_2 | Artist_3 | 0 | [{"Genre": "", "Origin": "USA"}]
Music_3 | Artist_1 | 0 | [{"Genre": "Rock", "Origin": "UK"}]
This is probably not the best approach. Can anyone help me?
Thanks!