1

I have some data in dataframe which looks like this:

+-----------+--------+-----------+--------------------------------+
|       Noun| Pronoun|  Adjective|Metadata                        |
+-----------+--------+-----------+--------------------------------+
|      Homer| Simpson|Engineer   |{"Age": "50", "Country": "USA"} | 
|      Elon | Musk   |King       |{"Age": "45", "Country": "RSA"} |
|      Bart | Lee    |Cricketer  |{"Age": "35", "Country": "AUS"} |
|      Lisa | Jobs   |Daughter   |{"Age": "35", "Country": "IND"} |
|      Joe  | Root   |Player     |{"Age": "31", "Country": "ENG"} |
+-----------+--------+-----------+--------------------------------+

I want to append values from another column(say Adjective) into the Metadata column. So that the final dataframe would look like this:

+-----------+--------+-----------+------------------------------------------------------------+
|       Noun| Pronoun|  Adjective|Metadata                                                    |
+-----------+--------+-----------+------------------------------------------------------------+
|      Homer| Simpson|Engineer   |{"Age": "50", "Country": "USA", "Adjective": "Engineer"}    | 
|      Elon | Musk   |King       |{"Age": "45", "Country": "RSA", "Adjective": "King"}        |
|      Bart | Lee    |Cricketer  |{"Age": "35", "Country": "AUS", "Adjective": "Cricketer"}   |
|      Lisa | Jobs   |Daughter   |{"Age": "35", "Country": "IND", "Adjective": "Daughter"}    |
|      Joe  | Root   |Player     |{"Age": "31", "Country": "ENG", "Adjective": "Player"}      |
+-----------+--------+-----------+------------------------------------------------------------+

Please suggest how this can be implemented.

3
  • What is the data type of column Metadata? Commented Jan 3, 2022 at 9:22
  • it is a json object. Commented Jan 3, 2022 at 9:24
  • yes, String in dataframe. Commented Jan 3, 2022 at 9:35

1 Answer 1

3

Assuming your column Metadata contains JSON strings, you can first convert it to MapType with from_json function, then add the columns you want using map_concat and finally convert again to JSON string using to_json:

val df2 = df.withColumn(
    "Metadata",
    from_json(col("Metadata"), lit("map<string,string>"))
).withColumn(
    "Metadata",
    to_json(map_concat(col("Metadata"), map(lit("Adjective"), col("Adjective"))))
)

df2.show(false)

//+-----+-------+---------+----------------------------------------------------+
//|Noun |Pronoun|Adjective|Metadata                                            |
//+-----+-------+---------+----------------------------------------------------+
//|Homer|Simpson|Engineer |{"Age":"50","Country":"USA","Adjective":"Engineer"} |
//|Elon |Musk   |King     |{"Age":"45","Country":"RSA","Adjective":"King"}     |
//|Bart |Lee    |Cricketer|{"Age":"35","Country":"AUS","Adjective":"Cricketer"}|
//|Lisa |Jobs   |Daughter |{"Age":"35","Country":"IND","Adjective":"Daughter"} |
//|Joe  |Root   |Player   |{"Age":"31","Country":"ENG","Adjective":"Player"}   |
//+-----+-------+---------+----------------------------------------------------+

This also can be done using a conversion to StructType instead of MapType but map is more generic in this case.

Sign up to request clarification or add additional context in comments.

2 Comments

This answer worked but what if i want to add another json object within the Metadata column. so that the final value in Metadata will look like {"Age":"50","Country":"USA","Adjective":"Engineer", "Tags": {"t1": "v1", "t2": "v2" }}
@UserASR hey were you able to achieve this, I have very similar requirement. Could you please help me

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.