Update column value of type JSON string with another JSON string value from different column

Question

I have a PySpark dataframe where columns have JSON string values like this:

col1                          col2
{"d1":"2343","v1":"3434"} {"id1":"123"}
{"d1":"2344","v1":"3435"} {"id1":"124"}

I want to update "col1" JSON string values with "col2" JSON string values to get this:

col1                                     col2
{"d1":"2343","v1":"3434","id1":"123"}    {"id1":"123"}
{"d1":"2344","v1":"3435","id1":"124"}    {"id1":"124"}

How to do this in PySpark?

ZygD · Accepted Answer · 2022-10-04 06:17:14Z

Since you're dealing with string type columns, you can remove the last } from "col1", remove the first { from "col2" and join the strings together with comma , as delimiter.

Input:

from pyspark.sql import functions as F
df = spark.createDataFrame(
    [('{"d1":"2343","v1":"3434"}', '{"id1":"123"}'),
     ('{"d1":"2344","v1":"3435"}', '{"id1":"124"}')],
    ["col1", "col2"])

Script:

df = df.withColumn(
    "col1",
    F.concat_ws(
        ",",
        F.regexp_replace("col1", r"}$", ""),
        F.regexp_replace("col2", r"^\{", "")
    )
)

df.show(truncate=0)
# +-------------------------------------+-------------+
# |col1                                 |col2         |
# +-------------------------------------+-------------+
# |{"d1":"2343","v1":"3434","id1":"123"}|{"id1":"123"}|
# |{"d1":"2344","v1":"3435","id1":"124"}|{"id1":"124"}|
# +-------------------------------------+-------------+

Collectives™ on Stack Overflow

Update column value of type JSON string with another JSON string value from different column

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related