1

Here is an example of what I'm trying to accomplish:

Case:

column1 column2 json_column
One Two {'A': '1', 'B': '2', 'C': '3'}

Desired output:

column1 column2 json_column B
One Two {'A': '1', 'B': '2', 'C': '3'} 2

As seen here, json_column has been parsed and a new column 'B' has been created containing the value of key 'B' in the json_column.

0

1 Answer 1

2

If the column is of string type, you could use from_json:

F.from_json('json_column', 'struct<A:string,B:string,C:string>')['B']

Full example:

from pyspark.sql import functions as F
df = spark.createDataFrame([('One', 'Two', "{'A': '1', 'B': '2', 'C': '3'}")], ['column1', 'column2', 'json_column'])

df = df.withColumn('B', F.from_json('json_column', 'struct<A:string,B:string,C:string>')['B'])

df.show(truncate=0)
# +-------+-------+------------------------------+---+
# |column1|column2|json_column                   |B  |
# +-------+-------+------------------------------+---+
# |One    |Two    |{'A': '1', 'B': '2', 'C': '3'}|2  |
# +-------+-------+------------------------------+---+
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.