
I have a PySpark DataFrame in which one of the columns is in the format below:

[{key1: value1},{key2:value2}, {key3:value3}, {key4:value4}]

Let's call it ColumnY, as below:

ColumnY
[{key1: value1},{key2:value2}, {key3:value3}, {key4:value4}]

I would like to convert it into columns of the DataFrame, where the column name is keyX and its contents are valueX, for X in [1, 4], as below:

key1   key2   key3   key4
value1 value2 value3 value4

I have tried some solutions, but they didn't work. Please share any ideas or solutions you may have. Thank you in advance.
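For anyone trying to reproduce this, a minimal DataFrame with this layout can be built as below (this assumes ColumnY is stored as a plain string, since the bracketed text has no quotes and is not valid JSON):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Single-row sample matching the format described above;
# ColumnY is assumed to be a plain string column.
df = spark.createDataFrame(
    [("[{key1: value1},{key2:value2}, {key3:value3}, {key4:value4}]",)],
    ["ColumnY"],
)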

1 Answer

That is very badly formatted JSON without any quotes, but you can still parse it by brute force:

import pyspark.sql.functions as F

# Split on commas, strip the brackets/braces/spaces from each piece,
# and parse each remaining "keyN:valueN" fragment into a one-entry map.
df2 = df.selectExpr("""
    explode(
        transform(
            split(ColumnY, ','), 
            x -> str_to_map(regexp_replace(x, '[\\\\[\\\\{ \\\\]\\\\}]', ''), ' ', ':')
        )
    ) as col
""").select(F.explode('col')).groupBy().pivot('key').agg(F.first('value'))
# explode('col') turns each map into (key, value) rows; pivoting on 'key'
# then yields one column per keyN with valueN as its content.

df2.show()
+------+------+------+------+
|  key1|  key2|  key3|  key4|
+------+------+------+------+
|value1|value2|value3|value4|
+------+------+------+------+
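Note that groupBy().pivot('key') with no grouping columns collapses everything into a single output row, which is fine here because there is only one input row. For the multi-row case, one sketch (not part of the original answer) is to tag each row with a synthetic id via monotonically_increasing_id before exploding, so the pivot keeps one output row per input row:

import pyspark.sql.functions as F

# Tag each row with a synthetic id so the pivot keeps one output row
# per input row instead of collapsing the whole DataFrame.
df3 = (
    df.withColumn("id", F.monotonically_increasing_id())
      .selectExpr("id", """
          explode(
              transform(
                  split(ColumnY, ','),
                  x -> str_to_map(regexp_replace(x, '[\\\\[\\\\{ \\\\]\\\\}]', ''), ' ', ':')
              )
          ) as col
      """)
      .select("id", F.explode("col"))
      .groupBy("id").pivot("key").agg(F.first("value"))
      .drop("id")
)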