from pyspark.sql.functions import col, udf
from pyspark.sql.types import FloatType, MapType, StringType

@udf(returnType=MapType(StringType(), FloatType()))
def postprocess(data):
    ret = dict()
    ...
    # insert keys and values into the dictionary from data
    ...

    return ret

ret = postprocess(col('data'))
print(ret)  # Column<'postprocess(data)'>

I would like to create multiple columns from a dictionary column.

If ret has {"key1": 0.1, "key2": 0.3}, the result should be

| key1 | key2 |
|------|------|
| 0.1  | 0.3  |

How can I create it?

1 Answer

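To achieve your goal, you can start from explode() in pyspark.sql.functions. Note that on a map column, explode() does not produce one column per key directly: it returns a `key` column and a `value` column, with one row per map entry. To get one column per key, pivot that result on `key`, or, if the keys are known in advance, select each one from the map with getItem(). Details: https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.explode.html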
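As a minimal sketch of the "one column per key" step, assuming the UDF output is stored in a map column named `ret`: the helper `map_select_exprs` and the function `demo` are hypothetical names introduced here, not part of PySpark. The helper builds Spark SQL expressions of the form `ret['key1'] AS key1` for `DataFrame.selectExpr`, and `demo` discovers the keys with `map_keys` and `explode` before selecting them. The sample data stands in for the real UDF output, and keys containing a single quote or backtick are not handled.

```python
def map_select_exprs(map_col, keys):
    """Build one Spark SQL expression per key that reads it from a map column."""
    return [f"`{map_col}`['{key}'] AS `{key}`" for key in keys]


def demo():
    # PySpark is imported here so the helper above can be used without it.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.master("local[1]").getOrCreate()

    # Hypothetical data standing in for the column returned by the UDF.
    df = spark.createDataFrame([({"key1": 0.1, "key2": 0.3},)], ["ret"])

    # Discover the distinct keys present in the map column.
    key_rows = df.select(F.explode(F.map_keys("ret")).alias("k")).distinct().collect()
    keys = sorted(row["k"] for row in key_rows)

    # Select one column per key; a key missing from a row's map yields null.
    df.selectExpr(*map_select_exprs("ret", keys)).show()

    spark.stop()
```

Calling `demo()` requires a working PySpark installation. If the keys are fixed and known up front, the discovery step can be skipped and the key list passed in directly, which avoids an extra pass over the data.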

From a performance perspective, though, and depending on how complicated your UDF is, I think you should build the columns with built-in Spark SQL functions instead of a Python UDF if that is possible, because a Python UDF has to move data between the JVM and a Python worker. You can check this post: https://stackoverflow.com/a/38297050/10445333
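What the built-in route looks like depends on what the UDF actually computes, which the question does not show. As a sketch only, assuming the map values can be read from existing columns (the columns `a` and `b`, the helper `map_sql_expr`, and the function `demo_builtin_map` are all hypothetical), the map can be built with Spark SQL's built-in `map` function instead of a Python UDF:

```python
def map_sql_expr(pairs, alias="ret"):
    """Build a Spark SQL map(...) expression from (key, column_name) pairs."""
    args = ", ".join(f"'{key}', `{column}`" for key, column in pairs)
    return f"map({args}) AS `{alias}`"


def demo_builtin_map():
    # PySpark is imported here so the helper above can be used without it.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()

    # Hypothetical input columns; the real UDF's inputs are not shown in the question.
    df = spark.createDataFrame([(0.1, 0.3)], ["a", "b"])

    # Build the map column with a built-in SQL function, no Python UDF involved.
    df.selectExpr(map_sql_expr([("key1", "a"), ("key2", "b")])).show(truncate=False)

    spark.stop()
```

If the end goal is only the final `key1` and `key2` columns, the intermediate map may not be needed at all: the same expressions can be selected directly as columns.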
