I have a column with the following structure in my DataFrame:
+--------------------+
|                data|
+--------------------+
|{"sbar":{"_id":"5...|
|{"sbar":{"_id":"5...|
|{"sbar":{"_id":"5...|
|{"sbar":{"_id":"5...|
|{"sbar":{"_id":"5...|
+--------------------+
only showing top 5 rows
The data inside the column is a JSON string. I want to convert the column to some other type (map, struct, ...). How do I do this with a UDF? I have written a function like the one below, but I can't figure out what the return type should be. I tried StructType and MapType, and both threw an error. This is my code:
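import json
from pyspark.sql import functions as F
from pyspark.sql.types import MapType, StructType
# Parse each JSON string into a Python dict. The second argument is the return
# type I am unsure of: passing StructType (or MapType) here raises an error.
udf_getDict = F.udf(lambda x: json.loads(x), StructType)
subset.select(udf_getDict(F.col('data'))).printSchema()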
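
For reference, here is a minimal sketch of one direction this could take. It is not a definitive solution. It assumes that every value in `data` is a JSON object and that a flat map of strings is an acceptable target type. The names `json_to_string_map`, `with_parsed_column` and `data_map` are hypothetical and exist only for illustration. The key point is that `F.udf` expects a data type instance such as `MapType(StringType(), StringType())`, not the bare `MapType` or `StructType` class.

```python
import json


def json_to_string_map(raw):
    """Parse a JSON object string into a flat dict of str -> str.

    Nested values are re-serialized with json.dumps so that every value is
    a plain string, which matches MapType(StringType(), StringType()).
    Assumption: the top-level JSON value is an object.
    """
    if raw is None:
        return None
    parsed = json.loads(raw)
    return {
        key: value if isinstance(value, str) else json.dumps(value)
        for key, value in parsed.items()
    }


def with_parsed_column(df, source_col="data", target_col="data_map"):
    """Hypothetical helper: add a map column parsed from a JSON string column."""
    # Imported lazily so the pure-Python helper above can be tried without Spark.
    from pyspark.sql import functions as F
    from pyspark.sql.types import MapType, StringType

    # The return type is an instance built from key and value types,
    # not the bare MapType or StructType class.
    parse_udf = F.udf(json_to_string_map, MapType(StringType(), StringType()))
    return df.withColumn(target_col, parse_udf(F.col(source_col)))
```

With the DataFrame from the question, this would be called as `with_parsed_column(subset).printSchema()`. If the JSON layout is known in advance, the built-in `F.from_json` with an explicit schema may be a simpler alternative to a UDF.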