1

I have a spark dataframe (df) with columns - name, id, project, start_date, status

When used to_json function in aggregation, it makes the datatype of payload to be array<string>. How do I convert the array<string> to array<struct<project:string, start_date:date, status: string>>? This conversion is needed to access from redshift spectrum.

df_gp= df.groupBy(F.col('name'),
                          F.col('id')).agg(F.collect_list(
                          F.to_json(F.struct(('project'),
                                             ('start_date'),
                                             ('status')))).alias("payload"))


I followed steps given in, this documentation

import json
def parse_json(array_str):
    json_obj = json.loads(array_str)
    for item in json_obj: 
        yield (item["project"], item["start_date"],item["status"])

json_schema = ArrayType(StructType([StructField('project', StringType(), nullable=True)
, StructField('start_date', DateType(), nullable=True)
, StructField('status', StringType(), nullable=True)]))

udf_parse_json = F.udf(lambda str: parse_json(str), json_schema)
df_new = df_gp.select(df_gp.name, df_gp.id, udf_parse_json(df_gp.payload).alias("payload"))

#works and shows intended schema
df_new.schema

# the following fails
df_new.show(truncate = False)

It throws error:

TypeError: the JSON object must be str, bytes or bytearray, not 'generator'

How do i fix this?

1 Answer 1

2

You don't need to_json in your aggregation, it works fine without it.

df.groupBy(F.col('name'),F.col('id')).agg(F.collect_list(
                          F.struct(('project'),
                                             ('start_date'),
                                             ('status'))).alias("payload")).printSchema()


#root
# |-- name: string (nullable = true)
# |-- id: long (nullable = true)
# |-- payload: array (nullable = true)
# |    |-- element: struct (containsNull = true)
# |    |    |-- project: string (nullable = true)
# |    |    |-- start_date: date (nullable = true)
# |    |    |-- status: string (nullable = true)
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.