1

I have a function, which trys to pass a broadcast variable into UDF.

The function looks like:

def generate_lookup_code(self, lookup_map):

    lookup_map_broadcast = spark_session.sparkContext.broadcast(lookup_map)
    print("lookup_map has been broadcasted")

    #### UDF function only return a constant string###
    def _generate_code(bc_reasoncode_lookup_map):

        reasoncode_lookup_map = bc_reasoncode_lookup_map.value
        return "hello"


    udfGenerateCode = F.udf(_generate_code, StringType())

    input_df = input_df.withColumn('code', udfGenerateCode(lookup_map_broadcast))

    input_df.show()

My intention is only trying to pass the broadcast variable to the UDF, however, I got the error:

'Broadcast' object has no attribute '_get_object_id'

I have no idea where is wrong?

1 Answer 1

4

You don't need to pass a broadcasted variable as a UDF argument, just reference it from the function:

lookup_map_broadcast = spark_session.sparkContext.broadcast(lookup_map)

def _generate_code():
    reasoncode_lookup_map = lookup_map_broadcast.value
    return "hello"

udfGenerateCode = F.udf(_generate_code, StringType())
input_df = input_df.withColumn('code', udfGenerateCode())

A UDF is called for each row and it can accept either a column or literal.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.