I am trying to process objects_small_list in parallel using mapPartitions: for each object I compare its process_object_id (which comes from objects_small_list) against obj_id in the folder rows and pull the matching folder rows inside the process_partition function, but I am getting this error:
PicklingError: Could not serialize object: RuntimeError: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
Any help would be appreciated. I am running this in Azure Synapse.
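For context, objects_small_list (used further down with sc.parallelize) is a small list of objects that each carry a process_object_id. The sample below is only a placeholder shape to make the snippet self-contained, not my real data:

# Placeholder sample of objects_small_list (real values differ);
# each entry carries the process_object_id used for matching against obj_id.
objects_small_list = [
    {"process_object_id": 1},
    {"process_object_id": 2},
    {"process_object_id": 3},
    {"process_object_id": 4},
]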
folder_data =[{"folderid": 10, "folder_path": "path", "obj_id": 1, "status":
"Progress"},
{"folderid": 11, "folder_path": "path", "obj_id": 2, "status":
"Progress"},
{"folderid": 12, "folder_path": "path", "obj_id": 3, "status":
"Progress"},
{"folderid": 13, "folder_path": "path", "obj_id": 4, "status": "Progress"}
]
folders_df = spark.createDataFrame(folder_data)
all_folders = folders_df.collect()
folders_broadcast = sc.broadcast(all_folders)
def process_partition(row):
    # rows = list(row)
    results = []
    for row in folders_df.collect():
        ProcessObjectId = row["process_object_id"]
        matching_folders = [
            f for f in folders_broadcast.value if f['obj_id'] == ProcessObjectId
        ]
        # folders_to_process_rows = folders_df.collect()
        results.extend(matching_folders)
    yield results
rdd = sc.parallelize(objects_small_list, numSlices=8)
rdd_result = rdd.mapPartitions(process_partition).collect()
all_folders_filtered = [item for partition in rdd_result for item in partition]
Thanks