I have a PySpark job reading ~50-55 GB of Parquet data from a Delta table. The job runs on GCP with n2-highmem-4 VMs and autoscaling from 1 to 15 workers. Each n2-highmem-4 worker has 32 GB of memory and 4 cores, and runs a single executor with 22 GB allocated, i.e. 22 GB × 15 = 330 GB of total executor memory, which seems more than large enough for ~55 GB of input data. spark.sql.shuffle.partitions is set to 200. Despite this, I'm getting an OOM error.
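For reference, this is roughly how the job is configured (a minimal sketch; the app name and Delta table path are placeholders, while the executor memory, core, and shuffle settings are the actual values described above):

```python
from pyspark.sql import SparkSession

# Sketch of the job configuration described above.
# The table path is hypothetical; the executor/shuffle settings
# reflect the actual cluster setup.
spark = (
    SparkSession.builder
    .appName("delta-read-job")
    .config("spark.executor.memory", "22g")         # one executor per n2-highmem-4 VM
    .config("spark.executor.cores", "4")            # 4 cores -> 4 concurrent tasks per executor
    .config("spark.sql.shuffle.partitions", "200")  # shuffle partition count
    .getOrCreate()
)

# ~50-55 GB of Parquet data read from a Delta table (path is a placeholder)
df = spark.read.format("delta").load("gs://my-bucket/path/to/delta_table")
```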
With 200 shuffle partitions spread across 15 workers, that's roughly 13 partitions (200 / 15) per executor. Since each executor (i.e. each VM) has 4 cores, 4 tasks run in parallel per executor.
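Here is the back-of-the-envelope arithmetic behind my assumption that the data should fit (note these are on-disk, compressed Parquet sizes, so the in-memory footprint would be larger):

```python
# Rough sizing based on the numbers above (compressed on-disk size)
input_gb = 55
shuffle_partitions = 200
workers = 15
cores_per_executor = 4

partitions_per_executor = shuffle_partitions / workers              # ~13.3 partitions per executor
avg_partition_mb = input_gb * 1024 / shuffle_partitions             # ~280 MB per shuffle partition
parallel_tasks_mb = avg_partition_mb * cores_per_executor           # ~1.1 GB for 4 concurrent tasks

print(partitions_per_executor, avg_partition_mb, parallel_tasks_mb)
```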
Only a single stage failed; the task details from the failed stage are below.
Could you please help me understand why this setup is not sufficient and leads to OOM? Also, is it necessary for all ~13 partitions assigned to an executor to fit in memory at once, or, since only 4 tasks run in parallel per executor, is it enough for memory to accommodate just 4 partitions at a time?
