I need to convert a Spark DataFrame (a large dataset) into a pandas DataFrame.
Code:

    pandas_df = Example_df.toPandas()
I am getting this error:

    /databricks/spark/python/pyspark/sql/pandas/conversion.py:145: UserWarning: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to true, but has reached the error below and can not continue. Note that 'spark.sql.execution.arrow.pyspark.fallback.enabled' does not have an effect on failures in the middle of computation.
    Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 30 tasks (31.0 GiB) is bigger than local result size limit 30.0 GiB, to address it, set spark.driver.maxResultSize bigger than your dataset result size.
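For context, the error message itself suggests raising spark.driver.maxResultSize. A minimal sketch of what that might look like, assuming a standalone PySpark session (the app name, the "48g" value, and the table/column names below are placeholders, not from my actual job):

    # Sketch: raise spark.driver.maxResultSize at session creation, as the error
    # message suggests. "48g" is an assumed value; it must exceed the collected
    # result size (31 GiB in the traceback above). On Databricks the driver is
    # already running, so this usually has to go in the cluster's Spark config
    # rather than a notebook cell.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("toPandas-example")  # hypothetical app name
        .config("spark.driver.maxResultSize", "48g")
        .getOrCreate()
    )

    # Alternative: shrink the data in Spark before collecting, e.g. select only
    # the columns you need or aggregate/sample first, so toPandas() has less to
    # pull onto the driver.
    pandas_df = (
        spark.table("example_table")          # hypothetical table
        .select("col_a", "col_b")             # hypothetical columns
        .toPandas()
    )

Is raising the limit the right approach here, or is there a better way to move a dataset this large into pandas?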