
I was running the following code to display a DataFrame with df.show():


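import os
import sys

from pyspark.sql import *
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession

# Make the driver and the Python workers use the interpreter running this script
os.environ['PYSPARK_PYTHON'] = sys.executable
os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable

# Module-level session (the original indented these lines, which is an IndentationError)
spark = SparkSession.builder \
    .appName("Hello Spark") \
    .master("local[2]") \
    .getOrCreate()


def spark_practice():
    data_list = [("Ravi", 28),
                 ("David", 45),
                 ("Mani", 27)]

    # Build a two-column DataFrame and print its schema and rows
    df = spark.createDataFrame(data_list).toDF("Name", "Age")
    df.printSchema()
    df.show()


spark_practice()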

However, I got the following error:

File "C:\Program Files\Hadoop\spark-3.5.1\python\lib\py4j-0.10.9.7-src.zip\py4j\protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o46.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (Prince-PC executor driver): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)

I tried pointing the PYSPARK_DRIVER_PYTHON environment variable at the latest version of Python, the same interpreter the project uses, but it did not help.
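To confirm which interpreter Spark will hand to the driver and the workers, a small standard-library check along these lines may help. It is only a sketch: the function names pyspark_python_mismatches and _resolve are hypothetical, and the check compares resolved paths only, so it cannot tell you whether that Python version is one your Spark release works with.

```python
import os
import shutil
import sys


def _resolve(path_or_name):
    """Resolve a bare command name via PATH, then normalise it for comparison."""
    found = shutil.which(path_or_name) or path_or_name
    return os.path.normcase(os.path.abspath(found))


def pyspark_python_mismatches(environ=None, current=None):
    """Return {variable: value} for the PySpark interpreter variables that are
    unset (value None) or that resolve to a different interpreter than `current`.

    Hypothetical helper: it only inspects environment variables and paths.
    """
    environ = os.environ if environ is None else environ
    current = sys.executable if current is None else current
    target = _resolve(current)
    mismatches = {}
    for name in ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON"):
        value = environ.get(name)
        # Unset means Spark falls back to its own default, so flag it for review.
        if value is None or _resolve(value) != target:
            mismatches[name] = value
    return mismatches


if __name__ == "__main__":
    print("This script runs on:", sys.executable, sys.version.split()[0])
    print("Unset or different:", pyspark_python_mismatches() or "none")
```

If both variables already point at the running interpreter, as the question's script arranges, an empty result suggests the crash is not an interpreter mismatch but something about that interpreter version itself.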

Answer:

Downgrading Python from 3.12.1 to 3.11.8 should resolve this issue. Also, avoid importing everything from pyspark.sql; the only import you need is:

from pyspark.sql.session import SparkSession
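A trimmed version of the question's script could look like the following: it keeps only the SparkSession import and warns when the interpreter is newer than the version the downgrade above moves to. The NEWEST_KNOWN_GOOD threshold and the python_version_ok helper are hypothetical. The threshold comes only from this thread's 3.12.1 vs 3.11.8 result, not from an official Spark compatibility table.

```python
import os
import sys
import warnings

# Assumption: based only on this thread (3.12.1 crashed, 3.11.8 worked with
# Spark 3.5.1). Check the PySpark docs for the versions you actually use.
NEWEST_KNOWN_GOOD = (3, 11)


def python_version_ok(version_info=sys.version_info):
    """Return True if the interpreter's (major, minor) is <= NEWEST_KNOWN_GOOD."""
    return tuple(version_info[:2]) <= NEWEST_KNOWN_GOOD


def spark_practice():
    # Imported inside the function so the version check works without PySpark.
    from pyspark.sql.session import SparkSession

    if not python_version_ok():
        newest = ".".join(map(str, NEWEST_KNOWN_GOOD))
        warnings.warn(f"Python {sys.version.split()[0]} is newer than {newest}; "
                      "Python workers may crash on this setup.")

    # Must be set before the session (and its workers) are started.
    os.environ["PYSPARK_PYTHON"] = sys.executable
    os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

    spark = (SparkSession.builder
             .appName("Hello Spark")
             .master("local[2]")
             .getOrCreate())
    data_list = [("Ravi", 28), ("David", 45), ("Mani", 27)]
    df = spark.createDataFrame(data_list).toDF("Name", "Age")
    df.printSchema()
    df.show()
    spark.stop()
```

Call spark_practice() at the end of the script to run it. The call is left out here so that python_version_ok can be reused without starting Spark.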

Comment:

I changed the version from 3.12.1 to 3.11.8 and it worked. Thank you very much.
