Pyspark udf function error in lambda function

Question

I have written the UDF below and it throws an error. Please help.

Below is my dataset:

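from pyspark.sql import functions as func
from pyspark.sql.functions import randn, when

df1 = sqlContext.range(0, 1000)\
    .withColumn('normal1', func.abs(10 * func.round(randn(seed=1), 2)))\
    .withColumn('normal2', func.abs(100 * func.round(randn(seed=2), 2)))\
    .withColumn('normal3', func.abs(func.round(randn(seed=3), 2)))

# Label a row 1 when the product of the three columns exceeds 750, else 0
df1 = df1.withColumn('Y', when(df1.normal1 * df1.normal2 * df1.normal3 > 750, 1)\
    .otherwise(0))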

The UDF is below:

from pyspark.sql import types as T
from pyspark.sql.functions import udf

balancingRatio = 0.8
calculateWeights = udf(lambda d: (1 * balancingRatio) if d == 0 else (1 * (1.0 - balancingRatio)), T.IntegerType())
weightedDataset = df1.withColumn('classWeightCol', calculateWeights('Y'))
weightedDataset.show()

It runs for a while and then throws this error:

Py4JJavaError: An error occurred while calling o670.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
in stage 25.0 failed 1 times, most recent failure: Lost task 0.0 in stage 
25.0 (TID 427, localhost, executor driver): org.apache.spark.SparkException: 
Python worker failed to connect back.

What might be the problem? Thank you.

A simple example that I found on the internet does not work either:

maturity_udf = udf(lambda age: "adult" if age >=18 else "child", 
 T.StringType())
df = sqlContext.createDataFrame([{'name': 'Alice', 'age': 1}])
df.withColumn("maturity", maturity_udf(df.age)).show()

Note: I am using Python 3.7.1 and Spark 2.4.

What is T.IntegerType() exactly? Shouldn't it be just IntegerType()?
– Ali AzG, Commented Nov 24, 2018 at 12:52
It seems to be a version issue. Try installing version 2.3 and try again.
– Ali AzG, Commented Nov 24, 2018 at 13:03

Smit Shah · Accepted Answer · 2019-01-11 20:54:01Z

You need to disable fork safety by setting the OBJC_DISABLE_INITIALIZE_FORK_SAFETY environment variable to YES. This solved the issue for me.

import os
os.environ['OBJC_DISABLE_INITIALIZE_FORK_SAFETY'] = 'YES'
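
As a rough sketch of where that setting might go in the question's script: the variable is read by the Objective-C runtime on macOS, so this assumes a macOS machine, and it presumably has to be in the environment before the Spark session or context is created so that the worker processes inherit it. The helper names `class_weight` and `add_class_weights` are hypothetical, not part of the original code. The sketch also declares the UDF return type as DoubleType, because the question's lambda returns floats (0.8 or roughly 0.2); as far as I know, a mismatched IntegerType would produce nulls rather than the intended weights.

```python
import os

# Assumption: macOS. Set this before any Spark session or context exists
# so that processes started afterwards inherit it.
os.environ["OBJC_DISABLE_INITIALIZE_FORK_SAFETY"] = "YES"

BALANCING_RATIO = 0.8  # same value as balancingRatio in the question


def class_weight(label, balancing_ratio=BALANCING_RATIO):
    """Return the weight for a 0/1 label as a float."""
    return balancing_ratio if label == 0 else 1.0 - balancing_ratio


def add_class_weights(df, label_col="Y"):
    """Hypothetical helper: append a 'classWeightCol' column to df."""
    # Imported lazily so the environment variable above is set first.
    from pyspark.sql import functions as F
    from pyspark.sql import types as T

    # DoubleType matches the float values returned by class_weight.
    weight_udf = F.udf(class_weight, T.DoubleType())
    return df.withColumn("classWeightCol", weight_udf(F.col(label_col)))
```

With the question's data, this would be used as `add_class_weights(df1).show()` once `df1` has been built.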

