I am trying to run the following code based on some tutorial I found online:

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql import functions
from pyspark.sql import udf
df_pd = pd.DataFrame(
data={'integers': [1, 2, 3],
 'floats': [-1.0, 0.5, 2.7],
 'integer_arrays': [[1, 2], [3, 4, 5], [6, 7, 8, 9]]}
)

df = spark.createDataFrame(df_pd)
df.show()

def square(x):
    return x**2
from pyspark.sql.types import IntegerType
square_udf_int = udf(lambda z: square(z), IntegerType())

But when I run the last line I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'module' object is not callable

I am using Spark 2.3.3 on Hadoop 2.7.

Thanks

  • How are you calling the udf function? Can you tell? Commented Mar 1, 2019 at 9:00
  • @RAMSHANKERG I don't really understand what you mean. As my question says, the last line before the error is where I try to convert my function to a UDF. That is all the code I'm running when the error occurs: square_udf_int = udf(lambda z: square(z), IntegerType()) Commented Mar 1, 2019 at 9:13
  • Shouldn't you send some value to z by calling the square_udf_int function? Commented Mar 1, 2019 at 9:15
  • @RAMSHANKERG Yes, when I call the UDF, but the code fails when declaring it. Commented Mar 1, 2019 at 10:23
  • Include the full traceback that shows the exact code causing the error. Somewhere you have a () that doesn't belong. Commented Mar 1, 2019 at 14:38

1 Answer

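The problem is the import. from pyspark.sql import udf binds the module pyspark.sql.udf, not the udf function, so calling udf(...) raises TypeError: 'module' object is not callable. The function lives in pyspark.sql.functions, so import it from there:

import pyspark.sql.functions as F

udf_fun = F.udf(lambda..., Type())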
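The same kind of error can be reproduced without Spark, which may help when diagnosing it. The sketch below is only an analogy using the standard library: json.decoder stands in for the pyspark.sql.udf module, and JSONDecoder stands in for the callable you actually wanted. These names are chosen for illustration and are not part of the original question.

```python
import inspect

# Analogy only: this binds the *module* json.decoder to the name `decoder`,
# in the same way that `from pyspark.sql import udf` binds a module.
from json import decoder

# Two quick checks that reveal what the name really refers to.
print(inspect.ismodule(decoder))  # is it a module?
print(callable(decoder))          # can it be called?

# Calling a module raises a TypeError, as in the question.
try:
    decoder()
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# The fix is to import the callable from inside the module instead.
from json.decoder import JSONDecoder
print(callable(JSONDecoder))
```

If inspect.ismodule reports that a name is a module, the import statement is the place to look rather than the line where the call fails.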