
I have written a UDF in PySpark and use it like below:

df1 = df.where(point_inside_polygon(latitude, longitude, polygonArr))

df1 and df are Spark DataFrames.

The function is given below:

import math
import shapely as sh
from shapely.geometry import MultiPoint

def point_inside_polygon(x, y, poly):
    latt = float(x)
    long = float(y)
    if not (math.isnan(latt) or math.isnan(long)):
        point = sh.geometry.Point(latt, long)
        polygon = MultiPoint(poly).convex_hull
        return polygon.contains(point)
    else:
        return False

But when I checked the data type of latitude and longitude inside the function, each one is a Column object, not a float.

Is there a way to go through each row and use the actual values, instead of receiving Column objects? I don't want to use a for loop, because I have a huge record set and that would defeat the purpose of using Spark.

Is there a way to pass the column values as floats, or to convert them inside the function?

1 Answer


Wrap it using udf:

from pyspark.sql.types import BooleanType
from pyspark.sql.functions import udf

point_inside_polygon_ = udf(point_inside_polygon, BooleanType())
df1 = df.where(point_inside_polygon_(latitude, longitude, polygonArr))
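
For context, here is a minimal end-to-end sketch of the same approach. The column names latitude/longitude, the sample rows, and the sample polygon are assumptions for illustration, not from the original question; substitute your own. One common pattern, used here, is to capture the polygon in a closure so that only the two coordinate columns are passed to the UDF.

import math
import shapely as sh
from shapely.geometry import MultiPoint
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import BooleanType

spark = SparkSession.builder.getOrCreate()

# Assumed sample data; replace with your own DataFrame and polygon.
# Note: shapely must be installed on the executors for the UDF to run.
df = spark.createDataFrame(
    [(10.0, 20.0), (50.0, 60.0), (float("nan"), 20.0)],
    ["latitude", "longitude"],
)
polygonArr = [(9.0, 19.0), (11.0, 19.0), (11.0, 21.0), (9.0, 21.0)]

def point_inside_polygon(x, y, poly):
    latt = float(x)
    long = float(y)
    if not (math.isnan(latt) or math.isnan(long)):
        point = sh.geometry.Point(latt, long)
        polygon = MultiPoint(poly).convex_hull
        return polygon.contains(point)
    return False

# Bind the polygon via a closure (functools.partial works too), then wrap
# the remaining two-argument function as a UDF returning a boolean.
point_inside_polygon_ = udf(
    lambda x, y: point_inside_polygon(x, y, polygonArr), BooleanType()
)

# The wrapped function receives plain Python floats per row, so the
# float()/isnan() calls inside it work as intended.
df1 = df.where(point_inside_polygon_(col("latitude"), col("longitude")))
df1.show()

If the polygon varies per row, it would instead have to be passed as a column (for example an array column) rather than captured as a plain Python object.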

1 Comment

I haven't done this before, so just a small doubt: in the second line, should it use the new function or the old one? df1 = df.where(point_inside_polygon(args)) or df1 = df.where(point_inside_polygon_(args))?
