I have a PySpark UDF that I apply to each row of one DataFrame column to create a new column. Instead of the expected values, the new column contains [Ljava.lang.Object;@7e44638d (with a different value after the @ for each row).
Please see the UDF below:
import json
import requests
from pyspark.sql.functions import col, udf

def getLocCoordinates(property_address):
    url = "https://maps.googleapis.com/maps/api/geocode/json"
    querystring = {"address": property_address, "key": "THE_API_KEY"}
    response = requests.get(url, params=querystring)
    response_json = json.loads(response.text)
    for adr in response_json['results']:
        geometry = adr['geometry']
        coor = geometry['location']
        lat = coor['lat']
        lng = coor['lng']
        coors = lat, lng
        return coors  # returns a (lat, lng) tuple for the first result

getCoorsUDF = udf(lambda x: getLocCoordinates(x))
df = df.withColumn("AddressCoordinates", getCoorsUDF(col("FullAddress")))
I tried:
getCoorsUDF = udf(getLocCoordinates, FloatType()) --> returns NULL for each row of the newly created "AddressCoordinates" column.
getCoorsUDF = udf(getLocCoordinates, StringType()) --> returns [Ljava.lang.Object;@
getCoorsUDF = udf(getLocCoordinates) --> returns [Ljava.lang.Object;@
The result looks like so:
| Ref Num | FullAddress | AddressCoordinates |
|---|---|---|
| 1234 | Some Address | [Ljava.lang.Object;@ |
This gets returned for each row in the dataframe.
Initially I was using the function in a plain Python notebook and it worked fine: lat and lng were returned for each address. However, I had to move this to PySpark, and now I am hitting a brick wall.
The `return` is inside the `for` loop, so the function finishes after the first element.
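
If only the first geocoding match is wanted, the early return can stay, but it helps to make it explicit and to handle the case where there are no results. Separately, the [Ljava.lang.Object;@... text is what Spark shows when a UDF returns a tuple while its return type is StringType (the default when none is given), and FloatType yields NULL because a tuple is not a float. So the declared return type probably needs to match the tuple as well. The following is a minimal sketch under those assumptions, not a definitive fix. The names `extract_coordinates`, `get_loc_coordinates` and `make_coords_udf` are hypothetical and not from the original post, and the response layout is assumed to be the one the question's code already relies on.

```python
import json


def extract_coordinates(response_json):
    """Return (lat, lng) of the first geocoding result, or None if there is none."""
    for result in response_json.get("results", []):
        location = result["geometry"]["location"]
        # Deliberate early return: only the first match is used.
        # float() guards against whole-number values parsed as int.
        return float(location["lat"]), float(location["lng"])
    return None  # no results, so Spark stores NULL for this row


def get_loc_coordinates(property_address, api_key="THE_API_KEY"):
    """Geocode one address (needs network access and a valid key)."""
    import requests  # imported lazily so the parsing helper works without it

    url = "https://maps.googleapis.com/maps/api/geocode/json"
    params = {"address": property_address, "key": api_key}
    response = requests.get(url, params=params)
    return extract_coordinates(json.loads(response.text))


def make_coords_udf():
    """Build a UDF whose declared return type matches the (lat, lng) tuple."""
    # Imported lazily so the helpers above can be tried without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType, StructField, StructType

    schema = StructType([
        StructField("lat", DoubleType(), True),
        StructField("lng", DoubleType(), True),
    ])
    return udf(get_loc_coordinates, schema)


# Usage (assumes an existing DataFrame `df` with a "FullAddress" column):
# from pyspark.sql.functions import col
# df = df.withColumn("AddressCoordinates", make_coords_udf()(col("FullAddress")))
# df.select("AddressCoordinates.lat", "AddressCoordinates.lng").show()
```

With a struct return type, the new column holds a `lat` and a `lng` field per row instead of a stringified Java array. Keeping the JSON parsing in its own function also makes it possible to check the logic on a sample response without calling the API or starting Spark.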