The following is a sample of my Spark DataFrame, with its printSchema() output below it:

+--------------------+---+------+------+--------------------+
|           device_id|age|gender| group|                apps|
+--------------------+---+------+------+--------------------+
|-9073325454084204615| 24|     M|M23-26|                null|
|-8965335561582270637| 28|     F|F27-28|[1.0,1.0,1.0,1.0,...|
|-8958861370644389191| 21|     M|  M22-|[4.0,0.0,0.0,0.0,...|
|-8956021912595401048| 21|     M|  M22-|                null|
|-8910497777165914301| 25|     F|F24-26|                null|
+--------------------+---+------+------+--------------------+
only showing top 5 rows

root
 |-- device_id: long (nullable = true)
 |-- age: integer (nullable = true)
 |-- gender: string (nullable = true)
 |-- group: string (nullable = true)
 |-- apps: vector (nullable = true)

I'm trying to fill the nulls in the 'apps' column with np.zeros(19237). However, when I execute

df.fillna({'apps': np.zeros(19237)})

I get an error

Py4JJavaError: An error occurred while calling o562.fill.
: java.lang.IllegalArgumentException: Unsupported value type java.util.ArrayList

Or if I try

df.fillna({'apps': DenseVector(np.zeros(19237))})

I get an error

AttributeError: 'numpy.ndarray' object has no attribute '_get_object_id'

Any ideas?

1 Answer

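DataFrameNaFunctions (which df.fillna calls under the hood) supports only a subset of native types as fill values (numeric, string and boolean), not user-defined types (UDTs) such as VectorUDT. So you'll need a UDF here, combined with coalesce.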

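from pyspark.sql.functions import coalesce, col, udf
from pyspark.ml.linalg import Vectors, VectorUDT

def zeros(n):
    # Zero-argument UDF that returns an all-zero vector of size n.
    # A SparseVector with no stored entries stays cheap even for large n (e.g. 19237).
    def zeros_():
        return Vectors.sparse(n, {})
    return udf(zeros_, VectorUDT())()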

Example usage:

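# Assumes an active SparkSession named `spark`.
df = spark.createDataFrame(
    [(1, Vectors.dense([1, 2, 3])), (2, None)],
    ("device_id", "apps"))

# coalesce keeps the existing vector and falls back to zeros(3) only where apps is null.
df.withColumn("apps", coalesce(col("apps"), zeros(3))).show()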
+---------+-------------+
|device_id|         apps|
+---------+-------------+
|        1|[1.0,2.0,3.0]|
|        2|    (3,[],[])|
+---------+-------------+