I want to create an array column that is conditionally populated based on an existing column, and sometimes I want it to contain None. Here's some example code:
from pyspark.sql import Row
from pyspark.sql import SparkSession
from pyspark.sql.functions import when, array, lit
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([
    Row(ID=1),
    Row(ID=2),
    Row(ID=2),
    Row(ID=1)
])
value_lit = 0.45
size = 10
df = df.withColumn(
    "TEST",
    when(df["ID"] == 2, array([None for i in range(size)]))
    .otherwise(array([lit(value_lit) for i in range(size)]))
)
df.show(truncate=False)
And here's the error I'm getting:
TypeError: Invalid argument, not a string or column: None of type <type 'NoneType'>. For column literals, use 'lit', 'array', 'struct' or 'create_map' function.
I know it isn't a string or column; I don't see why it has to be one.
lit: doesn't work.
array: I'm not sure how to use array in this context.
struct: probably the way to go, but I'm not sure how to use it here. Perhaps I have to set an option to allow the new column to contain None values?
create_map: I'm not creating a key:value map, so I'm sure this is not the correct one to use.

Wrapping None in lit, as the error message suggests, gets past the TypeError but raises a new error instead:

df = df.withColumn(
    "TEST",
    when(df["ID"] == 2, array([lit(None) for i in range(size)]))
    .otherwise(array([lit(value_lit) for i in range(size)]))
)
df.show(truncate=False)

Traceback (most recent call last):
py4j.protocol.Py4JJavaError: An error occurred while calling o128.showString.
: scala.MatchError: NullType (of class org.apache.spark.sql.types.NullType$)
    at org.apache.spark.sql.catalyst.expressions.Cast.castToDouble(Cast.scala:531)