
I need to add an empty column of StructType to an existing DataFrame.

I tried the following:

df = df.withColumn("features", typedLit(StructType(Nil)))

And:

df = df.withColumn("features", lit(new GenericRowWithSchema(Array(), StructType(Nil))))

However, both attempts fail with an "unsupported literal type" error.

2 Answers


As a crude workaround, you can use a user-defined function (UDF) to add a column of empty rows:

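import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.StructType

def addEmptyRowColumn(df: DataFrame, newColumnName: String): DataFrame = {
  // Zero-argument UDF returning an empty Row; the second argument declares the return type.
  // Note: this untyped udf(f, dataType) overload is deprecated since Spark 3.0.
  val addEmptyRowUdf = udf( () =>
    new GenericRowWithSchema(Array(), StructType(Nil)), StructType(Nil))

  df.withColumn(newColumnName, addEmptyRowUdf())
}

// Requires df to be declared as a var.
df = addEmptyRowColumn(df, "features")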

1 Comment

Although this answer is technically correct, the Spark developer community and Spark tuning guidance discourage the use of UDFs. I would suggest using the other, UDF-free answer rather than this one.

As a one-liner and without a UDF (PySpark):

from pyspark.sql import types as T, functions as F

df.withColumn(newColumnName, F.lit(None).cast(T.StructType()))
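Since the question is in Scala, here is a minimal sketch of the same idea there: cast a null literal to an empty struct type instead of using a UDF. The helper name `addNullStructColumn`, the object name, and the sample data are hypothetical and only for illustration. Note that, unlike the UDF approach, every value in the new column is null rather than an empty row.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.StructType

object EmptyStructColumn {
  // Hypothetical helper: adds a column typed as an empty struct, without a UDF.
  // The values are null; only the column's type is StructType(Nil).
  def addNullStructColumn(df: DataFrame, name: String): DataFrame =
    df.withColumn(name, lit(null).cast(StructType(Nil)))

  def main(args: Array[String]): Unit = {
    // Local session purely for demonstration (assumed setup).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("empty-struct-column")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(1, 2).toDF("id")
    val result = addNullStructColumn(df, "features")

    // Inspect the resulting schema: "features" should be a struct with no fields.
    result.printSchema()

    spark.stop()
  }
}
```

If you already have a `var df`, the call in the question's style would be `df = EmptyStructColumn.addNullStructColumn(df, "features")`.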

2 Comments

What are F and T?
Thank you for the heads-up. I updated the sample: from pyspark.sql import types as T, functions as F
