I am attempting to programmatically remove specific columns/fields from a dataframe (anything that starts with _), whether the field is in the root or in a struct, using the dropFields method.
For example, if I had "foo._baz", the syntax would be:
df.withColumn("foo",col("foo").dropFields("_baz"))
Hard-coded, this works fine. When I try to do this in a loop, generating the strings "foo" and "_baz", I get a type mismatch.
I've got a function that parses the columns to find any starting with _, and that part is working fine. Here's the (hopefully) relevant bit of my code:
var (baseCol, dropCol) = getColumnNodes(col)
df2 = df2.withColumn(baseCol,col(baseCol).dropFields(dropCol))
That results in:
error: type mismatch;
found : String
required: Int
df2 = df2.withColumn(baseCol,col(baseCol).dropFields(dropCol))
                                 ^
Where am I going wrong?
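A likely explanation, judging only from the snippet above: the argument passed to getColumnNodes is a variable named col, presumably a String loop variable. Inside that scope it would shadow org.apache.spark.sql.functions.col, so col(baseCol) is read as indexing into the string (String.apply(Int)), which would match "found: String, required: Int". Below is a minimal sketch of the loop with the variable renamed. The getColumnNodes body, the sample DataFrame, and the object name are hypothetical stand-ins, and it assumes Spark 3.1 or later (where dropFields is available) and paths that always contain a dot.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, struct}

object DropUnderscoreFields {
  // Hypothetical stand-in for the helper in the question:
  // splits "foo._baz" into ("foo", "_baz") at the first dot.
  def getColumnNodes(path: String): (String, String) = {
    val idx = path.indexOf('.')
    (path.substring(0, idx), path.substring(idx + 1))
  }

  // Drops each nested field in turn. The loop variable is called `path`,
  // not `col`, so `col(...)` still refers to functions.col.
  def dropNested(df: DataFrame, paths: Seq[String]): DataFrame =
    paths.foldLeft(df) { (acc, path) =>
      val (baseCol, dropCol) = getColumnNodes(path)
      acc.withColumn(baseCol, col(baseCol).dropFields(dropCol))
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("drop-fields-sketch")
      .getOrCreate()

    // Sample data (an assumption): one struct column `foo` with fields `bar` and `_baz`.
    val df = spark.range(1)
      .select(struct(lit(1).as("bar"), lit(2).as("_baz")).as("foo"))

    val cleaned = dropNested(df, Seq("foo._baz"))
    cleaned.printSchema() // `foo` should no longer list `_baz`

    spark.stop()
  }
}
```

If renaming the variable is not convenient, referring to the function by its qualified name (org.apache.spark.sql.functions.col(baseCol)) or using df2(baseCol) should avoid the clash as well. Root-level columns have no parent struct, so they would need df.drop rather than dropFields.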