I am attempting to programmatically remove specific columns/fields from a dataframe (anything that starts with _), whether the field is in the root or in a struct, using the dropFields method.

For example, if I had "foo._baz", the syntax would be:

df.withColumn("foo",col("foo").dropFields("_baz"))

Hard-coded, this works fine. When I try to do this in a loop, generating the strings "foo" and "_baz", I get a type mismatch.

I have a function that parses the columns to find any starting with _, and that works fine. Here's the (hopefully) relevant bit of my code:

var (baseCol, dropCol) = getColumnNodes(col)
df2 = df2.withColumn(baseCol, col(baseCol).dropFields(dropCol))

That results in:

error: type mismatch;
 found   : String
 required: Int
    df2 = df2.withColumn(baseCol,col(baseCol).dropFields(dropCol))

Where am I going wrong?

1 Answer

I've resolved this and am posting it here in case it helps anyone. This bit was the problem:

col(baseCol)

I don't understand exactly why this doesn't work, but it doesn't. Using the $ notation like so does work:

df2 = df2.withColumn(baseCol, $"$baseCol".dropFields(dropCol))
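A plausible explanation, assuming the loop variable in the question is a String that is itself named col (which getColumnNodes(col) suggests): inside the loop, that local name shadows the imported functions.col, so col(baseCol) is parsed as col.apply(baseCol), i.e. indexing a String by position, which wants an Int. That would match the "found: String, required: Int" error, and it would explain why $"$baseCol" works, since it never refers to the name col. Below is a minimal sketch of that situation for Spark 3.1 or later. The sample data, the paths list, and the split on the first dot (a stand-in for getColumnNodes) are all hypothetical and not taken from the original code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions

object DropUnderscoreFields {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("drop-fields-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: a struct column "foo" with fields "bar" and "_baz".
    var df2 = Seq((1, "keep", "drop")).toDF("id", "bar", "_baz")
      .select($"id", functions.struct($"bar", $"_baz").as("foo"))

    // Hypothetical list of dotted paths whose last segment should be dropped.
    val paths = Seq("foo._baz")

    // The loop variable is deliberately named `col` to reproduce the assumed shadowing.
    for (col <- paths) {
      // Stand-in for getColumnNodes: split "foo._baz" into "foo" and "_baz".
      val Array(baseCol, dropCol) = col.split("\\.", 2)

      // Here `col` is a String, so `col(baseCol)` would not compile:
      // it means col.apply(baseCol), and String indexing expects an Int.
      // Qualifying the function sidesteps the shadowing.
      df2 = df2.withColumn(baseCol, functions.col(baseCol).dropFields(dropCol))
    }

    // The struct "foo" should now contain only the field "bar".
    df2.printSchema()
    spark.stop()
  }
}
```

If this is indeed the cause, renaming the loop variable (for example to path) should let the plain col(baseCol) call compile as well.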