
I'm new to PySpark. I'm just trying to loop over columns that exist in a variable list. This is what I've tried, but it doesn't work.

column_list = ['colA','colB','colC']
for col in df:
   if col in column_list:
      df = df.withColumn(...)
   else:
      pass

It's definitely an issue with the loop. I feel like I'm missing something really simple here. I performed the df operation independently on each column and it ran clean, i.e.:

df = df.withColumn(...'colA').withColumn(...'colB').withColumn(...'colC')
  • The error points to a bool issue in the logic that goes inside the withColumn. Please update the question with the logic for the columns. Commented Dec 15, 2021 at 18:58
  • How can that be? If the alternative solution works, then it can't be the logic, right? All I'm doing is literally substituting col for the actual field names in the loop. Commented Dec 15, 2021 at 18:59
  • Can you provide the full code that produces the error, i.e. where the ...'s are? Commented Dec 15, 2021 at 19:02

1 Answer


Use the following snippet. The fix is to iterate over df.columns, which is a plain Python list of column-name strings, rather than over the DataFrame itself:

column_list = ['colA','colB','colC']
for col in df.columns:
   if col in column_list:
      df = df.withColumn(....)
   else:
      pass
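
For reference, here is a minimal, self-contained sketch of the same approach. The sample DataFrame, the extra colD column, and the F.upper transformation are hypothetical stand-ins for the elided withColumn logic in the question.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Toy DataFrame standing in for the asker's df.
df = spark.createDataFrame(
    [("a", "b", "c", 1)],
    ["colA", "colB", "colC", "colD"],
)

column_list = ['colA', 'colB', 'colC']

# df.columns is a list of column-name strings, so the
# membership test against column_list works as intended.
for col in df.columns:
    if col in column_list:
        # Hypothetical transformation in place of the elided logic.
        df = df.withColumn(col, F.upper(F.col(col)))

df.show()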