I am trying to evaluate each field in the if statement below.
However, I am running into the following error: Method col([class java.util.ArrayList]) does not exist.
What I am trying to achieve: I am trying to evaluate two fields in my dataframe - Name and Surname, in a Python function. In these fields, I have NULL values. For each field, I would like to identify if NULL values exist.
I am loading various datasets with fields that should be evaluated from each set. I would like to pass these fields into the function to check if NULL values exist.

    def identifyNull(Field):
        # Field = ['Name', 'Surname'] - this is an example of what I would like to pass to my function.
        for x in Field:
            if df.select().filter(col(Field).isNull()).count() > 0:
                print(Field)
            else:
                print('False')

df is the name of the dataframe holding the data I am reading.
df structure:
| Name | Surname |
|---|---|
| John | Doe |
| NULL | James |
| Lisa | NULL |
Please note: I am completely new to Python and Spark.
Shouldn't that be `if df.select().filter(col(x).isNull()).count() > 0:` and then `print(x)`? (Otherwise, what would be the point of iterating over your `Field` list?) `x` is your field. Your `for` loop is saying "Take each item in this list called `Field` and call that item `x`". Many programming languages use the syntax `For Each x in fields`, which is a little clearer. Python just drops the `Each` so it isn't so verbose.
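To make that concrete, here is a minimal sketch of the loop with the single column name used inside the check. It assumes `df` is a PySpark DataFrame and that the fields are passed in as a list of column names. The function name `fields_with_nulls` and the choice to return a list instead of printing are my own, not from the question. It uses `df[name]` to refer to the column, which should behave like `col(name)` here and avoids an extra import. The original error most likely arose because the whole list `Field` was handed to `col`, which expects one column name.

```python
def fields_with_nulls(df, fields):
    """Return the names in `fields` whose column in `df` holds at least one NULL.

    Assumption: `df` is a PySpark DataFrame and `fields` is a list of column names.
    """
    found = []
    for name in fields:
        # Use the loop variable (one column name), not the whole list.
        # df[name] refers to a single column, much like col(name).
        null_count = df.filter(df[name].isNull()).count()
        if null_count > 0:
            found.append(name)
    return found
```

With the sample data from the question, `fields_with_nulls(df, ['Name', 'Surname'])` should return both names, since each column has one NULL. Note that each `count()` triggers a separate Spark job, so for many columns a single aggregation may be cheaper.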