I need a UDF that takes an array column of a DataFrame and checks whether its two string elements are equal. My DataFrame looks like this:
| ID | date | options |
|---|---|---|
| 1 | 2021-01-06 | ['red', 'green'] |
| 2 | 2021-01-07 | ['Blue', 'Blue'] |
| 3 | 2021-01-08 | ['Blue', 'Yellow'] |
| 4 | 2021-01-09 | nan |
I have tried this:
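```python
import pyspark.sql.functions as f
import pyspark.sql.types as t


def equality_check(options: list):
    try:
        if options[0] == options[1]:
            return 1
        else:
            return 0
    except (IndexError, TypeError):
        # IndexError: fewer than two elements; TypeError: null/missing value
        return -1


equality_udf = f.udf(equality_check, t.IntegerType())
```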
But it throws an index-out-of-range error. I am confident that the `options` column is an array of strings. The expected output is:
| ID | date | options | equality_check |
|---|---|---|---|
| 1 | 2021-01-06 | ['red', 'green'] | 0 |
| 2 | 2021-01-07 | ['Blue', 'Blue'] | 1 |
| 3 | 2021-01-08 | ['Blue', 'Yellow'] | 0 |
| 4 | 2021-01-09 | nan | -1 |
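A minimal sketch of a version that checks its input before indexing instead of relying on `try`/`except`. It assumes that Spark passes a null array to a Python UDF as `None` and a non-null array as a Python `list`. Because the missing value in row 4 is shown as `nan`, the function also returns `-1` for any value that is not a list or tuple, such as a float `NaN` coming from pandas. It is plain Python, so you can test it without a Spark session:

```python
from typing import Optional, Sequence


def equality_check(options: Optional[Sequence[str]]) -> int:
    """Return 1 if the first two elements are equal, 0 if they differ,
    and -1 if the value is missing, not a list/tuple, or has fewer than two elements."""
    # Spark passes a null array to a Python UDF as None; a pandas-sourced NaN
    # would arrive as a float, so check the type and length before indexing.
    if not isinstance(options, (list, tuple)) or len(options) < 2:
        return -1
    return 1 if options[0] == options[1] else 0


# The four rows from the question, with None standing in for the missing value.
rows = [
    (1, "2021-01-06", ["red", "green"]),
    (2, "2021-01-07", ["Blue", "Blue"]),
    (3, "2021-01-08", ["Blue", "Yellow"]),
    (4, "2021-01-09", None),
]
results = [equality_check(options) for _, _, options in rows]
print(results)  # [0, 1, 0, -1]
```

To use it in Spark, wrap it the same way as above with `equality_udf = f.udf(equality_check, t.IntegerType())`, then apply it with `df.withColumn("equality_check", equality_udf("options"))`.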