Spark Version: 2.1
Scala Version: 2.11
I have a dataframe with the following structure before writing it to a Parquet file. It has a lot of other columns, but I cut it down to just 2 columns for clarity:
+---+--------------------+
|day|           table_row|
+---+--------------------+
|  8|[,129,,,,,J,WENDI...|
|  8|[_DELETE_THIS_,_D...|
|  8|[_DELETE_THIS_,_D...|
...and the schema looks like this:
root
|-- day: long (nullable = true)
|-- table_row: struct (nullable = true)
| |-- DATE: string (nullable = true)
| |-- ADMISSION_NUM: string (nullable = true)
| |-- SOURCE_CODE: string (nullable = true)
etc.
'table_row' has over 100 fields and I only posted a snippet. During processing I had to create a couple of dummy rows with every field populated with "_DELETE_THIS_". For every normal row there are 2 dummy rows. Now I am trying to filter these dummy rows out of the dataframe and write only the valid rows, but I have not been able to do it by any means. I tried a couple of approaches but could not find a proper solution; a rough sketch of the kind of filter I have been attempting is shown below. Can someone help me with this?
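For context, this is roughly the kind of thing I have been trying (a sketch only: ADMISSION_NUM is just one example field from the struct, the paths are placeholders, and the null-safe <=> comparison is used because valid rows may have null fields):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, not}

val spark = SparkSession.builder().appName("filter-dummy-rows").getOrCreate()

// df has the structure shown above: columns `day` and `table_row`
val df = spark.read.parquet("/path/to/input")   // placeholder path

// Keep rows whose chosen struct field is NOT the dummy marker.
// <=> is null-safe equality, so rows where the field is null are kept.
val cleaned = df.filter(not(col("table_row.ADMISSION_NUM") <=> lit("_DELETE_THIS_")))

cleaned.write.parquet("/path/to/output")        // placeholder path

The idea is that checking any single struct field should be enough to identify a dummy row, since every field in those rows carries the marker.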
Thanks,
Qubiter