I am relatively new to Spark and I am trying to filter out invalid records from a Spark Dataset. My dataset looks something like this:
| Id | Curr| Col3 |
| 1 | USD | 1111 |
| 2 | CNY | 2222 |
| 3 | USD | 3333 |
| 1 | CNY | 4444 |
In my logic, each Id has a vaild currency. So it will basically be a map of id->currency
val map = Map(1 -> "USD", 2 -> "CNY")
I want to filter out the rows from the dataset that have Id not corresponding to the valid currency code. So after my filter operation, the dataset should look something like this:
| Id | Curr| Col3 |
| 1 | USD | 1111 |
| 2 | CNY | 2222 |
The limitation I have here is that I cannot use a UDF. Can somebody help me in coming up with a filter operation for this?