I want to filter a dataframe which has a column with categories (List[String]). I want to ignore all the rows that have a non valid category. They are not valid when they are not in model.getCategories
def checkIncomingData(model: Model, incomingData: DataFrame) : DataFrame = {
val list = model.getCategories.toList
sc.broadcast(list)
incomingData.filter(incomingData("categories").isin(list))
}
Unfortunately my approach does not work because categories is a list, not a single element. Any idea who to make it work?