I have a DataFrame like this:
Studentname    Speciality
Alex           ["Physics","Math","biology"]
Sam            ["Economics","History","Math","Physics"]
Claire         ["Political science","Physics"]
I want to find all students whose Speciality contains both "Physics" and "Math", so the output should have 2 rows: Alex and Sam.
This is what I have tried:
from pyspark.sql.functions import array_contains
from pyspark.sql import functions as F

def student_info():
    student_df = spark.read.parquet("s3a://studentdata")
    a1 = ["Physics", "Math"]
    df = student_df
    for a in a1:
        # Filter the running result (df), not student_df, so the conditions accumulate
        df = df.filter(array_contains(df.Speciality, a))
        print(df.count())

student_info()
Output:
3
2
I would like to know how to filter an array column so that only the rows whose array contains every element of a given list are kept.