I can't read the data from a CSV file into memory because the file is too large, i.e. pandas.read_csv won't work.
I only want to extract the rows that match certain column values, and that subset should fit into memory. With a pandas DataFrame df that could hypothetically hold the full contents of the CSV, I would do
df.loc[df['column_name'] == 1]
The CSV file does contain a header, and its columns are in a known order, so I don't strictly need column_name; I could refer to the column by its position if I have to.
How can I achieve this? I read a bit about PySpark, but I don't know whether it would be useful here.
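For reference, here is a minimal sketch of what I am hoping is possible, not a definitive solution: read the file in pieces with the chunksize parameter of pandas.read_csv and keep only the matching rows from each piece. The helper name filter_csv and the default chunk size of 100,000 rows are assumptions for illustration. The sketch also assumes the matching rows together fit into memory.

```python
import pandas as pd


def filter_csv(path, column, value, chunksize=100_000):
    """Return the rows of a CSV file where `column` equals `value`.

    Hypothetical helper: the file is read `chunksize` rows at a time, so
    only one chunk plus the rows matched so far are held in memory.
    """
    matches = []
    # With chunksize set, read_csv yields DataFrames chunk by chunk
    # instead of loading the whole file at once.
    for chunk in pd.read_csv(path, chunksize=chunksize):
        # `column` may be a header name (str) or a zero-based position (int).
        if isinstance(column, int):
            col = chunk.iloc[:, column]
        else:
            col = chunk[column]
        filtered = chunk[col == value]
        if not filtered.empty:
            matches.append(filtered)
    if not matches:
        # Nothing matched (or the file had no data rows).
        return pd.DataFrame()
    return pd.concat(matches, ignore_index=True)
```

With this, filter_csv('data.csv', 'column_name', 1) would play the role of the df.loc expression above, and filter_csv('data.csv', 0, 1) would select by column position instead; 'data.csv' is a placeholder path.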