I have the following pandas code snippet that reads all the values found in a specific column of my .csv file.
sample_names_duplicates = pd.read_csv(infile, sep="\t",
                                      engine="c", usecols=[4],
                                      squeeze=True)
That particular column of my file contains at most about 20 distinct values (sample names), so it would probably be faster if I could drop the duplicates on the fly instead of storing every value and deleting the duplicates afterwards. Is it possible to drop duplicates as they are read, in some way?
If not, is there a faster way to do this that doesn't require the user to explicitly list the sample names in her file?
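For context, here is a minimal, self-contained sketch of the "store everything, then deduplicate" workflow I'd like to avoid. It uses an in-memory stand-in for my real file (the column layout and sample names are made up for illustration), and since `squeeze=True` was removed in pandas 2.x, it selects the column explicitly instead:

```python
import io
import pandas as pd

# Stand-in for the real tab-separated file; sample names live in column index 4.
infile = io.StringIO(
    "c0\tc1\tc2\tc3\tsample\n"
    "1\ta\tb\tc\ts1\n"
    "2\ta\tb\tc\ts2\n"
    "3\ta\tb\tc\ts1\n"
)

# Current approach: read every value in the column (duplicates included)...
sample_names_duplicates = pd.read_csv(infile, sep="\t", engine="c",
                                      usecols=[4]).iloc[:, 0]

# ...then deduplicate afterwards, which is the step I'd like to fold
# into the read itself.
sample_names = sample_names_duplicates.drop_duplicates()
print(sample_names.tolist())  # ['s1', 's2']
```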