I have a 17 GB tab-separated file and I get the above error when reading it with Python/pandas.
I am doing the following:
import pandas as pd

data = pd.read_csv('/tmp/testdata.tsv', sep='\t')
I have also tried adding encoding='utf-8', using read_table instead, and various flags, including low_memory=True, but I always get the same error at the same line.
I ran the following on the file:
awk -F"\t" 'FNR==1025974 {print NF}' /tmp/testdata.tsv
And it returns 281 for the number of fields, so awk is telling me that line has the expected 281 columns, but read_csv is telling me I have 331.
I also tried the above awk on lines 1025973 and 1025975, just to be sure the numbering wasn't zero-based, and they both come back as 281 fields as well.
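To rule out a mistake in the awk one-liner, I also reproduced the per-line field count in Python. This is just a sketch that splits each physical line on tabs with no quote handling, i.e. the same thing awk -F"\t" does (count_fields is my own helper name, not a pandas API):

```python
def count_fields(path, target, sep=b'\t'):
    """awk-style field count of the 1-based physical line `target`."""
    with open(path, 'rb') as f:
        for lineno, line in enumerate(f, start=1):
            if lineno == target:
                # Strip the line ending, then count tab-separated fields.
                return len(line.rstrip(b'\r\n').split(sep))
    raise ValueError(f'file has fewer than {target} lines')

# On my file this agrees with awk:
# count_fields('/tmp/testdata.tsv', 1025974)
```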
What am I missing here?
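Update: one hypothesis I'm trying to rule out is a stray quote character. pandas honors '"' quoting by default, so a quoted field can span tabs and even line breaks, which would make its field/row boundaries disagree with awk's purely line-based count. A minimal sketch of the effect (the sample data here is made up, not from my file):

```python
import csv
import io

import pandas as pd

# 3-column TSV where one field contains unbalanced quotes. Every
# physical line has 3 tab-separated fields, so awk -F"\t" reports 3
# for each, but pandas' default quote handling treats the '"' as
# opening a quoted field that runs across the line break.
sample = 'a\tb\tc\n1\t"2\t3\n4\t5"\t6\n'

# Default quoting: the two physical data lines merge into one row.
merged = pd.read_csv(io.StringIO(sample), sep='\t')
print(len(merged))  # 1

# quoting=csv.QUOTE_NONE splits on tabs alone, like awk: two rows.
plain = pd.read_csv(io.StringIO(sample), sep='\t', quoting=csv.QUOTE_NONE)
print(len(plain))  # 2
```

If adding quoting=csv.QUOTE_NONE to my real read_csv call makes the error go away, I assume that would point to an unbalanced quote somewhere before line 1025974.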