Good morning. I'm working on a project and am having trouble reading a CSV file with more than 36,000 rows. When I run it normally:
df = pd.read_csv('File.csv', encoding='ISO-8859-1', dtype='str')
# FYI, this is the only encoding that gets anywhere. Also, I downloaded this report from Salesforce and made sure the encodings match.
ParserError: Error tokenizing data. C error: EOF inside string starting at row 21276
To test this, I run this cell:
import csv

file_path = 'File.csv'
with open(file_path, 'r', encoding='ISO-8859-1') as file:
    reader = csv.reader(file)
    for i, line in enumerate(reader):
        if i == 21275:  # since Python is zero-indexed
            print(f'Line 21276: {line}')
            x = line
            print(x)
            print(len(x))
            break
Here is the output (actual info replaced with placeholder).
Line 21276: ['FIRST LAST', 'Company', '12345', 'Place', 'Place', 'Place', '123456', '123456', '1_2_34', '', 'Location', '', 'Thing', 'None', '', '12.345', '0.000', 'Something', 'Status', '1/2/2034 12:00 AM', '3/45/6789', '', '', '', '', '']
26
I've opened up the CSV file on Excel and the columns run from A to Z, so that's 26 columns. Everything lines up.
I've tried this and got a different issue:
df = pd.read_csv('File.csv', encoding='ISO-8859-1', dtype='str', quoting=csv.QUOTE_NONE)
ParserError: Error tokenizing data. C error: Expected 27 fields in line 535, saw 28
I diagnosed this similarly.
with open(file_path, 'r', encoding='ISO-8859-1') as file:
    for i, line in enumerate(file):
        if i == 534:  # since Python is zero-indexed
            print(f'Line 535: {line}')
            x = line
            print(len(x.split(',')))
            break
Line 535: "FIRST LAST","Company","12345","Place","Place","Place","12345","12345","3_4_56","","Place","","Thing","None","","0000.000","0.000","Something","Word","2/34/5678 12:00 PM","","","","0000.00","",""
28
It looks like adding QUOTE_NONE moves the troubleshooting forward, but it adds columns. I checked this row in Excel and verified there are only 26 columns.
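The extra columns are expected with QUOTE_NONE: it tells the parser to treat quote characters as ordinary text, so any comma *inside* a quoted field now splits into an extra column. Going from 26 fields to 28 would be consistent with two embedded commas somewhere in that row (Excel hides this, because it re-applies the quoting rules). A minimal illustration, using a made-up row with commas inside quoted fields:

```python
import csv
import io

# Hypothetical row: two of the three fields contain an embedded comma
row = '"FIRST LAST","Company, Inc.","Some place, TX"\n'

with_quoting = next(csv.reader(io.StringIO(row)))  # quotes honored
no_quoting = next(csv.reader(io.StringIO(row), quoting=csv.QUOTE_NONE))

print(len(with_quoting))  # 3 fields
print(len(no_quoting))    # 5 fields: each embedded comma adds one
```

So QUOTE_NONE isn't a fix here; it just trades the unclosed-quote error for a field-count error.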
Any help is appreciated. Thank you.
engine='python'? I'm not sure how it differs from the default parsing engine, but it seems able to solve similar issues.
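For what it's worth, the python engine tokenizes in pure Python rather than C and is often more forgiving of input that trips the C tokenizer (at the cost of speed). A minimal sketch with a stand-in file, keeping the question's other parameters; the quoted field spanning two physical lines stands in for the kind of row that confuses the diagnostics above:

```python
import io

import pandas as pd

# Stand-in for 'File.csv': a well-formed CSV whose second field
# contains an embedded newline inside quotes
data = 'name,notes\n"A","line one\nline two"\n"B","ok"\n'

df = pd.read_csv(io.StringIO(data), engine='python', dtype='str')
print(df.shape)  # (2, 2) -- the quoted newline stays inside one field
```

If the python engine still rejects the file, `on_bad_lines='warn'` (pandas 1.3+) can at least report which rows are malformed instead of aborting.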