I'm reading a set of .csv files and concatenating them into one large data frame called 'df', but some of the files raise this error: Error tokenizing data. C error: Expected 1 fields in line 88, saw 2

I think I've figured out the likely cause: the CSVs I'm reading come from a newsletter sign-up form, and one column asks people what town they live in. Some users put a comma in that field (e.g. "El Paso, Texas" instead of just "El Paso").

Is there a way to handle this within the .read_csv call that doesn't involve just skipping the line? The error came up about a dozen times across the roughly 35 spreadsheets I'm stitching together, so I could fix the spreadsheets by hand, but I'd like to know how to handle this automatically for future spreadsheets.

For reference, here is my code to add the spreadsheets to the dataframe.

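import os
import pandas as pd

# `path` is the folder that holds the exported spreadsheets
df = pd.DataFrame()  # start empty, then append each file's rows
for spreadsheet in os.listdir(path):
    file_name = os.path.join(path, spreadsheet)

    # Only read CSV files; anything else is skipped
    if file_name.endswith(".csv"):
        try:
            temp = pd.read_csv(file_name, encoding='utf-16')
            df = pd.concat([df, temp])
        except pd.errors.ParserError as e:
            # Report the failing file and keep going with the rest
            print(f"Something went wrong with {file_name}: {e}")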
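
A note on the error text: "Expected 1 fields in line 88, saw 2" means the parser inferred a single column from the earlier rows, so the header line itself contains no commas. One possible reading (an assumption, not something confirmed in this post) is that the export has one field per line or uses a different delimiter, such as a tab. A minimal sketch for checking that with the standard library follows; `count_delimiters` is a hypothetical helper name and the candidate list is a guess.

```python
def count_delimiters(path, encoding="utf-16", candidates=(",", "\t", ";", "|")):
    """Count how often each candidate delimiter appears in the header line."""
    # newline="" keeps line endings untouched so they can be stripped explicitly
    with open(path, encoding=encoding, newline="") as f:
        header = f.readline().rstrip("\r\n")
    return {c: header.count(c) for c in candidates}
```

If the tab count is nonzero and the comma count is zero, passing sep="\t" to pd.read_csv would be the thing to try; commas inside a town name would then be treated as ordinary characters.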
  • Is this a file with one field per line? If so, could you ignore commas entirely? stackoverflow.com/questions/58192599/… Commented Sep 29, 2024 at 19:05
  • A field with comma should be in quotes. If not, it's not a valid CSV. You should fix the code that creates the CSV. Commented Sep 29, 2024 at 21:22
  • @Barmar couldn't agree with you more but that's out of my hands — these are CSVs generated by Meta with a list of emails from a Facebook ad for a newsletter Commented Sep 30, 2024 at 0:01
  • Can you try to read the csv file using python csv module instead of pandas read_csv? Something like this? import csv with open('some.csv') as f: reader = csv.reader(f) for row in reader: print(row) Commented Sep 30, 2024 at 0:26
  • If you can have unquoted commas, there's no reliable way to know which commas are field separators and which should be included in the values. Unless there's only one field that can have commas in it, you're screwed. Commented Sep 30, 2024 at 15:03
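
For the case the last comment describes, where the file really is comma-separated and only one column can contain stray commas, the surplus fields can be folded back into that column before the data reaches pandas. This is a rough sketch under those assumptions, not a general fix: `read_rows_merging_extra_commas` and `comma_col` (the zero-based index of the town column) are hypothetical names, and the header row is assumed to be well formed.

```python
import csv

def read_rows_merging_extra_commas(path, comma_col, encoding="utf-16"):
    """Parse a comma-separated file in which only column `comma_col` may
    contain unquoted commas; surplus fields are merged back into it."""
    rows = []
    with open(path, encoding=encoding, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        width = len(header)  # the header defines the expected field count
        for fields in reader:
            if not fields:
                continue  # skip blank lines
            extra = len(fields) - width
            if extra > 0:
                # Rejoin the pieces that the stray commas split apart.
                merged = ",".join(fields[comma_col:comma_col + extra + 1])
                fields = (fields[:comma_col] + [merged]
                          + fields[comma_col + extra + 1:])
            rows.append(fields)
    return header, rows
```

The returned pair can then be handed to pd.DataFrame(rows, columns=header). If more than one column can contain unquoted commas, this approach cannot tell them apart.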
