How to handle a cell containing a comma when using pd.read_csv? Error tokenizing data. C error: Expected 1 fields in line 88, saw 2

I'm reading a set of .csv files and combining them into one large DataFrame called df, but some of my files keep raising this error: Error tokenizing data. C error: Expected 1 fields in line 88, saw 2

I think I've found the likely cause: the CSVs I'm reading come from a newsletter sign-up form, and one column holds the town each person lives in. Some users typed a comma in that field (e.g. "El Paso, Texas" instead of just "El Paso").
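A minimal reproduction on a made-up one-column sample (not the real export): a single unquoted comma is enough to trigger the same message, because that row ends up with one more field than the header.

```python
import io

import pandas as pd

# Hypothetical sample: header "town" has no comma, so the C parser expects
# 1 field per line; the unquoted comma on line 3 produces 2 fields.
raw = "town\nEl Paso\nEl Paso, Texas\n"

try:
    pd.read_csv(io.StringIO(raw))
    message = None
except pd.errors.ParserError as e:
    message = str(e)

print(message)  # → Error tokenizing data. C error: Expected 1 fields in line 3, saw 2
```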

Is there a way to handle this within the read_csv call that doesn't involve just skipping the line? The error came up about a dozen times across the roughly 35 spreadsheets I'm stitching together, so I could theoretically fix those spreadsheets by hand, but I'd like a way to handle this automatically for new spreadsheets in the future.
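One possible approach inside read_csv itself, assuming pandas 1.4 or later: with engine="python", the on_bad_lines parameter accepts a callable that receives each line with too many fields as a list of strings (already split on the separator) and returns a replacement list, or None to drop the line. The sketch below assumes the town column is the only column that can ever contain a comma; the column names, the sample rows, N_COLS, TOWN_IDX and the merge_town helper are all hypothetical. For the real files you would pass the file path and encoding="utf-16" instead of the StringIO.

```python
import io

import pandas as pd

# Hypothetical export: three columns, where only "town" may contain a comma.
raw = (
    "email,town,signup_date\n"
    "a@example.com,El Paso,2024-09-01\n"
    "b@example.com,El Paso, Texas,2024-09-02\n"
)

N_COLS = 3    # assumed number of columns in the export
TOWN_IDX = 1  # assumed position of the only column that may contain commas


def merge_town(bad_line: list[str]) -> list[str]:
    """Glue the overflow pieces back into the town column (hypothetical helper)."""
    extra = len(bad_line) - N_COLS              # how many unquoted commas slipped in
    end = TOWN_IDX + extra + 1                  # one past the last piece of the town
    town = ",".join(bad_line[TOWN_IDX:end])     # e.g. "El Paso" + " Texas"
    return bad_line[:TOWN_IDX] + [town] + bad_line[end:]


# The callable form of on_bad_lines is only supported by the python engine.
df = pd.read_csv(io.StringIO(raw), engine="python", on_bad_lines=merge_town)
print(df)
```

This only works because exactly one column is allowed to contain commas; if two columns could, there would be no way to tell which one an extra comma belongs to.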

For reference, here is the code I use to add the spreadsheets to the DataFrame:

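import os

import pandas as pd

# path: folder containing the exported CSV files (defined earlier in the script)
frames = []  # collect each file's DataFrame and concatenate once at the end

for spreadsheet in os.listdir(path):
    file_name = os.path.join(path, spreadsheet)

    if not file_name.endswith(".csv"):
        continue  # skip anything that isn't a CSV
    try:
        temp = pd.read_csv(file_name, encoding='utf-16')
        frames.append(temp)
    except pd.errors.ParserError as e:
        print(f"Something went wrong with {file_name}: {e}")

df = pd.concat(frames, ignore_index=True)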

asked Sep 29, 2024 at 18:58

gracemcmc

Is this a file with one field per line? If so, could you ignore commas entirely? stackoverflow.com/questions/58192599/…

– Nick ODell, commented Sep 29, 2024 at 19:05
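Related to Nick ODell's question: "Expected 1 fields" means the parser found no commas at all in the header line. If the export actually has several columns, it may be tab-separated rather than comma-separated (an assumption worth checking by opening one file in a text editor). In that case telling read_csv to split on tabs makes commas ordinary characters, and a file with one field per line is read whole, line by line. A sketch on made-up data; for the real files this would be pd.read_csv(file_name, encoding='utf-16', sep='\t').

```python
import io

import pandas as pd

# Hypothetical tab-separated sample; assumes the real export uses tabs.
raw = "email\ttown\na@example.com\tEl Paso\nb@example.com\tEl Paso, Texas\n"

# With sep="\t" the comma is just part of the value, not a field separator.
df = pd.read_csv(io.StringIO(raw), sep="\t")
print(df["town"].tolist())  # → ['El Paso', 'El Paso, Texas']
```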
A field that contains a comma should be enclosed in quotes; if it isn't, the file isn't valid CSV. You should fix the code that creates the CSV.

– Barmar, commented Sep 29, 2024 at 21:22
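To illustrate Barmar's point: a writer that follows the usual CSV quoting rules wraps any field containing the delimiter in double quotes, and read_csv then keeps the comma inside the value. A small sketch with Python's csv module standing in for whatever produced the file; the sample row is made up.

```python
import csv
import io

import pandas as pd

# csv.writer's default quoting (QUOTE_MINIMAL) quotes any field that
# contains the delimiter, which is what a well-behaved exporter should do.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["email", "town"])
writer.writerow(["b@example.com", "El Paso, Texas"])
text = buf.getvalue()
print(text)  # the town field is written as "El Paso, Texas", in quotes

# pandas honours the quotes, so the comma stays inside the value.
df = pd.read_csv(io.StringIO(text))
print(df.loc[0, "town"])  # → El Paso, Texas
```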
@Barmar I couldn't agree more, but that's out of my hands: these CSVs are generated by Meta and contain the list of emails collected by a Facebook ad for a newsletter.

– gracemcmc, commented Sep 30, 2024 at 0:01
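Can you try reading the CSV file with Python's csv module instead of pandas' read_csv? Something like this:

import csv

# encoding matches the question's files; newline='' is what the csv docs recommend
with open('some.csv', encoding='utf-16', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)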

– XXavier, commented Sep 30, 2024 at 0:26
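Following XXavier's suggestion: csv.reader does not enforce a fixed number of fields per row, so it won't raise on these lines, although it still splits on the unquoted comma. That makes it a convenient way to list exactly which lines are affected before deciding how to repair them. A minimal sketch; find_ragged_rows and the sample data are hypothetical, and for a real file you would pass open(file_name, encoding="utf-16", newline="") instead of the StringIO.

```python
import csv
import io

# Hypothetical sample standing in for one export file.
raw = (
    "email,town,signup_date\n"
    "a@example.com,El Paso,2024-09-01\n"
    "b@example.com,El Paso, Texas,2024-09-02\n"
)


def find_ragged_rows(f, delimiter=","):
    """Return (line_number, row) pairs whose field count differs from the header's."""
    reader = csv.reader(f, delimiter=delimiter)
    header = next(reader)
    bad = []
    for row in reader:
        if len(row) != len(header):
            # line_num counts source lines read so far, header included
            bad.append((reader.line_num, row))
    return bad


print(find_ragged_rows(io.StringIO(raw)))
# → [(3, ['b@example.com', 'El Paso', ' Texas', '2024-09-02'])]
```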
If you can have unquoted commas, there's no reliable way to know which commas are field separators and which should be included in the values. Unless there's only one field that can have commas in it, you're screwed.

– Barmar, commented Sep 30, 2024 at 15:03
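Barmar's caveat also points to a repair that needs no pandas-specific feature: if exactly one column can contain commas, every comma before it and every comma after it must be a separator, so a line can be split from the left up to that column and from the right for the rest, and whatever remains in the middle is the free-text field. A sketch under that assumption, which also assumes no field is quoted; split_with_one_free_field is a hypothetical helper, not part of any library.

```python
def split_with_one_free_field(line: str, n_cols: int, free_idx: int, sep: str = ",") -> list[str]:
    """Split `line` into `n_cols` fields, assuming only column `free_idx`
    may contain `sep` and nothing is quoted (hypothetical helper)."""
    n_after = n_cols - free_idx - 1
    # From the left: free_idx splits peel off the columns before the free one.
    head = line.split(sep, free_idx)
    if len(head) != free_idx + 1:
        raise ValueError(f"too few fields before column {free_idx}: {line!r}")
    before, rest = head[:free_idx], head[free_idx]
    # From the right: n_after splits peel off the columns after the free one;
    # whatever is left over (commas included) is the free column itself.
    tail = rest.rsplit(sep, n_after)
    if len(tail) != n_after + 1:
        raise ValueError(f"too few fields after column {free_idx}: {line!r}")
    return before + tail


print(split_with_one_free_field("b@example.com,El Paso, Texas,2024-09-02", 3, 1))
# → ['b@example.com', 'El Paso, Texas', '2024-09-02']
```

Applying this to every data line and passing the results to pd.DataFrame(rows, columns=header) would give the combined frame; lines that come out with too few fields raise instead of being silently guessed.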
