0

I am trying to import data and am running into an issue with the encoding.

Sometimes utf-8 works but latin-1 does not, and sometimes the reverse; it depends on the type of data coming in.

Encodings used so far: latin-1, utf-8, windows-1252

Code:

pd.read_csv(dir_in+file_note,sep='|',
              low_memory=False,header=0,
              error_bad_lines=False,
              encoding = "windows-1252",
              warn_bad_lines=False)

Please guide me on how to make the code dynamic, so that if one encoding raises an error it tries the next one.

1) Priority one will be utf-8

2) Priority two will be latin-1

3) Priority three will be windows-1252

2
  • Would you upload the CSV (or part of it) and add a link? Commented Apr 1, 2020 at 18:56
  • Try loading the CSV with all 3 of the encodings you mentioned, in that order, and catch any errors that occur. Commented Apr 1, 2020 at 18:59

2 Answers

1

Not very beautiful, but it works.

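import pandas as pd

# Note: pandas 2.0 removed error_bad_lines / warn_bad_lines;
# on newer versions pass on_bad_lines="skip" instead.
try:
  # First priority: utf-8
  df = pd.read_csv(dir_in+file_note,sep='|',
                low_memory=False,header=0,
                error_bad_lines=False,
                encoding = "utf-8",
                warn_bad_lines=False)
except UnicodeDecodeError:
  try:
    # Second priority: latin-1
    df = pd.read_csv(dir_in+file_note,sep='|',
                  low_memory=False,header=0,
                  error_bad_lines=False,
                  encoding = "latin-1",
                  warn_bad_lines=False)
  except UnicodeDecodeError:
    # Third priority: windows-1252. Rarely reached for decoding problems,
    # because latin-1 maps every byte and does not raise UnicodeDecodeError.
    df = pd.read_csv(dir_in+file_note,sep='|',
                  low_memory=False,header=0,
                  error_bad_lines=False,
                  encoding = "windows-1252",
                  warn_bad_lines=False)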
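The nested try/except above can also be flattened into a loop over the candidate encodings. The following is only a minimal sketch of that idea, not a definitive implementation: the names `ENCODINGS`, `detect_encoding` and `read_csv_with_fallback` are hypothetical, and it assumes the file is small enough to read into memory once for the decoding check. The default order follows the priority given in the question; keep in mind that latin-1 accepts every byte, so anything listed after it is never tried.

```python
from pathlib import Path

# Candidate encodings, in the priority order stated in the question.
ENCODINGS = ("utf-8", "latin-1", "windows-1252")


def detect_encoding(path, encodings=ENCODINGS):
    """Return the first encoding in `encodings` that decodes the whole file."""
    raw = Path(path).read_bytes()  # reads the entire file into memory
    for enc in encodings:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding does not fit, try the next one
        return enc
    raise ValueError(f"none of {encodings} could decode {path}")


def read_csv_with_fallback(path, encodings=ENCODINGS, **kwargs):
    """Read a CSV using the first candidate encoding that decodes it."""
    import pandas as pd  # imported lazily so detect_encoding needs no pandas
    return pd.read_csv(path, encoding=detect_encoding(path, encodings), **kwargs)
```

With the variables from the question, a call might look like `df = read_csv_with_fallback(dir_in + file_note, sep='|', header=0, low_memory=False)`. A successful decode only shows that the bytes are valid in that encoding, not that the characters are the intended ones, so it is worth spot-checking the result.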

1 Comment

Thank you, exactly what I was looking for. Really appreciate it!
0

The same question was asked on the pandas GitHub page:

Is there a way pandas can read a csv file and find out the encoding automatically? Or is there a fix to this? Maybe to be a feature (if it does not exist yet) in a future release?

One contributor says:

This seems out of scope for pandas. I'd recommend using a library like chardet to determine the encoding ahead of time.
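As a rough illustration of that suggestion, the sketch below guesses the encoding from a sample of the file with `chardet.detect`, which returns a dictionary containing an `encoding` key. The function name `guess_encoding` and its `sample_size`, `detect` and `default` parameters are hypothetical, chosen only for this example; `chardet` is a third-party package that has to be installed separately, and its result is a statistical guess rather than a guarantee.

```python
def guess_encoding(path, sample_size=100_000, detect=None, default="utf-8"):
    """Guess a file's encoding from its first `sample_size` bytes."""
    if detect is None:
        import chardet  # third-party: pip install chardet
        detect = chardet.detect
    with open(path, "rb") as fh:
        raw = fh.read(sample_size)
    result = detect(raw)  # dict with at least an "encoding" key
    # The detector may report None when it cannot decide; fall back then.
    return result.get("encoding") or default
```

The guessed name can then be passed on, for example `pd.read_csv(path, sep='|', encoding=guess_encoding(path))`. Short or mostly-ASCII samples give the detector little to work with, so a larger `sample_size` may help.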

2 Comments

Actually, I created the data using an SPSS stream and chose utf-8 as the output format, but importing the data in a notebook gives an error. I will definitely explore chardet, thank you.
What kind of error exactly? Please be more specific. If you always export the files in a specific encoding, there should be no surprises and no need to guess the encoding.
