
I'm reading a dataset from a text file with pandas, but some characters are not decoded correctly: apostrophes show up as ???.

What should I do to read the file with the correct encoding? I've tried:

  • encoding = "utf8" but I got UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 2044: unexpected end of data.

  • encoding = "latin1" but this gave me a lot of ???

  • encoding = "ISO-8859-1" or "ISO-8859-2" but these gave the same result as not specifying an encoding at all.

When I open the data in Sublime, the character is displayed as ’.

UPDATE: When I access the entry using loc, I get something like \u0102\u02d8\xe2\x82\u0179\xc2\u015, \u0102\u02d8\xe2\x82\u0179\xe2\x84\u02d8

  • You need to know what encoding the file is actually in. Where did you get the file? Commented Feb 4, 2015 at 8:12
  • Have you tried ISO-8859-2? Commented Feb 4, 2015 at 8:17
  • @AndyHayden Yes, I did Commented Feb 4, 2015 at 15:50

1 Answer

You may be able to determine the encoding with chardet:

$ pip install chardet

>>> import urllib.request  # Python 3; in Python 2 this was urllib.urlopen
>>> rawdata = urllib.request.urlopen('http://yahoo.co.jp/').read()
>>> import chardet
>>> chardet.detect(rawdata)
{'encoding': 'EUC-JP', 'confidence': 0.99}

The basic usage documentation also shows how to infer the encoding of large files, e.g. files too large to read into memory: the detector reads the file incrementally until it is confident enough about the encoding.
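A minimal sketch of that incremental approach, assuming chardet is installed and using its UniversalDetector class. The function name detect_file_encoding is a hypothetical helper made up for this example, not part of chardet:

```python
from chardet.universaldetector import UniversalDetector


def detect_file_encoding(path):
    """Feed a file to chardet line by line and return its best guess."""
    detector = UniversalDetector()
    with open(path, "rb") as f:  # binary mode: chardet works on raw bytes
        for line in f:
            detector.feed(line)
            if detector.done:  # stop early once chardet is confident
                break
    detector.close()  # finalizes the guess from the bytes seen so far
    return detector.result  # dict with 'encoding' and 'confidence' keys
```

The 'encoding' value of the returned dict could then be passed to the encoding argument when reading the file with pandas. It is only a statistical guess, so a low 'confidence' value is a hint to double-check.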


According to this answer you should try encoding="ISO-8859-2":

My guess is that your input is encoded as ISO-8859-2 which contains Ă as 0xC3.
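The byte-level claim can be checked with Python's built-in codecs. The sketch below is only an illustration with made-up variable names: it shows that a lone 0xC3 byte is valid in ISO-8859-2 but an incomplete sequence in UTF-8, and that decoding without an error does not by itself prove the guessed encoding is the right one.

```python
lone_byte = b"\xc3"

# ISO-8859-2 is a single-byte encoding, so 0xC3 always decodes (to "Ă", U+0102).
as_latin2 = lone_byte.decode("iso-8859-2")
print(repr(as_latin2))

# In UTF-8, 0xC3 only starts a two-byte sequence, so on its own it is an error.
try:
    lone_byte.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc.reason
print(utf8_error)

# The right single quote U+2019 (the character Sublime shows) is three bytes in UTF-8.
# ISO-8859-2 accepts those bytes without error but yields three unrelated characters.
quote_bytes = "\u2019".encode("utf-8")
mojibake = quote_bytes.decode("iso-8859-2")
print(quote_bytes, repr(mojibake))
```

If the file were really UTF-8, reading it as ISO-8859-2 would therefore succeed silently while turning each apostrophe into several garbage characters.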


Note: Sublime may not infer the encoding correctly either, so take its output with a pinch of salt. It's best to check with your vendor (wherever you're getting the file from) what the actual encoding is.


1 Comment

charset-normalizer is a recent alternative to chardet.
