Using Python codecs but still getting UnicodeDecodeError

Question

I have a list of rows containing non-English text, where each row is a list of strings and ints. I need to write this data to a file, converting all the numbers to strings. The data looks like this:

[[u'12', u'as', u'ss', u'ge', u'ge', u'm\xfcnze', u'10.0', u'25.2', u'68.05', 1, 2, 0],
[u'13', u'aas', u'sss', u'tge', u'a', u'mat', u'11.0', u'35.7', u'10.1', 1, 1, 1], ...]

The loop fails on the first row, which contains u'm\xfcnze':

import codecs

with codecs.open("temp.txt", "w", encoding="utf-8") as f:
    for row in data:
        f.write(' '.join([str(r) for r in row]))
        f.write('\n')

The code above fails with UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 38: ordinal not in range(128).

Using r.encode('utf-8') if isinstance(r, str) does not solve the issue either. What am I doing wrong?

What is data? I would like to know its type and structure to be able to help you.
– user1785721, Commented Sep 1, 2017 at 18:40

user1785721 · Accepted Answer · 2017-09-01 19:11:02Z

This should work:

import codecs

with codecs.open("temp.txt", "w", encoding="utf-8") as f:
    for row in data:
        f.write(' '.join([unicode(r) for r in row]))
        f.write('\n')

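The only change is unicode(r) in place of str(r). In Python 2, str() pushes a unicode value through the default ASCII codec, which cannot handle ü. unicode() keeps text as text and still converts the ints, so the codecs writer receives unicode and does the UTF-8 encoding itself.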
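If some values are byte strings that were already encoded as UTF-8 (the 0xc3 byte in the error message hints at this, though the question does not confirm it), unicode(r) would go through the same ASCII decode. Below is a minimal sketch of one way to handle that case, assuming such bytes really are UTF-8. The names text_type, to_text and write_rows are hypothetical helpers, not part of the original code, and io.open is swapped in for codecs.open so the same code runs on Python 2 and 3:

```python
import io
import sys

# The text type is `unicode` on Python 2 and `str` on Python 3.
# Only the selected branch is evaluated, so this is safe on Python 3.
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return `value` as text.

    Byte strings are decoded explicitly (assumed to be UTF-8);
    everything else (text, ints, floats) goes through the text type.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return text_type(value)


def write_rows(path, rows):
    """Write each row as one space-separated line, encoded as UTF-8."""
    with io.open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(u" ".join(to_text(r) for r in row))
            f.write(u"\n")
```

With this in place, the loop from the question reduces to write_rows("temp.txt", data).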

Note: in Python 3 the str type is already Unicode, so your code works there without any modification (no str -> unicode change needed).
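For reference, a Python 3 version might look like the sketch below. It uses the built-in open() with an encoding argument instead of codecs.open (a substitution on my part; the original codecs.open call also works), reuses the two sample rows from the question, and writes to a temporary directory purely so the example cleans up easily:

```python
import os
import tempfile

# Sample rows from the question; the u'' prefix is optional in Python 3.
data = [
    ['12', 'as', 'ss', 'ge', 'ge', 'm\xfcnze', '10.0', '25.2', '68.05', 1, 2, 0],
    ['13', 'aas', 'sss', 'tge', 'a', 'mat', '11.0', '35.7', '10.1', 1, 1, 1],
]

# Assumption: a temporary location stands in for "temp.txt".
path = os.path.join(tempfile.mkdtemp(), "temp.txt")

with open(path, "w", encoding="utf-8") as f:
    for row in data:
        # str() is safe here: Python 3 strings are Unicode already,
        # and the file object handles the UTF-8 encoding on write.
        f.write(' '.join(str(r) for r in row))
        f.write('\n')

# Read the first line back to confirm the round trip.
with open(path, encoding="utf-8") as f:
    first_line = f.readline()
```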
