UnicodeEncodeError Python

Question

When I try to count the words in a UTF-8 string, I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

Here is what I do:

tr.words_count = (str(tr.transcribe).count(' '))

I need to count the words in UTF-8 text, and it seems my method won't work. Do you have any ideas? Thanks.

Amber · Accepted Answer · 2012-01-12 08:56:34Z

4

str(tr.transcribe.encode('utf-8'))

Or better yet,

unicode(tr.transcribe).count(' ')

Or even better (to not get confused if there are multiple spaces in a row),

len(unicode(tr.transcribe).split())
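
The split-based approach can be sketched as a small helper. This is only an illustration: the name count_words and the sample text are made up here, and it assumes the input is either a text string or UTF-8 encoded bytes. The unicode() builtin exists only in Python 2, so the sketch checks for bytes instead, which should behave the same on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-

def count_words(text, encoding="utf-8"):
    """Count whitespace-separated words (hypothetical helper)."""
    # Byte strings are decoded first so the count works on characters,
    # not on raw bytes. The UTF-8 default is an assumption.
    if isinstance(text, bytes):
        text = text.decode(encoding)
    # split() with no arguments splits on any run of whitespace,
    # so repeated spaces and newlines do not create empty "words".
    return len(text.split())


# Illustrative sample: a newline and a doubled space between words.
sample = u"привет мир\nкак  дела"
word_total = count_words(sample)
space_total = sample.count(u" ")  # counts separators, not words
```

Counting spaces measures separators rather than words, so it drifts whenever the text contains doubled spaces, newlines, or tabs; splitting on whitespace avoids that.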
