
I'm generating file names from a list pulled from a PostgreSQL database with Python 2.7.9. Some words in this list contain special characters. Normally I use ''.join() to build the name and pass it to my loader, but there is one name that won't be recognized. The .py file is declared as UTF-8, but the words are in Portuguese, so I think they're Latin-1 encoded.

# -*- coding: utf-8 -*-
from pydub import AudioSegment
from pydub.playback import play

templist = ['+ Orégano', '- Búfala', '+ Rúcola']
count_ins = len(templist) - 1
# Walk the list from the last item back to the first
while count_ins >= 0:
    kot_istructions = AudioSegment.from_ogg('/home/effe/voice_orders/Voz/' + "".join(templist[count_ins]) + '.ogg')
    count_ins -= 1
    play(kot_istructions)

The first two files are loaded:

/home/effe/voice_orders/Voz/+ Orégano.ogg

/home/effe/voice_orders/Voz/- Búfala.ogg

The third should be:

/home/effe/voice_orders/Voz/+ Rúcola.ogg

But Python tries to load

/home/effe/voice_orders/Voz/+ R\xc3\xbacola.ogg

Why only this one? I tried using normalize() to remove the accent, but since the name is a str rather than a unicode object, the method didn't work. Printing works fine, as does the DB update; only file-name creation doesn't work as expected. Any suggestions?
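For reference, a minimal sketch of why normalize() fails here and how it behaves once the name is decoded. It assumes the name arrives as UTF-8 bytes (the \xc3\xba in the error is the UTF-8 encoding of ú); same_name and strip_accents are illustrative helpers, not part of any library. It should run on both Python 2.7 and Python 3.

```python
import unicodedata


def same_name(a, b):
    """True if two decoded names are equal after normalizing both to NFC."""
    return unicodedata.normalize('NFC', a) == unicodedata.normalize('NFC', b)


def strip_accents(text):
    """Drop accents: decompose to NFD, then remove the combining marks."""
    decomposed = unicodedata.normalize('NFD', text)
    return u''.join(c for c in decomposed if not unicodedata.combining(c))


raw = b'R\xc3\xbacola'      # byte string (a str in Python 2.7) holding UTF-8
name = raw.decode('utf-8')  # normalize() only accepts decoded text, not bytes

composed = unicodedata.normalize('NFC', name)    # u'\xfa' as one code point
decomposed = unicodedata.normalize('NFD', name)  # 'u' followed by U+0301

print(composed == decomposed)            # False: different code points
print(same_name(composed, decomposed))   # True
print(strip_accents(name) == u'Rucola')  # True
```

Note that a name with its accents stripped would no longer match a file actually named + Rúcola.ogg, so the files would have to be renamed the same way.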

  • Unicode strings require the "u" prefix in Python 2: [u'+ Orégano', u'- Búfala', u'+ Rúcola']. Commented Jun 25, 2015 at 18:25
  • Please use iteration instead of manually counting indices; a simple for word in templist suffices. Then get rid of the join call: it only works incidentally here because its single argument is a string, and joining a string's characters just gives back the same string, so it isn't really doing what you think it is. The string representation looks like proper UTF-8 encoding. The question is: is your filesystem's encoding UTF-8? Commented Jun 25, 2015 at 18:33
  • Have you considered using Python 3? Its Unicode handling was redone. Commented Jun 25, 2015 at 18:47
  • @dlask: I can't edit the list because it is generated at runtime and used in various parts of the program. @deets: Sometimes I need to adjust an index count on the fly, and that's quicker to edit than a for/in loop. My filesystem (like my DB) is set to LANG=pt_BR.UTF-8. @A.L.Flanagan: I can't. I'm using Python with Odoo and I need Python 2.7. Commented Jun 25, 2015 at 19:35
  • Don't put an answer (the sentence after "Solved") into the question. Post it as your own answer instead. Commented Jun 26, 2015 at 21:06
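The iteration suggested in the comments can be sketched as below. It only builds the paths, so it runs without pydub; ogg_paths and BASE_DIR are hypothetical names, and reversed() keeps the original last-to-first playback order. Each returned path would then go to AudioSegment.from_ogg() and play() as in the question.

```python
import os

BASE_DIR = '/home/effe/voice_orders/Voz'  # directory taken from the question


def ogg_paths(names):
    """Build one .ogg path per name, last name first, like the original loop."""
    paths = []
    for name in reversed(names):  # iterate directly instead of counting indices
        # name is already a complete string; ''.join(name) would only rebuild it
        paths.append(os.path.join(BASE_DIR, name + '.ogg'))
    return paths
```

For example, on a POSIX system ogg_paths(['+ A', '- B']) returns the path ending in "- B.ogg" first, then the one ending in "+ A.ogg".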

2 Answers


It seems the root cause might be that the encoding of these names is inconsistent within your database.

If you run:

>>> 'R\xc3\xbacola'.decode('utf-8')

You get

u'R\xfacola'

which is in fact a Python unicode object correctly representing the name. So, what should you do? Although it's a really unclean programming style, you could play .encode()/.decode() whack-a-mole: try to decode the raw string from your DB as UTF-8 and, if that fails, as Latin-1. It would look something like this:

def to_unicode(dirty_string):
    # Try UTF-8 first; fall back to Latin-1, which can decode any byte string
    try:
        return dirty_string.decode('utf-8')
    except UnicodeDecodeError:
        return dirty_string.decode('latin-1')
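The order of the fallbacks matters: the bytes from the question's error decode cleanly as UTF-8, whereas Latin-1 accepts any byte sequence, so trying it first would silently produce mojibake instead of raising an error. A short check, which should run on Python 2.7 or 3:

```python
raw = b'R\xc3\xbacola'  # the bytes shown in the question's path

as_utf8 = raw.decode('utf-8')      # u'R\xfacola': u with acute accent
as_latin1 = raw.decode('latin-1')  # u'R\xc3\xbacola': A-tilde + ordinal sign, mojibake

# U+00FA encodes to exactly these two bytes in UTF-8, so the data is UTF-8
print(u'\xfa'.encode('utf-8') == b'\xc3\xba')  # True
print(as_utf8 == as_latin1)                     # False
```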

As a general rule, decode to unicode as soon as data enters your program, work only with clean unicode objects inside your own code, and encode only when writing data back out. Also, don't let anyone insert data into the database without specifying its encoding; that alone would stop this problem from arising in the first place.
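A minimal sketch of that rule applied to the question's file names: decode once where the bytes enter (assumed here to be UTF-8, as the LANG=pt_BR.UTF-8 setting suggests) and keep the path as text afterwards. audio_path is a hypothetical helper. When a text path is handed to open() or the os functions, Python encodes it with sys.getfilesystemencoding() at that outer boundary.

```python
import os
import sys

BASE_DIR = u'/home/effe/voice_orders/Voz'  # directory taken from the question


def audio_path(raw_name, encoding='utf-8'):
    """Return the .ogg path for a name as text, decoding bytes on the way in.

    Hypothetical helper: assumes byte strings coming from the database are
    encoded with `encoding`.
    """
    if isinstance(raw_name, bytes):  # bytes is an alias of str in Python 2.7
        raw_name = raw_name.decode(encoding)
    return os.path.join(BASE_DIR, raw_name + u'.ogg')


print(sys.getfilesystemencoding())  # codec applied when the path reaches the OS
```

Both a raw UTF-8 byte string and an already-decoded name then yield the same text path.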

Hope that helps!


3 Comments

With .encode()/.decode() I get the same error, but now the path is a unicode object: IOError: [Errno 2] No such file or directory: u'/home/effe/voice_orders/Voz/+ R\xfacola.ogg'. So your conversion does work, but it isn't the result I need. Again, why only with ú in this position and not in the other cases?
If you're looking for a file that already exists, what is the actual encoding of its name? Can you browse to it and see what its name is? Try running os.listdir('/home/effe/voice_orders/Voz/') and see how the name is represented on your system.
os.listdir didn't report anything out of place. Anyway, it WAS a problem with the file: deleting the file and creating a new one solved the problem. Please don't ask me why. Thank you for pointing it out; +1 on your reply, because even though that wasn't the cause, your method showed me how encode/decode works.
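The os.listdir check suggested above can be made more telling by printing repr() of each entry and comparing names after normalization, since two names can print identically yet differ in code points (a precomposed ú versus u plus a combining accent). This is only a diagnostic sketch; it does not establish what was wrong with the asker's file. matching_entries is a hypothetical helper; pass the directory as text so the entries come back as text.

```python
import os
import unicodedata


def matching_entries(directory, wanted):
    """Return entries of directory equal to wanted once both are NFC-normalized."""
    target = unicodedata.normalize('NFC', wanted)
    matches = []
    for entry in os.listdir(directory):  # text directory in, text entries out
        if isinstance(entry, bytes):
            continue  # Python 2.7 returns undecodable names as byte strings
        print(repr(entry))  # repr exposes escapes such as \xfa or \u0301
        if unicodedata.normalize('NFC', entry) == target:
            matches.append(entry)
    return matches
```

For example, matching_entries(u'/home/effe/voice_orders/Voz', u'+ R\xfacola.ogg') returns the on-disk spelling; if it differs from the string being looked up, the file can be opened through that returned entry or renamed.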

Solved: it was a problem with the file. Deleting it and building it again did the job.
