I have this piece of code, part of a function that replaces badly encoded foreign characters in a string:
odbc_str = "String from an old database with weird mixed encodings"
s = str(bytes(odbc_str.strip(), 'cp1252'))
s = s.replace('\\x82', 'é')
s = s.replace('\\x8a', 'è')
(...)
print(s)
# b'String from an old database with weird mixed encodings'
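The b'...' wrapper comes from calling str() on a bytes object without an encoding argument. Python then returns the repr of the bytes, so s is an ordinary str whose text merely looks like a bytes literal. A minimal sketch with a made-up input (raw, s and fixed are illustrative names) shows this, and also why the replace('\\x82', ...) calls and the s[2:][:-1] slice work at all:

```python
# Hypothetical bytes, as bytes(odbc_str, 'cp1252') might produce
raw = b"caf\x82"

# Without an encoding, str() returns repr(raw): the byte 0x82 becomes
# the four characters backslash, 'x', '8', '2'
s = str(raw)
print(type(s).__name__)   # str
print(s == repr(raw))     # True
print(s)                  # b'caf\x82'

# Because the escape is literal text, replace() can match it
fixed = s.replace('\\x82', 'é')
print(fixed)              # b'café'

# Slicing off the leading "b'" and the trailing "'" leaves the characters
print(fixed[2:-1])        # café
```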
I need a "real" string here, not bytes. But when I try to decode it, I get an exception:
odbc_str = "String from an old database with weird mixed encodings"
s = str(bytes(odbc_str.strip(), 'cp1252'))
s = s.replace('\\x82', 'é')
s = s.replace('\\x8a', 'è')
(...)
print(s.decode("utf-8"))
# AttributeError: 'str' object has no attribute 'decode'
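At that point s is already a str, because str() ran before decode() was called, and decode() only exists on bytes. The decode has to be applied to the bytes object itself instead of wrapping it in str(). A minimal sketch with a plain ASCII input (the variable names are illustrative):

```python
raw = bytes("weird mixed encodings", "cp1252")  # bytes: has .decode()
s = str(raw)                                     # str (the repr): no .decode()

try:
    s.decode("utf-8")
except AttributeError as exc:
    print(exc)  # 'str' object has no attribute 'decode'

# Decode the bytes object directly to get a real string back
text = raw.decode("cp1252")
print(text)  # weird mixed encodings
```

The codec passed to decode() also has to match the bytes: with the accented data, b'\x82' on its own is not valid UTF-8, so decoding it as "utf-8" would raise UnicodeDecodeError.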
- Do you know why s is bytes here?
- Why can't I decode it to a real string?
- Do you know how to do it the clean way? (Today I return s[2:][:-1]. It works but it's very ugly, and I would like to understand this behavior.)
Thanks in advance!
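One cleaner option, assuming the driver decoded the database's cp850 bytes as if they were cp1252 (the replacements suggest this: in cp850, 0x82 is é and 0x8a is è), is to undo the mistake in one encode/decode round-trip instead of a chain of replace() calls. The function name repair and the sample input are hypothetical, and characters that cp1252 cannot encode would still raise UnicodeEncodeError:

```python
def repair(odbc_str: str) -> str:
    """Assumption: the text was decoded as cp1252 but is really cp850."""
    # Recover the original bytes using the codec that was wrongly applied
    raw = odbc_str.strip().encode("cp1252")
    # Decode them again with the database's real code page
    return raw.decode("cp850")


# "café très" misread as cp1252: 0x82 -> U+201A, 0x8a -> U+0160 (Š)
garbled = "caf\u201a tr\u0160s"
print(repair(garbled))  # café très
```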
EDIT:
pypyodbc in Python 3 uses Unicode for everything by default, which confused me. When connecting, you can tell it to use ANSI instead:
con_odbc = pypyodbc.connect("DSN=GP", False, False, 0, False)
Then I can decode the returned data as cp850, which is the original code page of the database:
str(odbc_str, "cp850", "replace")
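With a bytes argument, str(odbc_str, "cp850", "replace") is the same call as odbc_str.decode("cp850", "replace"). The sketch below (sample bytes are hypothetical) also shows why the cp1252 route needed manual fixes: the same bytes map to different characters under each code page.

```python
odbc_bytes = b"caf\x82 tr\x8as"  # hypothetical raw bytes from the driver

# str(bytes, encoding, errors) and bytes.decode(encoding, errors) are equivalent
via_str = str(odbc_bytes, "cp850", "replace")
via_decode = odbc_bytes.decode("cp850", "replace")
print(via_str == via_decode)  # True
print(via_str)                # café très

# Read with the wrong code page, the same bytes become different characters
print(odbc_bytes.decode("cp1252") == "caf\u201a tr\u0160s")  # True
```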
No more need to manually replace each special character. Thank you very much, pepr!
str.decode no longer exists in 3.x. See docs.python.org/3/howto/unicode.html for dealing with strings and bytes in 3.x. decode is for converting bytes to the abstract characters that compose the string. A string in Python 3 is expected to contain only valid characters, which is the reason for not having .decode: there are no bytes in a Python 3 string.
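The split described in that comment can be checked directly: in Python 3, str only offers encode() (text to bytes) and bytes only offers decode() (bytes to text). A small sketch of the round trip, using cp850 as in the EDIT:

```python
text = "très"

# str -> bytes with an explicit code page
data = text.encode("cp850")
print(data)                          # b'tr\x8as'

# bytes -> str with the same code page gives the original text back
print(data.decode("cp850") == text)  # True

# Each type only offers the direction that makes sense for it
print(hasattr(str, "decode"), hasattr(bytes, "encode"))  # False False
```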