Python's imaplib sometimes returns strings that look like this:
=?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?=
What is the name for this notation?
How can I decode (or should I say encode?) it to UTF8?
In short:
>>> from email.header import decode_header
>>> msg = decode_header('=?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?=')[0][0].decode('utf-8')
>>> msg
'Repertuar wydarze\u0144 z woj. Dolno\u015bl\u0105skie'
My computer doesn't show the Polish characters, but they should appear on yours (depending on locale, etc.).
Explained:
The notation is a MIME encoded-word (RFC 2047). Use the decoder in email.header:
>>> from email.header import decode_header
>>> value = decode_header('=?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?=')
>>> value
[(b'Repertuar wydarze\xc5\x84 z woj. Dolno\xc5\x9bl\xc4\x85skie', 'utf-8')]
This returns a list of tuples, each holding a decoded part of the header and the charset detected for it. There is usually one tuple, but a header made of several parts yields more than one pair.
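Since there can be more than one pair, here is a minimal sketch of joining all of them into a single string. The helper name `decode_mime_header` is made up for illustration, and falling back to ASCII when no charset is reported is an assumption, not something decode_header prescribes.

```python
from email.header import decode_header


def decode_mime_header(raw: str) -> str:
    """Decode every part returned by decode_header and join them (hypothetical helper)."""
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            # Assumption: fall back to ASCII when no charset is reported.
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            # A header without any encoded-word comes back as a plain str.
            parts.append(chunk)
    return "".join(parts)


subject = decode_mime_header(
    "=?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?="
)
print(subject)
```

The standard library also offers email.header.make_header, so `str(make_header(decode_header(raw)))` may be a shorter route to the same result.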
>>> msg, encoding = decode_header('=?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?=')[0]
>>> msg
b'Repertuar wydarze\xc5\x84 z woj. Dolno\xc5\x9bl\xc4\x85skie'
>>> encoding
'utf-8'
And finally, if you want msg as a normal Python string (str), call the bytes decode method with that encoding:
>>> msg = msg.decode('utf-8')
>>> msg
'Repertuar wydarze\u0144 z woj. Dolno\u015bl\u0105skie'
decode_header should return all the decoded parts, but try it to see whether it gives you the result you want.
You can also use the bytes decoder directly. Here is an example:
result, data = imapSession.uid('search', None, "ALL")  # search and return UIDs
latest_email_uid = data[0].split()[-1]  # data is a list; split() separates the UIDs by space, [-1] takes the latest one
result, data = imapSession.uid('fetch', latest_email_uid, '(BODY.PEEK[])')
raw_email = data[0][1].decode("utf-8")  # decode the raw message bytes as UTF-8
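Decoding the raw message bytes as UTF-8 does not by itself turn encoded-words in the headers into readable text. One possible follow-up, sketched here under assumptions, is to parse the fetched bytes with the email package. With `policy.default`, header values come back already decoded. The function name `subject_from_raw` and the sample message are made up for illustration; in the snippet above the input would be `data[0][1]` before calling `.decode()`.

```python
import email
from email import policy


def subject_from_raw(raw_email: bytes) -> str:
    """Parse raw message bytes and return the decoded Subject (hypothetical helper)."""
    msg = email.message_from_bytes(raw_email, policy=policy.default)
    subject = msg["Subject"]
    # A message without a Subject header yields None.
    return "" if subject is None else str(subject)


# Made-up sample standing in for the bytes returned by the IMAP fetch.
sample = (
    b"Subject: =?utf-8?Q?Repertuar_wydarze=C5=84_z_woj._Dolno=C5=9Bl=C4=85skie?=\r\n"
    b"From: sender@example.com\r\n"
    b"\r\n"
    b"body\r\n"
)
print(subject_from_raw(sample))
```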