I am trying to understand how strings work in Python and am having a tough time deciphering the various functionalities. Here's what I understand so far. I am hoping to get corrections and new perspectives on how to remember these nuances.
- Firstly, I know that Unicode evolved to accommodate multiple languages and accents across the world. But how does Python store strings? If I define s = 'hello', what is the encoding in which the string s is stored? Is it Unicode? Or is it stored as plain bytes? On doing type(s) I got the answer <type 'str'>. However, when I did us = unicode(s), us was of the type <type 'unicode'>. Is us a str type, or is there actually a unicode type in Python?
- Also, I know that to save space we encode strings as bytes using the encode() function. So suppose bs = s.encode('utf-8', errors='ignore') will return a bytes object. Now, when I am writing bs to a file, should I open the file in wb mode? I have seen that if it is opened in w mode, it stores the string in the file as b"<content in s>".
- What does the decode() function do? (I know, the question is too open-ended.) Is it that we apply it to a bytes object and it transforms the string into our chosen encoding? Or does it always convert it back to a Unicode sequence? Can any other insights be drawn from the following lines?
>>> s = 'hello'
>>> bobj = bytes(s, 'utf-8')
>>> bobj
'hello'
>>> type(bobj)
<type 'str'>
>>> bobj.decode('ascii')
u'hello'
>>> us = bobj.decode('ascii')
>>> type(us)
<type 'str'>
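For comparison, here is a rough sketch of the same steps as a script, assuming a Python 3 interpreter (on Python 2, bytes is just another name for str, so the two-argument call bytes(s, 'utf-8') is not accepted there). The values in the comments are what Python 3 reports.

```python
# Assumes Python 3: str holds Unicode text, bytes holds raw bytes.
s = 'hello'

bobj = bytes(s, 'utf-8')     # same result as s.encode('utf-8')
print(repr(bobj))            # b'hello'
print(type(bobj))            # <class 'bytes'>

us = bobj.decode('ascii')    # bytes -> str
print(repr(us))              # 'hello'
print(type(us))              # <class 'str'>
```

So on Python 3, decode() always goes from bytes to str, and the argument names the encoding the bytes are assumed to be in.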
- How does str(object) work? I read that it will try to execute the __str__() method defined by the object. But how differently does this function act on, say, Unicode strings and regular byte-encoded strings?
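A minimal sketch of how str() behaves on a few kinds of objects, assuming Python 3 (where the text type is str and byte strings are bytes). The Point class is hypothetical, made up only for this illustration.

```python
# Assumes Python 3. Point is a hypothetical class used only for illustration.
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        # str(obj) and print(obj) end up calling this method.
        return "Point({}, {})".format(self.x, self.y)


p = Point(1, 2)
as_text = str(p)                   # "Point(1, 2)"

# On a str, str() returns the same text unchanged.
same = str('hello')                # 'hello'

# On bytes, str() without an encoding gives the printable representation,
# not the decoded text.
shown = str(b'hello')              # "b'hello'"

# With an encoding, str() decodes, like b'hello'.decode('utf-8').
decoded = str(b'hello', 'utf-8')   # 'hello'
```

The str(b'hello') case may be where a literal b'...' in an output file comes from, if a bytes object is converted with str() before being written in text mode.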
Thanks in advance.
us = unicode(s): you mean in Python 2, since unicode has been removed in Python 3... type(us) gives <class 'str'> and there's no unicode type. encode/decode yourself, or you can pass an encoding and have the library do it for you. But either way some conversion is necessary.
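A sketch of the two options for writing text to a file, assuming Python 3. The file names are hypothetical and are created in a temporary directory. The sample string contains one non-ASCII character so that the encoding step is visible.

```python
import os
import tempfile

s = 'h\u00e9llo'  # 'héllo', written with an escape to keep the source ASCII

# Hypothetical file names inside a fresh temporary directory.
tmpdir = tempfile.mkdtemp()
path_manual = os.path.join(tmpdir, 'manual.txt')
path_auto = os.path.join(tmpdir, 'auto.txt')

# Option 1: encode yourself and write the bytes in binary mode.
with open(path_manual, 'wb') as f:
    f.write(s.encode('utf-8'))

# Option 2: open in text mode with an encoding; the library encodes for you.
with open(path_auto, 'w', encoding='utf-8') as f:
    f.write(s)

# Read both files back as raw bytes to compare what is on disk.
with open(path_manual, 'rb') as f:
    raw_manual = f.read()
with open(path_auto, 'rb') as f:
    raw_auto = f.read()

# Reading back as text: decode yourself, or let open() decode.
text_manual = raw_manual.decode('utf-8')
with open(path_auto, 'r', encoding='utf-8') as f:
    text_auto = f.read()
```

Both files end up with the same bytes on disk, so the choice is mostly about who performs the conversion. Writing a bytes object to a file opened in plain w mode raises a TypeError on Python 3 rather than converting silently.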