Given a bytes object in Python like

bstring = b'hello'

why does bstring[0] return the ASCII code for the character 'h' (104) and not the single-byte object b'h' (equivalently b'\x68')?

It's probably also worth noting that b'h' == 104 returns False (this cost me about 2 hours of debugging, so I'm a little annoyed).

It is explained here in the paragraph that starts with "While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers..." (Commented Aug 19, 2022 at 19:24)

2 Answers

Because bytes are not characters.

Indexing returns the value of the byte at that position, as an integer.

If you take 'hello', this is quite simple: 5 ASCII characters -> 5 bytes:

b'hello' == 'hello'.encode('utf-8')
# True

len('hello'.encode('utf-8'))
# 5

If you were to use non-ASCII characters, those could be encoded as several bytes, and indexing would give you only part of a character:

len('å'.encode('utf-8'))
# 2

'å'.encode('utf-8')[0]
# 195

'å'.encode('utf-8')[1]
# 165
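
For the comparison in the question, here is a minimal sketch of how to get a length-one bytes object, or to compare against an integer instead. The variable names are illustrative only and not part of the original question.

```python
bstring = b'hello'

# Indexing yields an int; a one-element slice yields a bytes object.
first_int = bstring[0]        # 104
first_byte = bstring[0:1]     # b'h'

# Compare like with like: int against int, bytes against bytes.
int_match = first_int == ord('h')
bytes_match = first_byte == b'h'

# A bytes object never compares equal to an int.
mixed = b'h' == 104

# Rebuild a single-byte bytes object from an integer.
rebuilt = bytes([first_int])  # b'h'
```

Slicing is usually the least surprising option when the code needs to keep working with bytes values rather than integers.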

Think of bytes less as a “string” and more as an immutable list (or tuple) with the constraint that all elements be integers in range(256).

So, think of:

>>> bstring = b'hello'
>>> bstring[0]
104

as being equivalent to

>>> btuple = (104, 101, 108, 108, 111)
>>> btuple[0]
104

except with a different sequence type.
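
A short sketch of that analogy, assuming the same bstring and btuple values as above. The flag names (assigned, accepted) are hypothetical and exist only to record what happened.

```python
bstring = b'hello'
btuple = (104, 101, 108, 108, 111)

# Converting between the two shows they hold the same integers.
as_list = list(bstring)       # [104, 101, 108, 108, 111]
round_trip = bytes(btuple)    # b'hello'

# Like a tuple, a bytes object does not support item assignment.
try:
    bstring[0] = 72
    assigned = True
except TypeError:
    assigned = False

# Every element must be an integer in range(256).
try:
    bytes([256])
    accepted = True
except ValueError:
    accepted = False
```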

It's actually str that behaves weirdly in Python. If you index a str, you don't get a char object like you would in some other languages; you get another str.

>>> string = 'hello'
>>> string[0]
'h'
>>> type(string[0])
<class 'str'>
>>> string[0][0]
'h'
>>> string[0][0][0]
'h'
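
The same difference shows up when iterating. A small sketch, with illustrative variable names: iterating a str gives length-one str objects, while iterating a bytes object gives integers.

```python
text = 'hi'
data = b'hi'

# Iterating a str yields str objects of length 1.
str_items = [ch for ch in text]          # ['h', 'i']
str_types = {type(ch) for ch in text}    # {str}

# Iterating a bytes object yields ints.
byte_items = [b for b in data]           # [104, 105]
byte_types = {type(b) for b in data}     # {int}

# chr() maps each integer back to a character; for ASCII data
# this matches the original text.
as_chars = [chr(b) for b in data]        # ['h', 'i']
```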
