How does Python convert bytes into float?

Question

I have the following code snippet:

#!/usr/bin/env python3

print(float(b'5'))

This prints 5.0 with no error (on Linux with UTF-8 encoding). I'm very surprised that it doesn't raise an error, since Python is not supposed to know which encoding was used for the bytes object.

Any insight?

Have you read the documentation? And docs.python.org/3.6/c-api/buffer.html#bufferobjects
– Kasramvd, Commented May 18, 2018 at 10:07
@Kasramvd: the documentation for float() states it accepts a str, a number, or a type that implements __float__. bytes doesn't implement __float__.
– Martijn Pieters, Commented May 18, 2018 at 10:13
@MartijnPieters Here it's mentioned that "If the argument is a string, it should contain a decimal number, optionally preceded by a sign, and optionally embedded in whitespace." Doesn't b'5' follow that rule? Although it should have been specified clearly in the documentation.
– Kasramvd, Commented May 18, 2018 at 10:17
Fair question, since not all encodings are supersets of ASCII.
– PM 2Ring, Commented May 18, 2018 at 10:17
@Kasramvd: no, it doesn't. The bytes type is not considered a string.
– Martijn Pieters, Commented May 18, 2018 at 10:24

Martijn Pieters · Accepted Answer · 2018-05-18 10:48:35Z

13

When passed a bytes object, float() treats the contents of the object as ASCII bytes. That's sufficient here, because the conversion from string to float only accepts ASCII digits and letters, plus the characters +, -, . and _ (for str input, the only non-ASCII codepoints permitted are whitespace and Unicode decimal digits, both of which are normalized to ASCII before parsing). This is analogous to the way int() treats bytes input.
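A short check of that behaviour, as a sketch rather than a specification. The helper name parses_as_float is made up for this example, and the underscore in the literal assumes Python 3.6 or newer:

```python
def parses_as_float(value):
    """Return True if float() accepts the value, False if it raises ValueError."""
    try:
        float(value)
    except ValueError:
        return False
    return True


# ASCII bytes are parsed like the equivalent text, by float() and int() alike.
as_float = float(b'5')
as_int = int(b'5')

# Sign, underscore, exponent and surrounding ASCII whitespace are all accepted.
signed = float(b'  -1_000.5e1 \n')

# ARABIC-INDIC DIGIT FIVE encoded as UTF-8 is two non-ASCII bytes, so it is rejected.
utf8_arabic_five = '\u0665'.encode('utf-8')

print(as_float, as_int, signed)
print(parses_as_float(utf8_arabic_five))
```

No encoding is ever guessed: the bytes either happen to be valid ASCII float syntax or the call fails.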

Under the hood, the implementation does this:

1. Because the input is not a string, PyNumber_Float() is called on the object (for str objects the code jumps straight to PyFloat_FromString()).
2. PyNumber_Float() checks for a __float__ method; if that's not available, it calls PyFloat_FromString().
3. PyFloat_FromString() accepts not only str objects, but any object implementing the buffer protocol. The String name is a Python 2 holdover; the Python 3 str type is called Unicode in the C implementation.
4. bytes objects implement the buffer protocol, and the PyBytes_AS_STRING macro is used to access the internal C buffer holding the bytes.
5. A combination of two internal functions named _Py_string_to_number_with_underscores() and float_from_string_inner() is then used to parse the ASCII bytes into a floating point value.
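The steps above can be modelled roughly in pure Python. This is only an illustration of the dispatch order, not CPython's code: float_like and Half are hypothetical names, memoryview() stands in for the C-level buffer access, and the final parse is delegated back to float().

```python
def float_like(obj):
    """Rough model of the dispatch order described above (not CPython's code)."""
    if isinstance(obj, str):
        # str input goes straight to the string parser.
        return float(obj)
    if hasattr(type(obj), '__float__'):
        # Numbers and number-like objects convert themselves.
        return type(obj).__float__(obj)
    # Everything else must expose a buffer; memoryview() raises TypeError if not.
    raw = memoryview(obj).tobytes()
    # The raw bytes are then parsed as ASCII text.
    return float(raw)


class Half:
    """Hypothetical number-like type that only implements __float__."""

    def __float__(self):
        return 0.5


for value in ('1.25', Half(), b'1.5', bytearray(b'2.5'), memoryview(b'3.5')):
    print(type(value).__name__, float_like(value))
```

An object that is neither a str, nor number-like, nor a buffer ends in TypeError in this model, which is also what float() itself raises for such input.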

For actual str strings, the CPython implementation first converts any non-ASCII string into a sequence of ASCII bytes, mapping Unicode decimal digits to their ASCII equivalents and any non-ASCII whitespace character to an ASCII 0x20 space, and then uses the same _Py_string_to_number_with_underscores() / float_from_string_inner() combo.
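The difference between str and bytes input shows up as soon as non-ASCII characters are involved. In the sketch below, accepts is a helper invented for this example. Note that str input accepts Unicode decimal digits as well as Unicode whitespace, while the same characters encoded as UTF-8 bytes are rejected:

```python
def accepts(value):
    """Return True if float() parses the value, False if it raises ValueError."""
    try:
        float(value)
    except ValueError:
        return False
    return True


em_space = '\u2003'                  # EM SPACE: whitespace, but not ASCII
arabic_123 = '\u0661\u0662\u0663'    # ARABIC-INDIC DIGITS ONE, TWO, THREE

# str input: both are normalized to ASCII before the number is parsed.
from_padded_str = float(em_space + '1.5' + em_space)
from_arabic_str = float(arabic_123)
print(from_padded_str, from_arabic_str)

# bytes input: no normalization happens, so the UTF-8 encoded forms fail.
print(accepts((em_space + '1.5').encode('utf-8')))
print(accepts(arabic_123.encode('utf-8')))
```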

I see this as a bug in the documentation and have filed an issue with the Python project to have it updated.

edited May 18, 2018 at 10:48

answered May 18, 2018 at 10:17

Martijn Pieters


3 Comments

Sraw Over a year ago

I know there won't be a thing about python that this guy doesn't know.

static_rtti Over a year ago

Thanks for the great answer. So, just to be clear, this will fail with certain encodings, such as UTF-16?

Martijn Pieters Over a year ago

@static_rtti: absolutely, because the \x00 bytes won't be accepted. The bytes must be ASCII only, and fit the float() string interpretation rules.
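To see this with a few codecs, here is a small sketch (try_float is a name chosen for this example, not a standard function). Only encodings whose output for '5' is the single ASCII byte survive:

```python
def try_float(raw):
    """Return the parsed float, or None if float() rejects the bytes."""
    try:
        return float(raw)
    except ValueError:
        return None


for codec in ('ascii', 'utf-8', 'utf-16', 'utf-16-le', 'utf-32'):
    raw = '5'.encode(codec)
    # UTF-16 and UTF-32 add a byte order mark and/or \x00 padding bytes.
    print(codec, raw, try_float(raw))
```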
