Bytes string in Python

Question

Would you know by any chance how to get rid of the bytes identifier in front of a string in a Python list? Perhaps there is some global setting that can be amended?

I retrieve a query result from Postgres 9.3 and create a list from that query. It looks like Python 3.3 interprets records in columns of type char(4) as if they were byte strings, for example:

Funds[1][1]
b'FND3'
Funds[1][1].__class__
<class 'bytes'>

So the implication is:

Funds[1][1]=='FND3'
False

I have some control over that database so I could change the column type to varchar(4), and it works well:

Funds[1][1]=='FND3'
True

But this is only a temporary solution. The little b has made my life a nightmare for the last two days ;), and I would appreciate your help with that problem.

Thanks and regards, Peter

The b isn't part of the string, any more than the quotes around it are; they're just part of the representation when you print the string out. So, you're chasing the wrong problem, one that doesn't exist.
– abarnert, Commented Sep 26, 2013 at 1:12
Also, you probably want to read the Unicode HOWTO to understand why we have separate bytes and str types, etc.
– abarnert, Commented Sep 26, 2013 at 1:25

Veedrac · Accepted Answer · 2013-09-26 01:06:21Z

1

You have to either manually implement __str__/__repr__ or, if you're willing to take the risk, do some sort of Regex-replace over the string.

Example __repr__:

def stringify(lst):
    # Build a list-style representation, dropping the leading "b" from the repr of bytes items.
    return "[{}]".format(", ".join(repr(x)[1:] if isinstance(x, bytes) else repr(x) for x in lst))
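
As a rough illustration of what this helper does and does not change, here is a small sketch. The sample row is made up for the example; the point is that only the printed text loses the b, while the underlying item is still a bytes object.

```python
def stringify(lst):
    # Same idea as the helper above: strip the leading "b" from the repr of bytes items.
    return "[{}]".format(", ".join(
        repr(x)[1:] if isinstance(x, bytes) else repr(x) for x in lst))

# Made-up sample row standing in for one database record.
row = [1, b'FND3', 'Fund three']

text = stringify(row)
print(text)                 # [1, 'FND3', 'Fund three']

# The data is untouched, so the comparison from the question still fails.
still_bytes = isinstance(row[1], bytes)
print(still_bytes)          # True
print(row[1] == 'FND3')     # False
```

So this only helps with display; it does not make comparisons against str values succeed.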

abarnert · Accepted Answer · 2013-09-26 01:24:07Z

1

The b isn't part of the string, any more than the quotes around it are; they're just part of the representation when you print the string out. So, you're chasing the wrong problem, one that doesn't exist.

The problem is that the byte string b'FND3' is not the same thing as the string 'FND3'. In this particular example, that may seem silly, but if you might ever have any non-ASCII characters anywhere, it stops being silly.
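
A minimal sketch of that distinction, using the value from the question: the b shows up only in the repr, the object holds four bytes, and a bytes object never compares equal to a str in Python 3.

```python
value = b'FND3'

# The "b" and the quotes appear only in the printed representation.
print(repr(value))        # b'FND3'

# The object itself holds exactly four bytes; indexing yields integers.
length = len(value)
first = value[0]
print(length)             # 4
print(first)              # 70, the byte value of "F"

# bytes and str are different types, so they are never equal.
print(value == 'FND3')    # False
print(value == b'FND3')   # True
```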

For example, the string 'é' is the same as the byte string b'\xe9' in Latin-1, and it's also the same as the byte string b'\xc3\xa9' in UTF-8. And of course b'\xc3\xa9' is the same as the string 'Ã©' in Latin-1.
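
If you want to check mappings like these yourself, you can let Python do the encoding. The sketch below writes the character as an escape ('\u00e9' is é) so that it does not depend on the encoding of your source file or terminal.

```python
text = '\u00e9'   # the single character e-acute

# One character, two different byte sequences depending on the encoding.
latin1 = text.encode('latin-1')
utf8 = text.encode('utf-8')
print(repr(latin1))   # b'\xe9'
print(repr(utf8))     # b'\xc3\xa9'

# Decoding UTF-8 bytes with the wrong codec silently gives two wrong characters.
mojibake = utf8.decode('latin-1')
print(len(mojibake))                  # 2
print(mojibake == '\u00c3\u00a9')     # True (A-tilde followed by the copyright sign)
```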

So, you have to be explicit about what encoding you're using:

Funds[1][1].decode('utf-8')=='FND3'
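
Here is a small sketch of what being explicit buys you, using a literal in place of Funds[1][1] and assuming the data really is UTF-8. It also shows what happens when that assumption is wrong: Latin-1 bytes fed to the UTF-8 decoder raise an error rather than quietly producing garbage.

```python
raw = b'FND3'   # stands in for Funds[1][1] from the question

# With an explicit encoding, the comparison behaves as expected.
decoded = raw.decode('utf-8')
print(decoded == 'FND3')          # True

# A lone 0xE9 byte is valid Latin-1 but not valid UTF-8.
latin1_bytes = b'\xe9'
error = None
try:
    latin1_bytes.decode('utf-8')
except UnicodeDecodeError as exc:
    error = exc
print(type(error).__name__)       # UnicodeDecodeError

# The same bytes decode fine with the encoding they were written in.
ok = latin1_bytes.decode('latin-1')
print(ok == '\u00e9')             # True
```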

But why is PostgreSQL returning you byte strings? Well, that's what a char column is. It's up to the Python bindings to decide what to do with them. And without knowing which of the multiple PostgreSQL bindings you're using, and which version, it's impossible to tell you what to do. But, for example, in recent-ish psycopg, you just have to set an encoding on the connection (e.g., conn.set_client_encoding('UTF-8')); in older versions you had to register a standard typecaster and do some more stuff; in py-postgresql you have to register lambda s: s.decode('utf-8'); and so on.
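
If you can't configure the driver, a driver-independent fallback is to decode the rows yourself after fetching them. This is only a sketch: decode_row is a hypothetical helper, not part of any PostgreSQL binding, the rows list stands in for whatever your cursor returns, and UTF-8 is an assumption about your database encoding.

```python
def decode_row(row, encoding='utf-8'):
    """Return a tuple in which every bytes item has been decoded to str."""
    # Hypothetical helper: non-bytes items (ints, str, None, ...) pass through unchanged.
    return tuple(
        item.decode(encoding) if isinstance(item, bytes) else item
        for item in row
    )

# Made-up rows standing in for the result of a fetch on a char(4) column.
rows = [(1, b'FND1'), (2, b'FND3')]

funds = [decode_row(r) for r in rows]
print(funds[1][1] == 'FND3')     # True
print(type(funds[1][1]).__name__)  # str
```

Fixing it in the connection settings is still the cleaner option where the driver supports it, since it avoids touching every row by hand.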

edited Sep 26, 2013 at 1:24

answered Sep 26, 2013 at 1:17

1 Comment

Peter T251 Over a year ago

Problem is solved now, thank you All for your quick responses and great help!
