Would you know by any chance how to get rid of the bytes identifier in front of a string in a Python list? Perhaps there is some global setting that can be amended?

I run a query against Postgres 9.3 and create a list from the result. It looks like Python 3.3 interprets values in columns of type char(4) as byte strings, for example:

Funds[1][1]
b'FND3'
Funds[1][1].__class__
<class 'bytes'>

So the implication is:

Funds[1][1]=='FND3'
False

I have some control over that database so I could change the column type to varchar(4), and it works well:

Funds[1][1]=='FND3'
True

But this is only a temporary solution. The little b has made my life a nightmare for the last two days ;), and I would appreciate your help with this problem.

Thanks and Regards Peter

  • The b isn't part of the string, any more than the quotes around it are; they're just part of the representation when you print the string out. So, you're chasing the wrong problem, one that doesn't exist. Commented Sep 26, 2013 at 1:12
  • Also, you probably want to read the Unicode HOWTO to understand why we have separate bytes and str types, etc. Commented Sep 26, 2013 at 1:25

2 Answers

You have to either manually implement __str__/__repr__ or, if you're willing to take the risk, do some sort of regex replace over the printed string.

Example __repr__:

def stringify(lst):
    # Drop the leading "b" from the repr of bytes items; this changes the display only.
    return "[{}]".format(", ".join(repr(x)[1:] if isinstance(x, bytes) else repr(x) for x in lst))
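
A short usage sketch for the helper, with a made-up sample row (the values are illustrative, not taken from the asker's database). It repeats the function so the snippet runs on its own:

```python
def stringify(lst):
    # Same helper as above: drop the leading "b" from the repr of bytes items.
    return "[{}]".format(", ".join(
        repr(x)[1:] if isinstance(x, bytes) else repr(x) for x in lst))

row = [b'FND3', 'FND4', 7]   # hypothetical sample row
shown = stringify(row)
print(shown)                 # ['FND3', 'FND4', 7]
print(type(row[0]))          # <class 'bytes'>, the data itself is unchanged
```

Note that this only affects how the list is displayed; the items are still bytes objects, so comparisons against str values behave as before.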

The b isn't part of the string, any more than the quotes around it are; they're just part of the representation when you print the string out. So, you're chasing the wrong problem, one that doesn't exist.

The problem is that the byte string b'FND3' is not the same thing as the string 'FND3'. In this particular example, that may seem silly, but if you might ever have any non-ASCII characters anywhere, it stops being silly.
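
To see that the b belongs to the printed representation and not to the data, something like the following can be tried in a Python 3 session. The variable name is just a stand-in for Funds[1][1]:

```python
value = b'FND3'           # stand-in for Funds[1][1]

print(repr(value))        # b'FND3'  (the b and the quotes are only notation)
print(len(value))         # 4        (four bytes, no extra "b" in the data)
print(value[0])           # 70       (indexing bytes gives integers; 70 is "F")
print(value == 'FND3')    # False    (bytes and str never compare equal)
print(value == b'FND3')   # True
```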

For example, the string 'é' is the same as the byte string b'\xe9' in Latin-1, and it's also the same as the byte string b'\xc3\xa9' in UTF-8. And of course b'\xc3\xa9' is the same as the string 'Ã©' in Latin-1.
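
The relationship between one string and its different byte encodings can be checked directly. This is only an illustration using the standard str.encode and bytes.decode methods; the variable names are arbitrary:

```python
text = 'é'

latin1_bytes = text.encode('latin-1')
utf8_bytes = text.encode('utf-8')
print(latin1_bytes)   # b'\xe9'
print(utf8_bytes)     # b'\xc3\xa9'

# Decoding with the wrong codec does not fail here; it silently gives other text.
wrong = utf8_bytes.decode('latin-1')
print(wrong)          # Ã©
print(wrong == text)  # False
```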

So, you have to be explicit about what encoding you're using:

Funds[1][1].decode('utf-8')=='FND3'
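
If decoding has to happen on the Python side, one option is to convert every bytes field once, right after fetching. The sketch below assumes the rows are sequences of fields and that the data is UTF-8; decode_rows and the sample data are hypothetical names, not part of any PostgreSQL binding:

```python
def decode_rows(rows, encoding='utf-8'):
    """Return rows with every bytes field decoded to str; other fields unchanged."""
    return [
        tuple(f.decode(encoding) if isinstance(f, bytes) else f for f in row)
        for row in rows
    ]

funds = [(1, b'FND2'), (2, b'FND3')]   # made-up sample standing in for the query result
decoded = decode_rows(funds)
print(decoded[1][1] == 'FND3')         # True
```

Configuring the database binding to return str in the first place is usually cleaner, since it keeps the encoding decision in one place.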

But why is PostgreSQL returning you byte strings? Well, that's what a char column is. It's up to the Python bindings to decide what to do with them. And without knowing which of the multiple PostgreSQL bindings you're using, and which version, it's impossible to tell you what to do. But, for example, in recent-ish psycopg, you just have to set an encoding on the connection (e.g., conn.set_client_encoding('UTF-8')); in older versions you had to register a standard typecaster and do some more stuff; in py-postgresql you have to register lambda s: s.decode('utf-8'); etc.

1 Comment

Problem is solved now, thank you All for your quick responses and great help!
