
I am reading Scandinavian-language websites with a web crawler and want to insert the content into my PostgreSQL database.

Originally I set my PostgreSQL database's encoding to UTF-8, then manually tried inserting the characters that would cause problems, like this:

Insert into name (surname) VALUES ('Børre');

This was done in the windows PSQL shell.

This gave me the following error: ERROR: invalid byte sequence for encoding "UTF8": 0x9b. After some googling I changed the client encoding to latin1, and the statement then succeeded. The server encoding is still utf8.
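The byte 0x9b in that error is consistent with the Windows console sending text in an OEM code page rather than UTF-8. The sketch below assumes the console running psql uses code page 850 (common on Western European Windows installs; code page 865 maps ø to the same byte). It also shows that switching the client to latin1 makes the byte acceptable without making it mean ø, because Latin-1 maps 0x9b to the invisible control character U+009B.

```python
# Assumption: the Windows console that runs psql uses OEM code page 850.
console_bytes = "ø".encode("cp850")
print(console_bytes)  # b'\x9b', the byte named in the error message

# On its own, 0x9b is not a valid UTF-8 sequence, so a UTF8 client encoding rejects it.
try:
    console_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("invalid UTF-8:", exc.reason)

# With client_encoding latin1 the byte is accepted, but Latin-1 maps
# 0x9b to the control character U+009B, not to 'ø'.
print(hex(ord(console_bytes.decode("latin-1"))))  # 0x9b
```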

When I do the same insert through my Python script, the name appears in my database as B°rre. If I switch the client encoding back to utf8, I also get entries with wrong special characters.
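The B°rre output can be reproduced under one assumption: that psql is displaying results in a Windows console using code page 850. With client_encoding set to latin1, the server sends ø as the single Latin-1 byte 0xF8, and code page 850 draws 0xF8 as a degree sign. If that assumption holds, the value written by Python may be stored correctly and only the console display is off. A minimal sketch:

```python
# Assumption: the psql console renders bytes using OEM code page 850.
stored = "Børre"                            # value as inserted from Python
sent_to_client = stored.encode("latin-1")   # what the server sends with client_encoding latin1
print(sent_to_client)                       # b'B\xf8rre'
print(sent_to_client.decode("cp850"))       # B°rre, as drawn by a code page 850 console
```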

My Python script is UTF-8 encoded and prints the name correctly.

Insert statement:

import psycopg2

con = psycopg2.connect(*database details*)
print("Opened database successfully")
cur = con.cursor()

# INSERT NAME
query = "INSERT INTO name (surname) VALUES (%s) RETURNING id"
data = ('børre',)  # trailing comma: parameters must be a tuple
cur.execute(query, data)
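psycopg2 takes the query parameters as a sequence (or a mapping). Because parentheses around a single value do not create a tuple in Python, a one-element parameter tuple needs a trailing comma. This check runs without a database; the commented lines only indicate where the tuple would be used.

```python
data_str = ('børre')      # parentheses alone: this is still a str
data_tuple = ('børre',)   # one-element tuple thanks to the trailing comma

print(type(data_str).__name__)    # str
print(type(data_tuple).__name__)  # tuple

# With an open connection and cursor, the insert would then be:
# cur.execute("INSERT INTO name (surname) VALUES (%s) RETURNING id", data_tuple)
# con.commit()
```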

As previously stated, print(personObject.surname) gives 'Børre'

If I try the following:

query = "INSERT INTO name (surname) VALUES (%s) RETURNING id"
data = ('børre'.encode('utf-8'),)
cur.execute(query, data)

I get the following in my database:

\x62c383c2b8727265

  • Which version of Python? Commented Dec 30, 2016 at 21:32
  • Can you post your stack trace? Commented Dec 30, 2016 at 21:32
  • Why don't you use UTF-8 encoding? Today, there is no reason not to use it. Commented Dec 30, 2016 at 21:39
  • Python version is 3.x. The reason I changed from UTF-8 is stated at the start of the question. I will update the question with the stack trace asap. Commented Dec 30, 2016 at 21:44
  • There is no stack trace; I get no error in Python. @LaurentLAPORTE Commented Dec 30, 2016 at 21:53

2 Answers


psycopg2 doesn't understand PostgreSQL queries; it just converts the arguments you give it into their PostgreSQL representation.

If you give it bytes, it will convert them to a PostgreSQL BYTEA literal.

data = ('børre'.encode('utf-8')) gives you a bytes object.

So don't do that; use a string.
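To see why a bytes value shows up in a text column as a \x... string: psycopg2 sends Python bytes as bytea, and PostgreSQL's default (hex) text form of bytea is \x followed by the hex digits. The helper below is hypothetical, not a psycopg2 or PostgreSQL API; it only mimics that text form in plain Python.

```python
def bytea_hex_text(value: bytes) -> str:
    """Hypothetical helper: mimic PostgreSQL's hex output format for bytea."""
    return "\\x" + value.hex()

encoded = "børre".encode("utf-8")   # a bytes object, not a str
print(type(encoded).__name__)       # bytes
print(bytea_hex_text(encoded))      # \x62c3b8727265
```

A correctly encoded 'børre' gives \x62c3b8727265. The value in the question, \x62c383c2b8727265, has two extra bytes where the ø should be.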

The code fragment you have at the top should work.

In the output I see ø encoded as hex c383c2b8. Decoded as UTF-8, those bytes are two characters, Ã and ¸, instead of a single ø (which would be c3b8). It looks to me like Python thinks your script is not written in UTF-8, but in some other code page.
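The stored hex can be unpicked in Python to confirm this. A sketch, assuming the text went through UTF-8 encoding twice with Latin-1 as the intermediate decoding (cp1252 gives the same result for these particular bytes):

```python
raw = bytes.fromhex("62c383c2b8727265")  # value found in the database

once = raw.decode("utf-8")
print(once)        # bÃ¸rre: the ø has become two characters

# Undo the extra round: re-encode as Latin-1, then decode as UTF-8.
repaired = once.encode("latin-1").decode("utf-8")
print(repaired)    # børre
```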


1 Comment

Thank you for your answer! Any suggestions on how I can get 'børre' to show up as 'børre' in the PSQL database as well?

Use the client_encoding keyword in the connection string, e.g.:
conn = psycopg2.connect("dbname='foo' user='dbuser' password='mypass' client_encoding='utf8'")
