I am reading Scandinavian-language websites with a web crawler and want to insert the scraped text into my PostgreSQL database.
I originally created the database with UTF-8 encoding, then tried to insert a value containing one of the problematic characters by hand, like this:
Insert into name (surname) VALUES ('Børre');
I ran this in the psql shell on Windows.
This gave me the following error: ERROR: invalid byte sequence for encoding "UTF8": 0x9b. After some googling I changed the client encoding to LATIN1, and the statement then succeeded. The server encoding is still UTF8.
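A quick way to check where the byte 0x9b could come from, assuming the Windows console uses code page 850 (an assumption, not stated above; it is the usual OEM console code page for Western European Windows locales). In that code page "ø" is the single byte 0x9b, which is not a valid start byte in UTF-8. Read as Latin-1, the same byte is the control character U+009B rather than "ø", so the LATIN1 workaround would likely let the insert succeed while storing the wrong character.

```python
# Assumption: the psql console encodes typed text with code page 850.
raw = "ø".encode("cp850")
print(raw)  # b'\x9b'

# 0x9B is a UTF-8 continuation byte, so it cannot start a character.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)  # invalid start byte

# Interpreted as Latin-1, 0x9B is the C1 control character U+009B, not "ø".
as_latin1 = raw.decode("latin-1")
print(hex(ord(as_latin1)))  # 0x9b
```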
When I run the same insert from my Python script, the name shows up in the database as B°rre. If I switch the client encoding back to UTF8, I still get entries with wrong special characters.
My Python script is UTF-8 encoded and prints the name correctly.
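One possible explanation for B°rre, again assuming a code page 850 console (an assumption): the value may be stored correctly, and only the display is off. If psql sends "ø" to the console as the Latin-1 byte 0xF8, a code page 850 console renders that byte as "°". A small sketch of that mismatch:

```python
# Assumption: psql outputs LATIN1 bytes and the console decodes them as cp850.
latin1_bytes = "Børre".encode("latin-1")
print(latin1_bytes)  # b'B\xf8rre'

shown_in_console = latin1_bytes.decode("cp850")
print(shown_in_console)  # B°rre
```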
Insert code:
import psycopg2

con = psycopg2.connect(*database details*)
print("Opened database successfully")
cur = con.cursor()
# Insert the name; the parameters must be a sequence, hence the trailing comma
query = "INSERT INTO name (surname) VALUES (%s) RETURNING id"
data = ('børre',)
cur.execute(query, data)
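psycopg2's `cursor.execute(query, vars)` expects `vars` to be a sequence (or mapping) with one item per `%s` placeholder. Note that parentheses alone do not make a tuple: `('børre')` is just a string, which is itself a sequence of five characters. Only the trailing comma makes a one-element tuple:

```python
# Parentheses alone just group an expression; the comma creates the tuple.
not_a_tuple = ('børre')
one_tuple = ('børre',)

print(type(not_a_tuple).__name__, len(not_a_tuple))  # str 5
print(type(one_tuple).__name__, len(one_tuple))      # tuple 1
```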
As previously stated, print(personObject.surname) gives 'Børre'
If I try the following:
query = "INSERT INTO name (surname) VALUES (%s) RETURNING id"
data = ('børre'.encode('utf-8'),)
cur.execute(query, data)
I get the following in my database:
\x62c383c2b8727265
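Decoding that hex value shows what ended up stored. The bytes match "børre" encoded to UTF-8, misread as Latin-1, and then encoded to UTF-8 a second time (double encoding). This sketch only reproduces the bytes; it does not show at which step the misreading happened:

```python
# The hex value from the database, without the \x prefix.
stored = bytes.fromhex("62c383c2b8727265")
print(stored.decode("utf-8"))  # bÃ¸rre

# Reproduce it: UTF-8 encode, read those bytes as Latin-1, encode to UTF-8 again.
double_encoded = "børre".encode("utf-8").decode("latin-1").encode("utf-8")
print(double_encoded.hex())      # 62c383c2b8727265
print(double_encoded == stored)  # True
```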