0

I'm writing some code using psycopg2 to connect to a PostGreSQL database.

I have a lot of different data types that I want to write to different tables in my PostGreSQL database. I am trying to write a function that can write to each of the tables based on a single variable passed in the function and I want to write more than 1 row at a time to optimize my query. Luckily PostGreSQL allows me to do that: PostGreSQL Insert:

INSERT INTO films (code, title, did, date_prod, kind) VALUES
('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');

I have run into a problem that I was hoping someone could help me with.

I need to create a string:

string1 = (value11, value21, value31), (value12, value22, value32)

The string1 variable will be created by using a dictionary with values. So far I have been able to create a tuple that is close to the structure I want. I have a list of dictionaries. The list is called rows:

string1 = tuple([tuple([value for value in row.values()]) for row in rows])

To test it I have created the following small rows variable:

rows = [{'id': 1, 'test1': 'something', 'test2': 123},
        {'id': 2, 'test1': 'somethingelse', 'test2': 321}]

When rows is passed through the above piece of code string1 becomes as follows:

((1, 'something', 123), (2, 'somethingelse', 321))

As seen with string1 I just need to remove the outmost parenthesis and make it a string for it to be as I need it. So far I don't know how this is done. So my question to you is: "How do I format string1 to have my required format?"

3 Answers 3

2

execute_values makes it much easier. Pass the dict sequence in instead of a values sequence:

import psycopg2, psycopg2.extras

rows = [
    {'id': 1, 'test1': 'something', 'test2': 123},
    {'id': 2, 'test1': 'somethingelse', 'test2': 321}
]

conn = psycopg2.connect(database='cpn')
cursor = conn.cursor()

insert_query = 'insert into t (id, test1, test2) values %s'
psycopg2.extras.execute_values (
    cursor, insert_query, rows,
    template='(%(id)s, %(test1)s, %(test2)s)',
    page_size=100
)

And the values are inserted:

table t;
 id |     test1     | test2 
----+---------------+-------
  1 | something     |   123
  2 | somethingelse |   321

To have the number of affected rows use a CTE:

insert_query = '''
    with i as (
        insert into t (id, test1, test2) values %s
        returning *
    )
    select count(*) from i
'''
psycopg2.extras.execute_values (
    cursor, insert_query, rows,
    template='(%(id)s, %(test1)s, %(test2)s)',
    page_size=100
)
row_count = cursor.fetchone()[0]
Sign up to request clarification or add additional context in comments.

6 Comments

Does execute_values support Python3?
@Zeliax I tested it in Python3. execute_values is new in Psycopg 2.7. Check your version: psycopg2.__version__
Thanks. I just updated it. I was on 2.6.2. Will check if this method does the trick. Do I need to execute and do a conn.commit() after the last line in your code snippet?
Lastly, what does the page_size=100 mean? Is that how many are inserted in each "pass-through"?
@Zeliax Yes, if there are more than that another insert will be executed.
|
2

With little modification you can achieve this. change your piece of cod as follows

','.join([tuple([value for value in row.values()]).__repr__() for row in rows])

current output is

tuple of tuple

(('something', 123, 1), ('somethingelse', 321, 2))

After changes output will be

in string format as you want

"('something', 123, 1),('somethingelse', 321, 2)"

2 Comments

Thanks. This is just what I needed. I just couldn't wrap my head around it.
sorry my first language is not English! guessing answer is useful :)
1

The solution that you described is not so well because potentially it may harm your database – that solution does not care about escaping string, etc. So SQL injection is possible. Fortunately, psycopg (and psycopg2) has cursor's methods execute and mogrify that will properly do all this work for you:

import contextlib

with contextlib.closing(db_connection.cursor()) as cursor:
    values = [cursor.mogrify('(%(id)s, %(test1)s, %(test2)s)', row) for row in rows]
    query = 'INSERT INTO films (id, test1, test2) VALUES {0};'.format(', '.join(values))

For python 3:

import contextlib

with contextlib.closing(db_connection.cursor()) as cursor:
    values = [cursor.mogrify('(%(id)s, %(test1)s, %(test2)s)', row) for row in rows]
    query_bytes = b'INSERT INTO films (id, test1, test2) VALUES ' + b', '.join(values) + b';'

11 Comments

Yeah. My plan was to use the cursor.execute(query, inputs) method to do this, but I ended up having some problems with that. However I can't specifically remember what those problems entailed right now
Sorry for the spam comments. But the method I have should be able to adapt to the kinds of data that I pass. I will be passing a lot of different data of different data types for a lot of different tables, which should all go through this one function.
@Zeliax mogrify can format any data types just with '%s'
I see. I just tried to test your piece of code out and it makes great sense. I couldn't get it to work however. I assume db_connection should be my connection element? I have just named it conn. I get an error when I try to execute it, and that is after I have renamed my connection element. TypeError: sequence item 0: expected str instance, bytes found
It's a psycopg2 connection initd.org/psycopg/docs/connection.html
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.