How can I bulk load JSON into Postgres using psycopg2?

Question

I've been loading json data from a file like this:

import json
import psycopg2
from psycopg2.extras import Json

def SendToPostGres(incs):
    length = len(incs)
    processed = 0
    pgParams = {
            'database': 'mydb',
            'user': 'hi',
            'password': '2u',
            'host': 'somedb.com',
            'port': 1111
            }
    conn = psycopg2.connect(**pgParams)
    curs = conn.cursor()

    # One INSERT and one COMMIT per row: this is the slow part.
    for i in incs:
        curs.execute("insert into MY_TABLE (data) values (%s)", [Json(i)])
        processed += 1
        conn.commit()
        print("%s processed, %s remaining" % (processed, length - processed))

with open("data.json") as jd:
    print("loading json")
    j = json.load(jd)
    print("inserting")
    SendToPostGres(j)

This is highly inefficient. I've tried googling this and looking at other posts, but I can't seem to get the desired effect of: "For each item in my list of json, create a row in my database with the corresponding data stored as a json type in postgres."

Could someone explain to me the most efficient way to do this in bulk?

UPDATE:

Per an answer below, I've tried updating to use the execute_values function from extras. The error I'm receiving now is:

"string index out of range"

Note that I tried changing page_size because I thought it might be related. That didn't fix it, but it might still be part of the problem.

import json
import psycopg2
import psycopg2.extras

def SendToPostGres(incs):
    values = []
    for i in incs:
        values.append(json.dumps(i))

    pgParams = {
            'database': 'MY_DB',
            'user': 'hi',
            'password': '2u',
            'host': 'somedb.com',
            'port': 5432
            }
    conn = psycopg2.connect(**pgParams)
    curs = conn.cursor()

    try:
        psycopg2.extras.execute_values(curs, "insert into incidents (data) values (%s)", values, page_size=len(values))
    except Exception as e:
        raise e
    rows = curs.fetchall()
    curs.close()

John R · Accepted Answer · 2018-01-30 21:19:13Z


Use extras.execute_values from psycopg2.

Put a single %s in your query where the whole VALUES list should be injected.

This is much faster than your current method: rows are sent in batches of up to page_size per statement, instead of one statement and one commit per row.

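from psycopg2 import extras

def queryPostgresBulk(conn, query, values):
    # query must contain a single %s where the VALUES list goes, e.g.
    # "insert into MY_TABLE (data) values %s"; values is a list of tuples.
    cur = conn.cursor()
    try:
        # page_size=len(values) sends everything in one statement;
        # max(..., 1) avoids page_size=0 when values is empty.
        extras.execute_values(cur, query, values, page_size=max(len(values), 1))
        # fetchall() raises on a plain INSERT, so only fetch when the query
        # returned rows (e.g. it has a RETURNING clause).
        rows = cur.fetchall() if cur.description is not None else []
    finally:
        cur.close()
    # The caller is responsible for calling conn.commit().
    return rows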
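To make the batching concrete, here is a simplified pure-Python model of what an execute_values-style call does. It is not psycopg2's implementation, and `build_pages` is a hypothetical helper. The function splits the rows into pages of at most page_size, then expands the single %s into one "(%s,...)" group per row. Each resulting (sql, params) pair corresponds to one statement sent to the server.

```python
import json

def build_pages(query, rows, page_size=100):
    """Simplified model of execute_values-style batching (not psycopg2's code).

    Splits `rows` (a list of tuples) into pages of at most `page_size` rows and
    expands the single %s in `query` into one "(%s,...)" group per row.
    Returns a list of (sql, params) pairs, one per statement that would be sent.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    pre, sep, post = query.partition("%s")
    if not sep:
        raise ValueError("query must contain a %s placeholder")
    statements = []
    for start in range(0, len(rows), page_size):
        page = rows[start:start + page_size]
        # One placeholder group per row, sized from the first row of the page.
        group = "(" + ",".join(["%s"] * len(page[0])) + ")"
        sql = pre + ",".join([group] * len(page)) + post
        # Values stay as query parameters; they are never pasted into the SQL.
        params = [value for row in page for value in row]
        statements.append((sql, params))
    return statements


rows = [(json.dumps(d),) for d in [{"hi": "guy"}, {"you": "rock"}, {"n": 3}]]
for sql, params in build_pages("insert into MY_TABLE (data) values %s", rows, page_size=2):
    print(sql, params)
# insert into MY_TABLE (data) values (%s),(%s) ['{"hi": "guy"}', '{"you": "rock"}']
# insert into MY_TABLE (data) values (%s) ['{"n": 3}']
```

In this model, three rows with page_size=2 become two statements instead of three. Committing once at the end, rather than after every row, also avoids per-row transaction overhead.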

Update to OP comment:

Use json.dumps() to convert your list of dicts into a list of tuples of JSON strings, which is the format execute_values expects. Pass it a list of one-element tuples of JSON strings rather than dicts representing JSON objects.

import json

_values = []
for item in incs:  # incs is your list of dicts
    _values.append((json.dumps(item),))

Or with list comprehension:

_values = [(json.dumps(x),) for x in incs]
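Why tuples rather than plain strings? The following sketch assumes that execute_values, when no template is given, sizes each row's placeholder group from len() of the first row. To my understanding that is how psycopg2 behaves, but treat it as an assumption. `default_template` is a hypothetical helper that mirrors only that step. A str is itself a sequence, so passing bare JSON strings would produce one placeholder per character.

```python
import json

def default_template(row):
    # Hypothetical helper: one %s placeholder per element of the row, which is
    # (assumed) how execute_values sizes its default row template.
    return "(" + ",".join(["%s"] * len(row)) + ")"

incs = [{"hi": "guy"}, {"you": "rock"}]

as_strings = [json.dumps(x) for x in incs]    # list of str
as_tuples = [(json.dumps(x),) for x in incs]  # list of one-element tuples

# A str is a sequence, so each character would get its own placeholder.
print(default_template(as_strings[0]).count("%s"))  # 13
# A one-element tuple gives exactly one placeholder for the data column.
print(default_template(as_tuples[0]))  # (%s)
```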

Also worth double-checking that data.json is valid JSON. A top-level array of objects is fine: json.load returns it as a list of dicts, which is what the loops above expect.

Update to OP comment again:

You need to supply a list of tuples as values, with each JSON string inside its own tuple. If the JSON string is the only data you want to inject, update the for loop that builds values to:

for i in incs:
    values.append((json.dumps(i),))
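There is a separate problem in the updated code: the query is `insert into incidents (data) values (%s)`. With execute_values, the %s stands for the entire list of row groups, so the query should be `insert into incidents (data) values %s`. The sketch below shows what each form expands to for two rows. `expand` is a hypothetical, simplified model, not psycopg2's implementation.

```python
def expand(query, n_rows, row_template="(%s)"):
    # Simplified model: replace the first %s with one row group per row.
    pre, sep, post = query.partition("%s")
    if not sep:
        raise ValueError("query must contain a %s placeholder")
    return pre + ",".join([row_template] * n_rows) + post

print(expand("insert into incidents (data) values %s", 2))
# insert into incidents (data) values (%s),(%s)
print(expand("insert into incidents (data) values (%s)", 2))
# insert into incidents (data) values ((%s),(%s))
```

Postgres would read the second form as a single two-column row, which does not match the one-column insert. The updated code also calls curs.fetchall() after a plain INSERT, which raises ProgrammingError ("no results to fetch"), and it never calls conn.commit().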

Not sure why I'm posting this, since you downvoted my correct answers to your two earlier versions of this question. Hopefully it will help someone else.

edited Jan 30, 2018 at 21:19

answered Jan 29, 2018 at 23:40

John R


Comments

Darkenor Over a year ago

I tried that, but the result I got is "Dict does not support indexing." The type in '_values' in your code is a list of dicts.

John R Over a year ago

Post the format of your data if you need help transforming it to fit the psycopg2 function's expectations. If it's not a list of dicts that you're iterating through, what is it?

Darkenor Over a year ago

It is a list of dicts, e.g. [{"hi": "guy"}, {"you": "rock"}]. I've tried passing the raw list as the "values" argument, mapping the Json function from extras over it, wrapping the whole list in Json, and passing [Json(incs)].

Darkenor Over a year ago

Sorry but your updated answer still doesn't work. I'll update my question.

John R Over a year ago

My answer does work. The rest of your code doesn’t. If you need more help, post the stack trace of your error in another question
