
I have the following Postgres query that fetches data from table1, which has ~25 million rows, and I would like to write the output of the query below to multiple files.

query = """ WITH sequence AS (
                SELECT 
                        a,
                        b,
                        c
                FROM table1 )                    

select * from sequence;"""

Below is the Python script that fetches the complete dataset. How can I modify it to write the results to multiple files (e.g., 10,000 rows per file)?

#IMPORT LIBRARIES ########################
import psycopg2
from pandas import DataFrame

#CREATE DATABASE CONNECTION ########################
connect_str = "dbname='x' user='x' host='x' " "password='x' port = x"
conn = psycopg2.connect(connect_str)
cur = conn.cursor()
conn.autocommit = True

cur.execute(query)
df = DataFrame(cur.fetchall())

Thanks

1 Answer

Here are three methods that may help.

  1. Use a psycopg2 named cursor and set cursor.itersize

snippet

with conn.cursor(name='fetch_large_result') as cursor:
    cursor.itersize = 20000  # rows fetched from the server per network round trip

    query = "SELECT * FROM ..."
    cursor.execute(query)

    for row in cursor:
        # process each row here; row is a single record, but psycopg2
        # transparently fetches itersize rows at a time behind the scenes
        ...
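To produce one file per chunk (e.g. 10,000 rows each, as asked in the question), the iteration above can buffer rows and start a new CSV whenever the buffer fills up. A minimal sketch, reusing connect_str and query from the question; the part_NNNN.csv naming is just illustrative:

import csv
import psycopg2

conn = psycopg2.connect(connect_str)

rows_per_file = 10000
file_index = 0
buffer = []

with conn.cursor(name='fetch_large_result') as cursor:
    cursor.itersize = 20000
    cursor.execute(query)

    for row in cursor:
        buffer.append(row)
        if len(buffer) == rows_per_file:
            file_index += 1
            with open(f'part_{file_index:04d}.csv', 'w', newline='') as f:
                csv.writer(f).writerows(buffer)
            buffer = []

# write any leftover rows that did not fill a complete chunk
if buffer:
    file_index += 1
    with open(f'part_{file_index:04d}.csv', 'w', newline='') as f:
        csv.writer(f).writerows(buffer)

conn.close()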
  2. Use a psycopg2 named cursor with fetchmany(size=2000)

snippet

conn = psycopg2.connect(conn_url)
cursor = conn.cursor(name='fetch_large_result')
cursor.execute('SELECT * FROM <large_table>')

while True:
    # consume result over a series of iterations
    # with each iteration fetching 2000 records
    records = cursor.fetchmany(size=2000)

    if not records:
        break

    for r in records:
        # process each record here
        ...

cursor.close() #  cleanup
conn.close()
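Each fetchmany() batch maps naturally onto one output file, which is the direct answer to the multiple-files requirement. A rough sketch, reusing connect_str and query from the question and pandas (already imported there); the output_NNNN.csv names are illustrative:

import psycopg2
from pandas import DataFrame

conn = psycopg2.connect(connect_str)
cursor = conn.cursor(name='fetch_large_result')
cursor.execute(query)

batch_number = 0
columns = None
while True:
    records = cursor.fetchmany(size=10000)
    if not records:
        break
    if columns is None:
        # for named cursors, description is only populated after the first fetch
        columns = [desc[0] for desc in cursor.description]
    batch_number += 1
    DataFrame(records, columns=columns).to_csv(
        f'output_{batch_number:04d}.csv', index=False)

cursor.close()
conn.close()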

Finally, you could define a SCROLL CURSOR.

  3. Define a SCROLL CURSOR

snippet

BEGIN WORK;
-- Set up a cursor:
DECLARE scroll_cursor_bd SCROLL CURSOR FOR SELECT * FROM My_Table;

-- Fetch the first 5 rows in the cursor scroll_cursor_bd:

FETCH FORWARD 5 FROM scroll_cursor_bd;
CLOSE scroll_cursor_bd;
COMMIT WORK;
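The same cursor can also be driven from psycopg2 by issuing the DECLARE/FETCH statements yourself. A sketch, assuming the table and cursor names from the SQL above; note that DECLARE requires a transaction, which psycopg2 opens implicitly by default:

import psycopg2

conn = psycopg2.connect(connect_str)
cur = conn.cursor()

# psycopg2 opens a transaction implicitly, so no explicit BEGIN is needed
cur.execute("DECLARE scroll_cursor_bd SCROLL CURSOR FOR SELECT * FROM My_Table;")

while True:
    cur.execute("FETCH FORWARD 10000 FROM scroll_cursor_bd;")
    rows = cur.fetchall()
    if not rows:
        break
    # ... write this batch of rows to its own file here ...

cur.execute("CLOSE scroll_cursor_bd;")
conn.commit()
conn.close()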

Please note: not naming the cursor in psycopg2 makes it a client-side cursor, as opposed to a server-side one, so the entire result set is transferred to the client at once.


3 Comments

The code below creates a single csv with 2000 rows, but how can I create a separate csv file for every 2000 rows until the end of the table?

cursor = conn.cursor(name='fetch_big_result')
cursor.execute(query)
while True:
    # consume result over a series of iterations
    # with each iteration fetching 2000 records
    records = cursor.fetchmany(size=2000)
    if not records:
        break
    for r in records:
        with open('test.csv', 'wt') as f:
            csv_writer = csv.writer(f)
            csv_writer.writerow(r)
cursor.close() # cleanup
conn.close()
Hi rshar, does psycopg2 copy_expert() help you? stackoverflow.com/a/22789702/1123335 (a sketch follows below)
In the first example (for row in cursor), what is row? Is it a single row or a batch of many rows?
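For reference, the copy_expert() route suggested in the comments streams the result through Postgres's COPY machinery, which is typically the fastest export path. A minimal sketch, assuming the question's connect_str; note it produces a single CSV, so splitting it into 10,000-row files would be a separate post-processing step:

import psycopg2

conn = psycopg2.connect(connect_str)
cur = conn.cursor()

# stream the query result to a local file as CSV, server-side
with open('table1_dump.csv', 'w') as f:
    cur.copy_expert("COPY (SELECT a, b, c FROM table1) TO STDOUT WITH CSV HEADER", f)

conn.close()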
