
I have the following Postgres query that fetches data from table1, which has ~25 million rows, and I would like to write the output of the query below to multiple files.

query = """ WITH sequence AS (
                SELECT 
                        a,
                        b,
                        c
                FROM table1 )                    

select * from sequence;"""

Below is the Python script that fetches the complete dataset. How can I modify it to write the results to multiple files (e.g., 10,000 rows per file)?

#IMPORT LIBRARIES ########################
import psycopg2
from pandas import DataFrame

#CREATE DATABASE CONNECTION ########################
connect_str = "dbname='x' user='x' host='x' " "password='x' port = x"
conn = psycopg2.connect(connect_str)
cur = conn.cursor()
conn.autocommit = True

cur.execute(query)
df = DataFrame(cur.fetchall())

Thanks

1 Answer

Here are three methods that may help.

  1. Use a psycopg2 named cursor and set cursor.itersize

snippet

with conn.cursor(name='fetch_large_result') as cursor:
    cursor.itersize = 20000  # rows fetched from the server per network round trip

    query = "SELECT * FROM ..."
    cursor.execute(query)

    for row in cursor:
        # process each row here; row is a single record, but psycopg2
        # transparently fetches itersize rows at a time behind the scenes
        ...
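To produce one file per chunk (e.g. 10,000 rows each, as asked in the question), the iteration above can buffer rows and start a new CSV whenever the buffer fills up. A minimal sketch, reusing connect_str and query from the question; the part_NNNN.csv naming is just illustrative:

import csv
import psycopg2

conn = psycopg2.connect(connect_str)

rows_per_file = 10000
file_index = 0
buffer = []

with conn.cursor(name='fetch_large_result') as cursor:
    cursor.itersize = 20000
    cursor.execute(query)

    for row in cursor:
        buffer.append(row)
        if len(buffer) == rows_per_file:
            file_index += 1
            with open(f'part_{file_index:04d}.csv', 'w', newline='') as f:
                csv.writer(f).writerows(buffer)
            buffer = []

# write any leftover rows that did not fill a complete chunk
if buffer:
    file_index += 1
    with open(f'part_{file_index:04d}.csv', 'w', newline='') as f:
        csv.writer(f).writerows(buffer)

conn.close()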
  2. Use a psycopg2 named cursor with fetchmany(size=2000)

snippet

conn = psycopg2.connect(conn_url)
cursor = conn.cursor(name='fetch_large_result')
cursor.execute('SELECT * FROM <large_table>')

while True:
    # consume result over a series of iterations
    # with each iteration fetching 2000 records
    records = cursor.fetchmany(size=2000)

    if not records:
        break

    for r in records:
        # process each record here
        ...

cursor.close() #  cleanup
conn.close()
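Each fetchmany() batch maps naturally onto one output file, which is the direct answer to the multiple-files requirement. A rough sketch, reusing connect_str and query from the question and pandas (already imported there); the output_NNNN.csv names are illustrative:

import psycopg2
from pandas import DataFrame

conn = psycopg2.connect(connect_str)
cursor = conn.cursor(name='fetch_large_result')
cursor.execute(query)

batch_number = 0
columns = None
while True:
    records = cursor.fetchmany(size=10000)
    if not records:
        break
    if columns is None:
        # for named cursors, description is only populated after the first fetch
        columns = [desc[0] for desc in cursor.description]
    batch_number += 1
    DataFrame(records, columns=columns).to_csv(
        f'output_{batch_number:04d}.csv', index=False)

cursor.close()
conn.close()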

Finally, you could define a SCROLL CURSOR.

  3. Define a SCROLL CURSOR

snippet

BEGIN WORK;
-- Set up a cursor:
DECLARE scroll_cursor_bd SCROLL CURSOR FOR SELECT * FROM My_Table;

-- Fetch the first 5 rows in the cursor scroll_cursor_bd:

FETCH FORWARD 5 FROM scroll_cursor_bd;
CLOSE scroll_cursor_bd;
COMMIT WORK;
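The same cursor can also be driven from psycopg2 by issuing the DECLARE/FETCH statements yourself. A sketch, assuming the table and cursor names from the SQL above; note that DECLARE requires a transaction, which psycopg2 opens implicitly by default:

import psycopg2

conn = psycopg2.connect(connect_str)
cur = conn.cursor()

# psycopg2 opens a transaction implicitly, so no explicit BEGIN is needed
cur.execute("DECLARE scroll_cursor_bd SCROLL CURSOR FOR SELECT * FROM My_Table;")

while True:
    cur.execute("FETCH FORWARD 10000 FROM scroll_cursor_bd;")
    rows = cur.fetchall()
    if not rows:
        break
    # ... write this batch of rows to its own file here ...

cur.execute("CLOSE scroll_cursor_bd;")
conn.commit()
conn.close()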

Please note: not naming the cursor in psycopg2 makes it a client-side cursor, as opposed to a server-side one, so the entire result set is transferred to the client at once.


3 Comments

The code below creates a single csv with 2000 rows, but how can I create a separate csv file for every 2000 rows until the end of the table?

cursor = conn.cursor(name='fetch_big_result')
cursor.execute(query)
while True:
    # consume result over a series of iterations
    # with each iteration fetching 2000 records
    records = cursor.fetchmany(size=2000)
    if not records:
        break
    for r in records:
        with open('test.csv', 'wt') as f:
            csv_writer = csv.writer(f)
            csv_writer.writerow(r)
cursor.close() # cleanup
conn.close()
Hi rshar, does psycopg2 copy_expert() help you? stackoverflow.com/a/22789702/1123335 (a sketch follows below)
In the first example (for row in cursor), what is row? Is it a single row or a batch of many rows?
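For reference, the copy_expert() route suggested in the comments streams the result through Postgres's COPY machinery, which is typically the fastest export path. A minimal sketch, assuming the question's connect_str; note it produces a single CSV, so splitting it into 10,000-row files would be a separate post-processing step:

import psycopg2

conn = psycopg2.connect(connect_str)
cur = conn.cursor()

# stream the query result to a local file as CSV, server-side
with open('table1_dump.csv', 'w') as f:
    cur.copy_expert("COPY (SELECT a, b, c FROM table1) TO STDOUT WITH CSV HEADER", f)

conn.close()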
