
I'm trying to read fixed-width files and bulk-load the data into various tables on a PostgreSQL server, one file per table.

I've already written a function that creates the table for me. I just need to populate it.

After reading the docs and doing some other research, I found two useful options: "copy_from" within psycopg2 and ".to_sql" from pandas. I successfully built an implementation using the latter, and the resulting table looks great. The only issue is that it takes about 14 minutes to upload a file of 110,000 rows and 100 columns.
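For reference, the slow version looks something like this (not my exact code, just a minimal sketch of the .to_sql approach; the SQLAlchemy engine URL is a placeholder, and widths/cols are built the same way as in the script below):

import pandas as pd
from sqlalchemy import create_engine

# placeholder connection URL; adjust to your server
engine = create_engine('postgresql+psycopg2://user:password@host:5432/dbname')

reader = pd.read_fwf('data.dat', widths=widths, index_col=False, header=None,
                     iterator=True, chunksize=10**5, names=cols)
for chunk in reader:
    # without method=, to_sql effectively issues per-row INSERTs via
    # executemany, which is the bottleneck for 110,000 rows x 100 columns
    chunk.to_sql('example_table', engine, if_exists='append', index=False)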

The copy_from approach is apparently much faster; I just can't get it to work yet.

Below is the code I have thus far:

import pandas as pd
import psycopg2 as pg

def sql_gen(cur, conn, filename, widths, col_names, table, header=False, chunk_size=10**5):

    # read the fixed-width file lazily, one chunk at a time
    reader = pd.read_fwf(filename, widths=widths, index_col=False, header=None,
                         iterator=True, chunksize=chunk_size, names=col_names)

    for chunk in reader:
        # pass each chunk to copy_from; this is the line that raises the TypeError
        cur.copy_from(chunk, table, null="")
        yield conn.commit()

#parameters
data_path = 'data.dat'
table = 'example_table'

#placeholders: extract the widths and column names we need
widths = getwidths
cols = getcols

#main part of script
conn = None
try:
    conn = pg.connect('connectionstring')
    cursor = conn.cursor()

    for sql in sql_gen(cursor,conn,data_path,widths,cols,table):
        print(sql)

    # close communication with the PostgreSQL database server
    cursor.close()
except (Exception, pg.Error) as error:
    print(error)
finally:
    # closing the database connection
    if conn is not None:
        conn.close()
        print("PostgreSQL conn is closed")

I expected this to work, but instead I get a TypeError:

argument 1 must have both .read() and .readline() methods

Full traceback as requested:


  File "<ipython-input-10-542d72b61dd4>", line 4, in <module>
    for sql in sql_gen(cursor,conn,data_path,widths,cols,table):

  File "<ipython-input-8-f82fb5831db3>", line 7, in sql_generator
    cur.copy_from(df,'example_table',null="")```
  • Can you post the full traceback, please? Commented Jun 11, 2019 at 20:04
  • @aws_apprentice That is the extent of it. Commented Jun 11, 2019 at 20:07
  • copy_from expects a separator character such as a comma or tab. That makes it appropriate for CSV files, but not fixed-width data. So I don't think it is a viable alternative for your current problem. Commented Jun 11, 2019 at 20:10
  • This seems similar to stackoverflow.com/q/38204064/895640 Commented Jun 11, 2019 at 20:13
  • @DudeWah: Jeremy's link shows a way to convert DataFrames into a CSV file-like object which can be fed to copy_from. That's probably the best solution to the problem you've posted here. You could, if you wish, post a separate question about how to speed up your to_sql code. Commented Jun 11, 2019 at 20:20
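Following the last two comments, here is a minimal sketch of how each chunk could be serialized to an in-memory CSV and fed to copy_from (the function name copy_fwf is hypothetical; it assumes tab-separated output is safe, i.e. the data contains no embedded tabs or newlines):

import io
import pandas as pd
import psycopg2 as pg

def copy_fwf(cur, conn, filename, widths, col_names, table, chunk_size=10**5):
    reader = pd.read_fwf(filename, widths=widths, index_col=False, header=None,
                         iterator=True, chunksize=chunk_size, names=col_names)
    for chunk in reader:
        # copy_from needs a file-like object with .read()/.readline(),
        # so serialize each chunk to an in-memory tab-separated buffer
        buf = io.StringIO()
        # NaN is written as the empty string, matching null='' below
        chunk.to_csv(buf, sep='\t', header=False, index=False)
        buf.seek(0)
        cur.copy_from(buf, table, sep='\t', null='')
        conn.commit()

Each COPY then loads a whole chunk in a single statement, which should be far faster than row-by-row INSERTs.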
