I query 4 hours of data from a source PLC MS SQL database, process it with Python, and write the result to a main PostgreSQL table.
Because the script runs hourly but pulls a 4-hour window, each run overlaps the previous 3 hours of data. Writing those overlapping rows straight into the main Postgres table violates the primary key, which aborts the transaction and raises a Python error.
So, as a workaround, every hour:
- I create a temporary PostgreSQL table without any key
- Then copy the pandas DataFrame into the temp table
- Then insert the rows from the temp table into the main PostgreSQL table
- Drop the temp table
This Python script runs hourly via Windows Task Scheduler.
Below is my code:
import io

from sqlalchemy import create_engine

engine = create_engine('postgresql://postgres:postgres@host:port/dbname?gssencmode=disable')
conn = engine.raw_connection()
cur = conn.cursor()

# Create a keyless staging table so COPY cannot hit a primary-key conflict
cur.execute("""CREATE TABLE public.table_temp
(
    datetime timestamp without time zone NOT NULL,
    tagid text COLLATE pg_catalog."default" NOT NULL,
    mc text COLLATE pg_catalog."default" NOT NULL,
    value text COLLATE pg_catalog."default",
    quality text COLLATE pg_catalog."default"
)
TABLESPACE pg_default;
ALTER TABLE public.table_temp
    OWNER to postgres;""")

# Stream the DataFrame into the staging table with COPY
# (df is the processed result of the earlier MS SQL query)
output = io.StringIO()
df.to_csv(output, sep='\t', header=False, index=False)
output.seek(0)
cur.copy_from(output, 'table_temp', null="")

# Promote rows to the main table, silently skipping the overlapping duplicates
cur.execute("INSERT INTO public.table_main SELECT * FROM table_temp ON CONFLICT DO NOTHING;")
cur.execute("DROP TABLE table_temp CASCADE;")
conn.commit()
I would like to know if there is a more efficient/faster way to do this. I have also seen it suggested that with the to_sql() method I may not need to export to CSV at all.
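For reference, here is a minimal, untested sketch of that to_sql() route, assuming the same engine and df as above and SQLAlchemy 1.4+. The chunksize value is an arbitrary assumption, and note that pandas infers the staging table's column types rather than using the hand-written DDL:

from sqlalchemy import text

# One transaction: commits on success, rolls back on error
with engine.begin() as sa_conn:
    # if_exists='replace' drops and recreates table_temp each run,
    # replacing the manual CREATE/DROP; method='multi' batches INSERTs
    df.to_sql('table_temp', sa_conn, schema='public',
              if_exists='replace', index=False,
              method='multi', chunksize=1000)
    # Same duplicate-skipping promotion as before
    sa_conn.execute(text(
        "INSERT INTO public.table_main "
        "SELECT * FROM public.table_temp "
        "ON CONFLICT DO NOTHING"))
    sa_conn.execute(text("DROP TABLE public.table_temp"))

That said, to_sql() issues INSERT statements under the hood, which is generally slower than COPY for bulk loads, so the copy_from() approach above may still be faster; the main gain here is simpler code.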