Insert dataframe into postgresql sqlalchemy with idx autoincrement

Question

I'm requests.get() to get some json. After that, I want to insert the data into postgresql. Something very interesting is happening, if I use the df.to_sql(index=False), the data gets appended into postgresql with no problem, but the Id in postgresql is not creating the autoincrement value; the column is totally empty. If I eliminate the parameter in df.to_sql() then I get the following error... IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint. Here is my code...

import requests
import pandas as pd
import sqlalchemy

urls = ['https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22DIA%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22DIA%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-08%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22SPY%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22SPY%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-08%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22IWN%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22IWN%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-08%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=']
df_list = []
for url in urls:
    data = requests.get(url)
    data_json = data.json()
    df = pd.DataFrame(data_json['query']['results']['quote'])
    df_list.append(df)


quote_df = pd.concat(df_list)
engine = sqlalchemy.create_engine('postgresql://postgres:wpc,.2016@localhost:5432/stocks')
quote_df.to_sql('quotes', engine, if_exists='append')

I would like to insert the df into postgresql with the postgresql autoincrement index. How can I fix my code to do so.

Question Update 10NOV2016 1900

I add the following code to fix the indexing in the data frame...

quote_df = pd.concat(df_list)
quote_df.index.name = 'Index'
quote_df = quote_df.reset_index()
quote_df['Index'] = quote_df.index

engine = create_engine('postgresql://postgres:wpc,.2016@localhost:5432/stocks')

quote_df.to_sql('quotes', engine, if_exists = 'append', index=False) engine.dispose()

Now I'm having the following error when appending to postgresql...

ProgrammingError: (psycopg2.ProgrammingError) column "Index" of relation "quotes" does not exist LINE 1: INSERT INTO quotes ("Index", "Adj_Close", "Close", "Date", "...

The column does exists in the database.

can you post create table ... statement for your PostgreSQL table? — MaxU - stand with Ukraine
– MaxU - stand with Ukraine, Commented Nov 10, 2016 at 7:16
I create the table in pgadmin4. The column names are the same as the json object. — redeemefy
– redeemefy, Commented Nov 10, 2016 at 19:56

MaxU - stand with Ukraine · Accepted Answer · 2016-11-10 07:36:09Z

0

One way (among many) to do this would be:

to fetch maximum Id and store it to a variable (let's call it max_id):

select max(Id) from quotes;

now we can do this:

Original DF:

In [55]: quote_df
Out[55]:
      Adj_Close       Close        Date        High         Low        Open Symbol   Volume
0    170.572764  173.990005  2015-12-31  175.649994  173.970001  175.089996    DIA  5773400
1    172.347213  175.800003  2015-12-30  176.720001  175.619995  176.570007    DIA  2910000
2     173.50403  176.979996  2015-12-29      177.25      176.00  176.190002    DIA  6145700
..          ...         ...         ...         ...         ...         ...    ...      ...
213   88.252244   89.480003  2016-01-06   90.099998   89.080002   89.279999    IWN  1570400
214   89.297697   90.540001  2016-01-05   90.620003       89.75   90.410004    IWN  2053100
215   88.893319   90.129997  2016-01-04   90.730003   89.360001   90.550003    IWN  2540600

[1404 rows x 8 columns]

now we can increase index by max_id:

In [56]: max_id = 123456    # <-- you don't need this line... 

In [57]: quote_df.index += max_id

and set index as Id column:

In [58]: quote_df.reset_index().rename(columns={'index':'Id'})
Out[58]:
          Id   Adj_Close       Close        Date        High         Low        Open Symbol   Volume
0     123456  170.572764  173.990005  2015-12-31  175.649994  173.970001  175.089996    DIA  5773400
1     123457  172.347213  175.800003  2015-12-30  176.720001  175.619995  176.570007    DIA  2910000
2     123458   173.50403  176.979996  2015-12-29      177.25      176.00  176.190002    DIA  6145700
...      ...         ...         ...         ...         ...         ...         ...    ...      ...
1401  123669   88.252244   89.480003  2016-01-06   90.099998   89.080002   89.279999    IWN  1570400
1402  123670   89.297697   90.540001  2016-01-05   90.620003       89.75   90.410004    IWN  2053100
1403  123671   88.893319   90.129997  2016-01-04   90.730003   89.360001   90.550003    IWN  2540600

[1404 rows x 9 columns]

Now it should be possible to write this DF to PostgreSQL specifying (index=False)

answered Nov 10, 2016 at 7:36

MaxU - stand with Ukraine

212k37 gold badges402 silver badges436 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

redeemefy Over a year ago

I can't fetch max(Id) because at this moment the table is empty; no rows whatsoever. This dataframe will be the first data that I'm going to insert in postgresql.

MaxU - stand with Ukraine Over a year ago

@Gilbert, what does print(max_id) after you fetched it from the empty table?

redeemefy Over a year ago

Jupyter is returning an error with select max(Id) from quotes. Is this the complete line of code or this is sudo code that I have to include in a sqlalchamy statement?

MaxU - stand with Ukraine Over a year ago

@Gilbert, that's a SQL that must be executed and fetched using SQLAlchemy or Psycopg2 ...

redeemefy Over a year ago

max_id is returning a sqlalchemy object; <sqlalchemy.engine.result.ResultProxy object at #######> I'm suppouse is returning that object because the table is empty.

|

redeemefy · Accepted Answer · 2016-11-11 17:11:18Z

Answer to my own question 11NOV2016 1112

I figure out that after df.reset_index(), I can delete the extra column pandas create and the original index column stays reset. Now if I execute the code without index=False, sqlalchemy will insert the index into postgres. Here is the code that solve my problem...

import requests
import pandas as pd
from sqlalchemy import create_engine

urls = ['https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22DIA%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22DIA%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-11%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22SPY%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22SPY%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-11%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22IWN%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22IWN%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-11%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=']

df_list = []
for url in urls:
    data = requests.get(url)
    data_json = data.json()
    df = pd.DataFrame(data_json['query']['results']['quote'])
    df_list.append(df)

quote_df = pd.concat(df_list)
quote_df = quote_df.reset_index()
quote_df = quote_df.drop('index', 1)


engine = create_engine('postgresql://postgres:wpc,.2016@localhost:5432/stocks')
quote_df.to_sql('quotes', engine, if_exists='append')

Collectives™ on Stack Overflow

Insert dataframe into postgresql sqlalchemy with idx autoincrement

Question Update 10NOV2016 1900

2 Answers 2

6 Comments

Answer to my own question 11NOV2016 1112

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

Question Update 10NOV2016 1900

2 Answers 2

6 Comments

Answer to my own question 11NOV2016 1112

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related