
I'm using requests.get() to fetch some JSON, and then I want to insert the data into PostgreSQL. Something odd is happening. If I use df.to_sql(index=False), the data is appended to PostgreSQL without any problem, but the Id column in PostgreSQL does not get an autoincrement value; the column is completely empty. If I remove that parameter from df.to_sql(), I get the following error: IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint. Here is my code:

import requests
import pandas as pd
import sqlalchemy

urls = ['https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22DIA%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22DIA%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-08%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22SPY%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22SPY%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-08%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22IWN%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22IWN%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-08%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=']
df_list = []
for url in urls:
    data = requests.get(url)
    data_json = data.json()
    df = pd.DataFrame(data_json['query']['results']['quote'])
    df_list.append(df)


quote_df = pd.concat(df_list)
engine = sqlalchemy.create_engine('postgresql://postgres:wpc,.2016@localhost:5432/stocks')
quote_df.to_sql('quotes', engine, if_exists='append')

I would like to insert the DataFrame into PostgreSQL and have PostgreSQL generate the autoincrement Id. How can I fix my code to do that?
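The duplicate-key error is consistent with how pd.concat combines indexes: each DataFrame built from one URL numbers its rows from 0, so the concatenated index repeats labels, and writing it into a column with a unique constraint (presumably the one behind the error) collides. A minimal illustration, using a few Close values from the DIA and IWN rows shown in this thread:

```python
import pandas as pd

# Two small frames standing in for two per-URL results
# (Close values copied from the DIA and IWN rows shown in this thread)
dia = pd.DataFrame({"Symbol": ["DIA", "DIA"], "Close": [173.990005, 175.800003]})
iwn = pd.DataFrame({"Symbol": ["IWN", "IWN"], "Close": [90.540001, 90.129997]})

# Default concat keeps each frame's own 0-based labels, so they repeat
combined = pd.concat([dia, iwn])
print(list(combined.index))        # [0, 1, 0, 1]
print(combined.index.is_unique)    # False

# ignore_index=True builds a fresh 0..N-1 index instead
renumbered = pd.concat([dia, iwn], ignore_index=True)
print(list(renumbered.index))      # [0, 1, 2, 3]
print(renumbered.index.is_unique)  # True
```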

Question Update 10NOV2016 1900

I added the following code to fix the indexing in the DataFrame:

from sqlalchemy import create_engine

quote_df = pd.concat(df_list)
quote_df.index.name = 'Index'
quote_df = quote_df.reset_index()
quote_df['Index'] = quote_df.index

engine = create_engine('postgresql://postgres:wpc,.2016@localhost:5432/stocks')

quote_df.to_sql('quotes', engine, if_exists='append', index=False)
engine.dispose()

Now I get the following error when appending to PostgreSQL:

ProgrammingError: (psycopg2.ProgrammingError) column "Index" of relation "quotes" does not exist
LINE 1: INSERT INTO quotes ("Index", "Adj_Close", "Close", "Date", "... 

The column does exist in the database.
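One common cause of this error, not confirmed in the thread, is identifier case: PostgreSQL folds unquoted names to lower case, while SQLAlchemy quotes mixed-case names such as "Index" (as the INSERT in the error shows), so a column stored as index, or under a different name such as Id, does not match. A minimal sketch of a pre-flight check; check_columns is a hypothetical helper, and table_columns would come from the live table, for example [c["name"] for c in sqlalchemy.inspect(engine).get_columns("quotes")]:

```python
def check_columns(df_columns, table_columns):
    """Map each DataFrame column with no exact match in the table to a
    case-insensitive match (a likely quoting problem) or to None."""
    # Hypothetical helper: exact, case-sensitive comparison, as PostgreSQL
    # does for quoted identifiers such as "Index"
    exact = set(table_columns)
    by_lower = {name.lower(): name for name in table_columns}
    problems = {}
    for col in df_columns:
        if col not in exact:
            problems[col] = by_lower.get(col.lower())
    return problems


# Table created with an unquoted Index, which PostgreSQL stores as index
print(check_columns(["Index", "Close"], ["index", "Close"]))  # {'Index': 'index'}
# Table whose key column is named Id instead
print(check_columns(["Index", "Close"], ["Id", "Close"]))     # {'Index': None}
```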

  • Can you post the create table ... statement for your PostgreSQL table? Commented Nov 10, 2016 at 7:16
  • I created the table in pgAdmin 4. The column names are the same as in the JSON object. Commented Nov 10, 2016 at 19:56

2 Answers


One way (among many) to do this would be:

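Fetch the maximum Id and store it in a variable (let's call it max_id). If the column was created with the mixed-case name Id, quote it, because PostgreSQL folds unquoted identifiers to lower case:

select max("Id") from quotes;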

Now we can do the following.

Original DF:

In [55]: quote_df
Out[55]:
      Adj_Close       Close        Date        High         Low        Open Symbol   Volume
0    170.572764  173.990005  2015-12-31  175.649994  173.970001  175.089996    DIA  5773400
1    172.347213  175.800003  2015-12-30  176.720001  175.619995  176.570007    DIA  2910000
2     173.50403  176.979996  2015-12-29      177.25      176.00  176.190002    DIA  6145700
..          ...         ...         ...         ...         ...         ...    ...      ...
213   88.252244   89.480003  2016-01-06   90.099998   89.080002   89.279999    IWN  1570400
214   89.297697   90.540001  2016-01-05   90.620003       89.75   90.410004    IWN  2053100
215   88.893319   90.129997  2016-01-04   90.730003   89.360001   90.550003    IWN  2540600

[1404 rows x 8 columns]

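Now we can shift the index by max_id. Note that the index above restarts at 0 for each URL (the last frame ends at 215 although there are 1404 rows), so the labels repeat, and so would the resulting Ids; build quote_df with pd.concat(df_list, ignore_index=True) first so the rows are numbered 0..1403: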

In [56]: max_id = 123456    # <-- placeholder; use the value fetched from the database instead

In [57]: quote_df.index += max_id

Then turn the index into an Id column:

In [58]: quote_df.reset_index().rename(columns={'index':'Id'})
Out[58]:
          Id   Adj_Close       Close        Date        High         Low        Open Symbol   Volume
0     123456  170.572764  173.990005  2015-12-31  175.649994  173.970001  175.089996    DIA  5773400
1     123457  172.347213  175.800003  2015-12-30  176.720001  175.619995  176.570007    DIA  2910000
2     123458   173.50403  176.979996  2015-12-29      177.25      176.00  176.190002    DIA  6145700
...      ...         ...         ...         ...         ...         ...         ...    ...      ...
1401  123669   88.252244   89.480003  2016-01-06   90.099998   89.080002   89.279999    IWN  1570400
1402  123670   89.297697   90.540001  2016-01-05   90.620003       89.75   90.410004    IWN  2053100
1403  123671   88.893319   90.129997  2016-01-04   90.730003   89.360001   90.550003    IWN  2540600

[1404 rows x 9 columns]

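reset_index() returns a new DataFrame rather than changing quote_df in place, so assign the result back (quote_df = quote_df.reset_index().rename(columns={'index': 'Id'})). It should then be possible to write this DataFrame to PostgreSQL with index=False.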
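Putting those steps together, a minimal sketch that also renumbers the concatenated rows with ignore_index=True (so the labels do not repeat per URL) and starts at max_id + 1 (so the first new Id does not reuse the current maximum). assign_ids is a hypothetical helper name, and the sample frames reuse Close values from the output above:

```python
import pandas as pd


def assign_ids(frames, max_id):
    """Concatenate frames and add an "Id" column continuing after max_id.

    max_id is assumed to be the current MAX("Id") in the table (0 when empty).
    """
    # ignore_index=True renumbers rows 0..N-1, so the labels are unique
    combined = pd.concat(frames, ignore_index=True)
    # Shift by max_id + 1 so the first new Id does not reuse the existing maximum
    combined.index += max_id + 1
    # Move the index into a regular "Id" column and return the new frame
    return combined.reset_index().rename(columns={"index": "Id"})


dia = pd.DataFrame({"Symbol": ["DIA", "DIA"], "Close": [173.990005, 175.800003]})
iwn = pd.DataFrame({"Symbol": ["IWN", "IWN"], "Close": [90.540001, 90.129997]})

out = assign_ids([dia, iwn], max_id=0)
print(out["Id"].tolist())  # [1, 2, 3, 4]
```

The result would then be written with quote_df.to_sql('quotes', engine, if_exists='append', index=False). This assumes nothing else inserts into quotes between reading max_id and writing.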


6 Comments

I can't fetch max(Id) because at the moment the table is empty; it has no rows at all. This DataFrame will be the first data I insert into PostgreSQL.
@Gilbert, what does print(max_id) show after you fetch it from the empty table?
Jupyter returns an error for select max(Id) from quotes. Is this the complete line of code, or is it pseudocode that I have to wrap in an SQLAlchemy statement?
@Gilbert, that's an SQL query; you have to execute it and fetch the result using SQLAlchemy or psycopg2.
max_id is a SQLAlchemy object: <sqlalchemy.engine.result.ResultProxy object at #######>. I suppose it returns that object because the table is empty.
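execute() returns a result object for every query, whether the table is empty or not; the number has to be fetched from it. With SQLAlchemy, calling .scalar() on that result returns the first column of the first row. A minimal sketch with a plain DB-API connection (psycopg2 and sqlite3 both provide cursor(), execute() and fetchone()), using COALESCE so an empty table yields 0 instead of None. fetch_max_id is a hypothetical helper, and sqlite3 is used only so the example runs without a PostgreSQL server:

```python
import sqlite3


def fetch_max_id(conn, table="quotes", column="Id"):
    """Return MAX(column) from table as a plain value, or 0 if the table is empty.

    conn is any DB-API 2.0 connection. Table and column names are
    interpolated into the SQL, so only pass trusted identifiers.
    """
    # COALESCE turns the NULL that MAX() returns on an empty table into 0
    sql = f'SELECT COALESCE(MAX("{column}"), 0) FROM "{table}"'
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchone()[0]  # first column of the single result row
    finally:
        cur.close()


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE quotes ("Id" INTEGER PRIMARY KEY, "Close" REAL)')
print(fetch_max_id(conn))  # 0 on the empty table
conn.execute('INSERT INTO quotes ("Id", "Close") VALUES (5, 173.990005)')
print(fetch_max_id(conn))  # 5
```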

Answer to my own question 11NOV2016 1112

I figured out that after df.reset_index(), I can drop the extra 'index' column that pandas creates, and the DataFrame's own index stays reset (0..N-1, with no repeats). Now, if I run the code without index=False, SQLAlchemy inserts the index into PostgreSQL. Here is the code that solved my problem:

import requests
import pandas as pd
from sqlalchemy import create_engine

urls = ['https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22DIA%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22DIA%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-11%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22SPY%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22SPY%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-11%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22IWN%22%20and%20startDate%20%3D%20%222015-01-01%22%20and%20endDate%20%3D%20%222015-12-31%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=',
    'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.historicaldata%20where%20symbol%20%3D%20%22IWN%22%20and%20startDate%20%3D%20%222016-01-01%22%20and%20endDate%20%3D%20%222016-11-11%22&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=']

df_list = []
for url in urls:
    data = requests.get(url)
    data_json = data.json()
    df = pd.DataFrame(data_json['query']['results']['quote'])
    df_list.append(df)

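quote_df = pd.concat(df_list)
# reset_index() renumbers the rows 0..N-1 and moves the old, repeating
# index into a column named 'index', which is dropped here
quote_df = quote_df.reset_index()
quote_df = quote_df.drop(columns='index')

engine = create_engine('postgresql://postgres:wpc,.2016@localhost:5432/stocks')
# Without index=False, to_sql also writes the (now unique) index as a column named "index"
quote_df.to_sql('quotes', engine, if_exists='append')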
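An alternative, assuming the quotes table can be recreated: the empty Id column described in the question suggests it has no default, so nothing generates a value when to_sql leaves it out. Declaring it as an identity column (PostgreSQL 10+) or as serial lets PostgreSQL fill it, and index=False keeps the DataFrame index out of the INSERT. In the sketch below, SQLite stands in for PostgreSQL so it runs without a server (an INTEGER PRIMARY KEY column is likewise filled automatically); the PostgreSQL DDL it assumes is in the comments:

```python
import sqlite3

import pandas as pd

# Assumed PostgreSQL DDL (identity column, PostgreSQL 10+):
#   CREATE TABLE quotes ("Id" integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
#                        "Symbol" text, "Close" double precision);
# SQLite stands in below: an INTEGER PRIMARY KEY column is filled
# automatically when an INSERT leaves it out.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE quotes ("Id" INTEGER PRIMARY KEY, "Symbol" TEXT, "Close" REAL)')

quote_df = pd.DataFrame({"Symbol": ["DIA", "IWN"], "Close": [173.990005, 90.540001]})

# index=False: to_sql inserts only Symbol and Close, so the database fills "Id"
quote_df.to_sql("quotes", conn, if_exists="append", index=False)

rows = conn.execute('SELECT "Id", "Symbol" FROM quotes ORDER BY "Id"').fetchall()
print(rows)  # [(1, 'DIA'), (2, 'IWN')]
```

With this design the database, not pandas, owns the numbering, so there is no max_id to read and no race between concurrent writers.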
