
I am trying to query some data from a Postgres database and write the results to an Excel file with the Python code below (I connect to the server through an SSH tunnel and to the database using SQLAlchemy):

from sshtunnel import SSHTunnelForwarder
from sqlalchemy.orm import sessionmaker 
from sqlalchemy import create_engine
import pandas as pd
from pandas import DataFrame
import xlsxwriter
import openpyxl

with SSHTunnelForwarder(
    ('<server_ip>', 22),
    ssh_username="<server_username>",
    ssh_private_key='<private_key_path>', 
    remote_bind_address=('localhost', 5432)) as server:
    server.start()
    print "server connected"

    #connect to DB
    local_port = str(server.local_bind_port)
    engine = create_engine('postgresql://<db_username>:<db_password>:' + local_port +'/<db_name>')
    Session = sessionmaker(bind=engine)
    s = Session()
    print 'Database session created'

    not_empty_query = False #flag empty queries
    arg_query = "SELECT * from portalpage where id not in (select entityid from sharepermissions where entitytype='PortalPage')"
    query = s.execute(arg_query)
    print(query)
    for row in query: #check if the query is empty
        if (row[0] > 0):
            not_empty_query = True
            break
    if not_empty_query == True:  # if the query is not empty, add the response to Excel
        df = pd.DataFrame(pd.np.empty((0, 8)))
        df = DataFrame(query.fetchall())
        print(df)
        df.columns = query.keys()
        df.to_excel("out.xlsx", engine="openpyxl", sheet_name="Worksheet_Name")

s.close()

It works for most of the queries I have tried, but with the query above it returns the following error:

ValueError: Length mismatch: Expected axis has 0 elements, new values have 8 elements

While troubleshooting, I printed the df variable and got an "Empty DataFrame". However, when I run the same query directly in my database, I get results.

I also noticed that in the response, on my database, some columns are empty (not sure if it makes any difference).

Please also find a screenshot of the code execution: [image]

The above will work if I remove the below piece of code:

for row in query: #check if the query is empty
    if (row[0] > 0):
        not_empty_query = True
        break
if not_empty_query == True:

However, if I remove this for loop, I get the same error for other queries (mainly queries that return empty results). Please find an example here: [image]

Any ideas?

  • Use @Spyros_av's idea: get all the data into the dataframe first and then push it to Excel. Commented Feb 18, 2020 at 14:58
  • As I replied below, even if I define an empty dataframe I still get the same error. Commented Feb 25, 2020 at 11:46
  • Can you share the updated script, so we can see why it doesn't work? Commented Feb 25, 2020 at 12:06
  • I want to be sure that you used pd.DataFrame(pd.np.empty((0, 8))) to create an empty dataframe to house the query results. This is what @Spyros_av suggested. Commented Feb 25, 2020 at 12:24
  • Remove this part from your script: for row in query: #check if the query is empty if (row[0] > 0): not_empty_query = True break Commented Feb 25, 2020 at 17:27

2 Answers

1

Please try this. The problem is the logic you use to check whether the query returns any data: looping over the result reads rows from it, so the later fetchall() has fewer rows (or none) left to return. I have modified the script to fetch the rows first and check them afterwards. If any row is returned, it builds the dataframe and exports it to Excel. Please let me know if it works.

from sshtunnel import SSHTunnelForwarder
from sqlalchemy.orm import sessionmaker 
from sqlalchemy import create_engine
import pandas as pd
from pandas import DataFrame
import xlsxwriter
import openpyxl

with SSHTunnelForwarder(
    ('<server_ip>', 22),
    ssh_username="<server_username>",
    ssh_private_key='<private_key_path>', 
    remote_bind_address=('localhost', 5432)) as server:
    server.start()
    print("server connected")

    #connect to DB
    local_port = str(server.local_bind_port)
    engine = create_engine('postgresql://<db_username>:<db_password>:' + local_port +'/<db_name>')
    Session = sessionmaker(bind=engine)
    s = Session()
    print('Database session created')
    arg_query = "SELECT * from portalpage where id not in (select entityid from sharepermissions where entitytype='PortalPage')"
    query = s.execute(arg_query)
    # Store the rows and column names before testing anything:
    # the result can only be read once.
    rows = query.fetchall()
    columns = query.keys()
    if len(rows) > 0:
        df = DataFrame(rows)
        df.columns = columns
        df.to_excel("out.xlsx", engine="openpyxl", sheet_name="Worksheet_Name")
    else:
        print("no data")
    s.close()

3 Comments

I get the same error. I also noticed that it goes into the if clause even if the query doesn't return any results, and if I print the query variable before the if clause it returns this: <sqlalchemy.engine.result.ResultProxy object at 0x1207a2a90>
I think I found what it is. I have modified the script: the values of query.fetchall() and query.keys() need to be stored before the check. Please check again.
Yeah, that seems to be working perfectly. Thank you @MEdwin.
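
The reason storing the rows first matters can be shown with a small sketch. It is only an illustration under assumptions: it uses the standard-library sqlite3 module with an in-memory table instead of the Postgres database and SQLAlchemy session from the question, and the table contents are made up. The point it illustrates is the same: a result that is iterated once does not hand out the same rows again.

```python
import sqlite3

# In-memory SQLite stands in for the Postgres database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE portalpage (id INTEGER, pagename TEXT)")
conn.executemany(
    "INSERT INTO portalpage VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c")],
)

# Pattern from the question: peek at the result with a loop, then fetch.
cursor = conn.execute("SELECT id, pagename FROM portalpage ORDER BY id")
for row in cursor:
    if row[0] > 0:
        break  # the first row has already been read at this point
remaining = cursor.fetchall()  # only the rows that were not read yet

# Pattern from the fix: store column names and rows first, then test the stored list.
cursor = conn.execute("SELECT id, pagename FROM portalpage ORDER BY id")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
has_rows = len(rows) > 0

print(len(remaining), len(rows), columns, has_rows)
conn.close()
```

With a query that returns a single row, the loop in the first pattern would read that row and leave nothing for fetchall(), which matches the empty dataframe seen in the question.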
0

Try to create an empty data frame first.

if not_empty_query == True:  # if the query is not empty, add the response to Excel
    df = pd.DataFrame(pd.np.empty((0, 8)))
    df = DataFrame(query.fetchall())
    print(df)
    df.columns = query.keys()
    df.to_excel("out.xlsx", engine="openpyxl", sheet_name="Worksheet_Name")

1 Comment

Hi, thanks for the reply. I still get the same error. Also ideally I want to have a dynamic size for my dataframe as I will use it in a function and I will run multiple queries.
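
One possible way to get a dataframe whose size follows the query, sketched under assumptions: rows_to_frame is a hypothetical helper name, and the column names and rows below are made-up stand-ins for query.keys() and query.fetchall(). Passing the column names to the DataFrame constructor, instead of assigning df.columns afterwards, should also cover queries that return no rows.

```python
import pandas as pd


def rows_to_frame(rows, columns):
    """Build a DataFrame whose width follows the query, even with zero rows."""
    # Passing columns to the constructor avoids assigning df.columns afterwards,
    # which raises "Length mismatch" when the frame was built from an empty list.
    return pd.DataFrame(list(rows), columns=list(columns))


# Hypothetical column names and rows, standing in for query.keys() / query.fetchall().
columns = ["id", "username", "pagename"]
full = rows_to_frame([(1, "alice", "home"), (2, "bob", "board")], columns)
empty = rows_to_frame([], columns)

print(full.shape)
print(empty.shape)
```

The empty frame still carries the three column names, so it could be written to Excel as a header-only sheet or skipped, depending on what the calling function needs.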
