pandas read_sql not reading all rows

Question

I am running the exact same query both through pandas' read_sql and through an external app (DbVisualizer).

DbVisualizer returns 206 rows, while pandas returns 178.

I have tried reading the data from pandas in chunks, based on the information provided at "How to create a large pandas dataframe from an sql query without running out of memory?", but it made no difference.

What could be the cause for this and ways to remedy it?

The query:

select *
from rainy_days
where year=’2010’ and day=‘weekend’

The columns contain: date, year, weekday, amount of rain on that day, temperature, geo_location (one row per location), wind measurements, amount of rain the day before, etc.

The exact python code (minus connection details) is:

import pandas
from sqlalchemy import create_engine

engine = create_engine(
   'postgresql://user:[email protected]/weatherhist?port=5439',
)

query = """
        select *
        from rainy_days
        where year=’2010’ and day=‘weekend’
        """
df = pandas.read_sql(query, con=engine)
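
One way to narrow this down is to compare the row count the database reports for the filter with the length of the DataFrame pandas builds from the same filter. The following is only a sketch under assumptions: an in-memory SQLite table stands in for the real database, and the sample rows and the `rain` column are made up. The `?` placeholder style is SQLite's; other drivers use a different placeholder syntax.

```python
import sqlite3

import pandas

# Hypothetical stand-in for the real database: an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("create table rainy_days (year text, day text, rain real)")
conn.executemany(
    "insert into rainy_days values (?, ?, ?)",
    [
        ("2010", "weekend", 1.2),
        ("2010", "weekday", 0.0),
        ("2010", "weekend", 3.4),
    ],
)

# Passing the values as parameters avoids typing quote characters by hand.
where = "where year = ? and day = ?"
params = ("2010", "weekend")

# Row count as computed by the database itself.
expected = pandas.read_sql(
    f"select count(*) as n from rainy_days {where}", con=conn, params=params
)["n"].iloc[0]

# Row count of the DataFrame built from the full query.
df = pandas.read_sql(f"select * from rainy_days {where}", con=conn, params=params)

print(expected, len(df))
```

If the two numbers agree in pandas but differ from what the other client shows, the difference is more likely in the query text or the connection (database, schema, user) than in how pandas reads the result.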

You are using strange quotes (for the year=’2010’). I don't know if that could be a cause, but can you replace them with normal single quotes (')?
– joris, Commented Mar 8, 2016 at 10:58
Same issue. I have a table with 7 rows in total; pandas.read_sql_table gets 7, but pandas.read_sql gets 5 rows.
– Paul Yin, Commented Mar 26, 2021 at 6:20
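
To try joris's suggestion about the quotes, the typographic quote characters can be replaced with plain apostrophes before the query is sent. This is a minimal sketch; the helper name `normalize_quotes` is hypothetical, and whether the quotes are the actual cause of the missing rows is not established in this thread.

```python
# Map typographic (curly) single quotes to the plain ASCII apostrophe.
CURLY_TO_STRAIGHT = str.maketrans({"\u2018": "'", "\u2019": "'"})


def normalize_quotes(sql: str) -> str:
    """Return sql with curly single quotes replaced by straight ones."""
    return sql.translate(CURLY_TO_STRAIGHT)


# The query from the question, with the curly quotes written as escapes.
query = (
    "select * from rainy_days "
    "where year=\u20192010\u2019 and day=\u2018weekend\u2019"
)

clean_query = normalize_quotes(query)
print(clean_query)
```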

kztd · Accepted Answer · 2022-06-09 12:48:07Z

-1

It's not a fix, but what worked for me was to rebuild the indices:

1. drop the indices
2. export the whole thing to a csv
3. delete all the rows:

   DELETE FROM table

4. import the csv back in
5. rebuild the indices

pandas:

df = pandas.read_csv(..)
df.to_sql(..)

If that works, then at least you know you have a problem somewhere with the indices keeping up to date.
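
The steps above could be sketched with pandas roughly as follows. Everything specific here is an assumption made for illustration: an in-memory SQLite database, the index name `idx_year`, the sample rows, and an in-memory buffer in place of a CSV file. On a real server the statements for dropping and recreating indices may differ.

```python
import io
import sqlite3

import pandas

# Hypothetical table with one index, standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("create table rainy_days (year text, day text, rain real)")
conn.execute("create index idx_year on rainy_days (year)")
conn.executemany(
    "insert into rainy_days values (?, ?, ?)",
    [("2010", "weekend", 1.2), ("2010", "weekday", 0.0), ("2010", "weekend", 3.4)],
)
conn.commit()

# 1. Drop the index.
conn.execute("drop index idx_year")

# 2. Export the whole table to CSV (a buffer stands in for a file).
buffer = io.StringIO()
pandas.read_sql("select * from rainy_days", con=conn).to_csv(buffer, index=False)

# 3. Delete all the rows.
conn.execute("delete from rainy_days")
conn.commit()

# 4. Import the CSV back in; keep year as text so it is not parsed as a number.
buffer.seek(0)
df = pandas.read_csv(buffer, dtype={"year": str})
df.to_sql("rainy_days", con=conn, if_exists="append", index=False)

# 5. Rebuild the index.
conn.execute("create index idx_year on rainy_days (year)")

restored = conn.execute("select count(*) from rainy_days").fetchone()[0]
print(restored)
```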

edited Jun 9, 2022 at 12:48

answered Feb 28, 2017 at 3:14

kztd


1 Comment

kztd Over a year ago

the strange quotes `` are used in SQL to distinguish field names from reserved words, e.g. SELECT `right` FROM ...

Paul Yin · Accepted Answer · 2021-03-26 07:04:37Z

-2

https://github.com/xzkostyan/clickhouse-sqlalchemy/issues/14

If you use plain engine.execute, you have to take care of the format yourself.

answered Mar 26, 2021 at 7:04

Paul Yin


not2qubit · Accepted Answer · 2022-05-06 09:27:46Z

-3

The problem is that pandas returns a packed dataframe (DF). For some reason this is always on by default, and the results vary widely as to what is shown. The solution is to use the unpacking operator (*) before/when trying to print the df, like this:

print(*df)

(This is also known as the splat operator for Ruby enthusiasts.)
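
For reference, here is a small sketch of what `print(*df)` does on a toy DataFrame (the data is made up). Iterating over a DataFrame yields its column labels, so the unpacked call prints the column names. It does not change how many rows were read; `len(df)` gives the row count, and `df.to_string()` renders every row.

```python
import pandas

# Toy data, invented for illustration.
df = pandas.DataFrame({"year": ["2010", "2010"], "day": ["weekend", "weekend"]})

# Iterating over a DataFrame yields column labels, not rows,
# so unpacking with * passes the column names to print().
unpacked = [*df]
print(*df)

# The number of rows actually held by the DataFrame.
row_count = len(df)
print(row_count)

# Render every row as text, regardless of display truncation settings.
print(df.to_string())
```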

To read more about this, please check out these references & tutorials:

answered May 6, 2022 at 9:27

not2qubit


1 Comment

not2qubit Feb 8 at 22:25

If you down-vote, at least have the courtesy of adding a comment. This worked for me and was the working solution for me, at the time!
