I am very new to Python and I have an issue with pandas' read_sql_table. If I provide only the table name and the SQLAlchemy engine, it reads the data and I can print the head of the DataFrame. Adding index_col and columns works as well. But as soon as I add chunksize=10000, printing the head fails with the error: 'generator' object has no attribute 'head'
2 Answers
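A generator in Python is a way to lazily evaluate things: it produces values one at a time, only when you ask for them.
When you pass the chunksize keyword argument, read_sql_table returns a generator that yields DataFrames of up to chunksize rows each instead of a single DataFrame, so there simply isn't anything to get the .head() of.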
What you'll need to become familiar with is iterating over those results.
Example:
import pandas as pd

generator_object = pd.read_sql_table('your_table', con=your_connection_string,
                                     chunksize=CHUNKSIZE)
# Each chunk is an ordinary DataFrame with up to CHUNKSIZE rows.
for chunk in generator_object:
    print(chunk)
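If you want to try this without a real database, here is a minimal sketch. It makes a few assumptions that are not in the question: an in-memory SQLite table stands in for 'your_table', the column names are made up, and it uses pd.read_sql_query with a plain sqlite3 connection rather than read_sql_table so it runs without SQLAlchemy. The chunksize behaviour is the same in both functions.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory table standing in for 'your_table' (25 rows).
conn = sqlite3.connect(":memory:")
pd.DataFrame({"id": range(25), "value": range(100, 125)}).to_sql(
    "your_table", conn, index=False
)

# With chunksize set, pandas hands back an iterator of DataFrames,
# not a DataFrame, so it has no .head() method.
chunks = pd.read_sql_query(
    "SELECT * FROM your_table ORDER BY id", conn, chunksize=10
)
has_head = hasattr(chunks, "head")

# Iterating yields DataFrames of at most 10 rows each.
chunk_lengths = [len(chunk) for chunk in chunks]

print(has_head)       # False
print(chunk_lengths)  # [10, 10, 5]
```

The last chunk is shorter because 25 rows do not divide evenly into chunks of 10.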
Another thing you can do is to request the first chunk of your table with next():
generator_object = pd.read_sql_table('your_table', con=your_connection_string,
                                     chunksize=CHUNKSIZE)
next(generator_object).head()
But please note that this consumes the chunk, and generator_object will no longer return that chunk.
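If you want to peek at the first chunk and still process every row afterwards, one option is to put the consumed chunk back in front of the rest with itertools.chain. This is only a sketch under assumptions of mine: the in-memory SQLite table and its id column are hypothetical, and read_sql_query with a sqlite3 connection is used in place of read_sql_table so the example is self-contained.

```python
import itertools
import sqlite3

import pandas as pd

# Hypothetical in-memory table standing in for 'your_table' (25 rows).
conn = sqlite3.connect(":memory:")
pd.DataFrame({"id": range(25)}).to_sql("your_table", conn, index=False)

chunks = pd.read_sql_query(
    "SELECT * FROM your_table ORDER BY id", conn, chunksize=10
)

# next() consumes the first chunk; keep a reference to it.
first = next(chunks)
preview = first.head(3)

# Re-attach the consumed chunk in front of the remaining ones.
all_chunks = itertools.chain([first], chunks)
total_rows = sum(len(chunk) for chunk in all_chunks)

print(preview["id"].tolist())  # [0, 1, 2]
print(total_rows)              # 25
```

Without the chain step, looping over chunks after the next() call would only see the remaining 15 rows.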
You can also get multiple chunks using itertools.islice:
import itertools as it

CHUNKSIZE = 10
iterable_slice = it.islice(generator_object, 3)  # get 3*10 == 30 records
for chunk in iterable_slice:
    print(chunk)
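If you would rather have those first few chunks as one DataFrame (so that .head() works again), you can pass the slice to pd.concat. Again this is a hedged sketch, not the only way to do it: the SQLite table and its id column are hypothetical, and read_sql_query with a sqlite3 connection stands in for read_sql_table.

```python
import itertools as it
import sqlite3

import pandas as pd

# Hypothetical in-memory table standing in for 'your_table' (25 rows).
conn = sqlite3.connect(":memory:")
pd.DataFrame({"id": range(25)}).to_sql("your_table", conn, index=False)

chunks = pd.read_sql_query(
    "SELECT * FROM your_table ORDER BY id", conn, chunksize=10
)

# Take the first 2 chunks and glue them into a single DataFrame.
first_two = pd.concat(list(it.islice(chunks, 2)), ignore_index=True)

# islice only consumed 2 chunks; the rest are still in the generator.
remaining_rows = sum(len(chunk) for chunk in chunks)

print(len(first_two))            # 20
print(first_two.head(2)["id"].tolist())  # [0, 1]
print(remaining_rows)            # 5
```

Keep in mind that concatenating loads those chunks into memory at once, so only do this for as many chunks as comfortably fit.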