SQLAlchemy read_sql() into Pandas dataframe - large column value gets truncated

Question

I am trying to read data from a MySQL table, and one of the columns contains large varchar values e.g. of length 49085. When I read the results of the query into a dataframe, the column value is truncated at 87 characters. Please see the code below and the output. Does anyone know how I can read the entire string without truncation?

In the code below, the table myTable contains a column named description, where one of the rows has a string of length 49085.

Code:

import sys
import os
from sqlalchemy import create_engine
import pandas as pd

db_connection_str = 'mysql+pymysql://username:password@host/db_name'
db_connection = create_engine(db_connection_str)

# This returns 1 row where the value in the description field is of length 49085
df = pd.read_sql("select id, description, length(description) as len from myTable where length(description) = 49085", con=db_connection)

# This appears to return the truncated value of length 87
print(df)
print(len(str(df['description'])))

Output:

   id                                             description    len
0  1  This document is for the testing Team.\n\nThe attach...  49085
87

I haven't, and I don't know much about that. Do you mean trying something other than SQLAlchemy?
– Chipmunk_da, Commented Oct 19, 2021 at 16:20

Gord Thompson · Accepted Answer · 2021-10-19 16:40:54Z

You are being misled by len(str(df['description'])). df['description'] returns a <class 'pandas.core.series.Series'> object and if we call str() on it we get

'0    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...\nName: description, dtype: object'

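The length of that string will be 87 here regardless of how long the stored string is, because str() returns the Series' abbreviated display form (under pandas' default display settings), not the value itself. To test the actual length of the string, use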

print(len(df['description'][0]))

or similar.
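
To make the difference concrete, here is a minimal sketch that needs no database. The DataFrame is a hypothetical stand-in for the query result (the column names come from the question, the contents are made up), and it assumes pandas' default display settings.

```python
import pandas as pd

# Hypothetical stand-in for the query result: one row holding a 49085-character string.
long_text = "x" * 49085
df = pd.DataFrame({"id": [1], "description": [long_text]})

# str() on a Series gives its display representation, which abbreviates long values.
series_repr_len = len(str(df["description"]))

# The stored value is intact; index into the Series to measure the string itself.
value_len = len(df["description"].iloc[0])

# Element-wise lengths for every row, computed on the values rather than the display text.
all_lens = df["description"].str.len().tolist()

print(series_repr_len, value_len, all_lens)
```

The first number is the size of the short display text, while the other two reflect the real data. If the goal is only to see the whole value when printing, wrapping the print in pd.option_context("display.max_colwidth", None) should lift the abbreviation, assuming pandas 1.0 or later.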

8 Comments

Chipmunk_da Over a year ago

Thanks! That's helpful to know. I tried print(len(df['description'][0])) and it does show the correct length of 49085. But when I write the df to a .txt file, I still get the truncated value. Below is the code I'm using to write it to a .txt file.

Chipmunk_da Over a year ago

writePath = r'sample_data.txt'
with open(writePath, 'a') as f:
    dfAsString = df.to_string(index=False)
    f.writelines(dfAsString)

Gord Thompson Over a year ago

If you're looking to dump the DataFrame to a text file you might have better luck with something like df.to_csv()

Chipmunk_da Over a year ago

Correct me if I'm wrong, but wouldn't that have the same issue? To open the CSV I'd need to use Excel, and the maximum length of a cell in Excel is about 32k characters (32,767).

Gord Thompson Over a year ago

Many applications other than Excel can consume CSV files. What exactly do you intend to do with that text file?
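
As a rough check of the df.to_csv() suggestion from the comments, the sketch below writes a long value to a CSV file and reads it back with pandas instead of Excel. The DataFrame contents and the temporary file location are assumptions made for illustration; only the column names and the file name stem come from the thread.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data: a 49085-character string that also contains embedded newlines.
long_text = "line one\n\n" + "x" * 49075
df = pd.DataFrame({"id": [1], "description": [long_text]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "sample_data.csv")

    # to_csv writes the cell values themselves, quoting fields that contain newlines.
    df.to_csv(csv_path, index=False)

    # Read the file back with pandas to confirm nothing was cut off.
    df_back = pd.read_csv(csv_path)

round_trip_len = len(df_back["description"].iloc[0])
print(round_trip_len)
```

Reading the file back this way sidesteps Excel's per-cell limit, since any CSV-aware tool or script can consume the file.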
