Suppose I have the following table in Redshift:

a | b
-----
1 | 2
3 | 4

If I want to extract it from Redshift to a pd.DataFrame I can do the following:

import redshift_connector
import pandas as pd

query = 'SELECT * FROM table'
conn = redshift_connector.connect(user=user, host=host, password=password, port=port, database=database)

df = pd.read_sql_query(query, conn)

I'm using the redshift_connector package. The problem is that the column names in df are byte strings:

df['a']

This raises a KeyError, since the name of the column is b'a'. Does anyone know a workaround for this? I have already written code using psycopg2, which returns normal strings, so I would like a solution that doesn't change too much of the code.

Edit:

Versions

Python = 3.9.7

Redshift-connector = 2.0.889

Pandas = 1.2.5

2 Answers

You could fix this with one line that decodes each column name:

df.columns = [col.decode("utf-8") for col in df.columns]
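
The one-liner above assumes every column name is a bytes object, so it would fail with an AttributeError on a connector version that already returns str. A minimal sketch of a more defensive variant follows; the helper name decode_column_names is hypothetical and is not part of pandas or redshift_connector.

```python
def decode_column_names(names, encoding="utf-8"):
    """Return column names as str, decoding only those that arrive as bytes.

    Hypothetical helper for illustration; names that are already str
    (or any other non-bytes type) are passed through unchanged.
    """
    return [
        name.decode(encoding) if isinstance(name, bytes) else name
        for name in names
    ]


# Intended usage with a DataFrame `df` returned by pd.read_sql_query:
#   df.columns = decode_column_names(df.columns)
print(decode_column_names([b"a", "b"]))  # mixed bytes and str input
```

Because str names pass through untouched, the same line can stay in place after the connector is upgraded.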

Alternatively, instead of using pd.read_sql_query, use the cursor approach suggested in the documentation:

cursor: redshift_connector.Cursor = conn.cursor()
cursor.execute("SELECT * FROM table")

result: pd.DataFrame = cursor.fetch_dataframe()

This was fixed in v2.0.908 of redshift-connector, so upgrading to that version or later should also resolve it.
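
If upgrading is not possible everywhere at once, one option is to apply the decoding workaround only on versions older than the fix. The sketch below is an illustration under assumptions: the function name needs_bytes_workaround is hypothetical, and it assumes a plain numeric "major.minor.patch" version string such as the 2.0.889 quoted in the question.

```python
def needs_bytes_workaround(version, fixed_in=(2, 0, 908)):
    """Return True if `version` predates the release that fixed bytes column names.

    Hypothetical helper; assumes `version` looks like "2.0.889"
    (three dot-separated integers, no pre-release suffix).
    """
    parts = tuple(int(piece) for piece in version.split(".")[:3])
    return parts < fixed_in


# The version string could be read with importlib.metadata.version(...);
# the exact distribution name to pass there is an assumption to verify.
print(needs_bytes_workaround("2.0.889"))
print(needs_bytes_workaround("2.0.908"))
```

Tuples of integers compare element by element, which avoids the pitfalls of comparing version strings lexicographically.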
