I have access to an MS SQL Server database from which I retrieve data for analysis. I use a Mac and so access the database with Navicat Essentials for SQL Server, which works really well. However, I would also like to access the database using Python. I have set up a virtual environment for Python 3.4 and installed various libraries including NumPy, pandas and pypyodbc. I configured a DSN in the ODBC Manager app and I can access a table called 'Category' in the database using Python as follows:
import pandas as pd
import pypyodbc
# Connect via the DSN configured in ODBC Manager
connectionName = pypyodbc.connect('DSN=myDSNName')
queryName = 'SELECT ID, CategoryName FROM Category'
retrievedDataDF = pd.io.sql.read_sql(queryName, con=connectionName)
connectionName.close()
print(retrievedDataDF.head())
print(retrievedDataDF.columns)
This seems to work fine except that the column headings in the returned dataframe come back as bytes objects rather than strings; in this case, the column headings are b'i' and b'c'. The output from the print functions is:
b'i' b'c'
0 1 missing
1 2 blue
2 3 red
3 4 green
4 5 yellow
Index([b'i', b'c'], dtype='object')
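For what it's worth, I can force the headings back into strings by decoding them, but that feels like masking the symptom rather than fixing the cause (a quick sketch, assuming the labels really are bytes objects):

# Workaround sketch (not a fix): decode any bytes column labels back to str
retrievedDataDF.columns = [col.decode('utf-8') if isinstance(col, bytes) else col
                           for col in retrievedDataDF.columns]
print(retrievedDataDF.columns)    # Index(['i', 'c'], dtype='object')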
I don't recall having this problem previously and I can't find any reference to similar issues online. As a result, I can't work out what is going on.
Any suggestions would be appreciated.
EDIT: Following comments by Joris, the following may be useful:
connectionName.cursor().execute(queryName).description
[(b'i', int, 11, 10, 10, 0, False), (b'c', str, 100, 100, 100, 0, True)]
Versions of all installed libraries are given below:
From Terminal
$ env/bin/pip list
appnope (0.1.0)
decorator (4.0.4)
gnureadline (6.3.3)
ipykernel (4.1.1)
ipython (4.0.0)
ipython-genutils (0.1.0)
ipywidgets (4.1.1)
jdcal (1.0)
Jinja2 (2.8)
jsonschema (2.5.1)
jupyter (1.0.0)
jupyter-client (4.1.1)
jupyter-console (4.0.3)
jupyter-core (4.0.6)
MarkupSafe (0.23)
matplotlib (1.4.3)
mistune (0.7.1)
nbconvert (4.0.0)
nbformat (4.0.1)
nose (1.3.7)
notebook (4.0.6)
numexpr (2.4.3)
numpy (1.10.1)
openpyxl (2.2.4)
pandas (0.17.0)
pandastable (0.4.0)
path.py (8.1.2)
pexpect (4.0.1)
pickleshare (0.5)
pip (1.5.6)
ptyprocess (0.5)
Pygments (2.0.2)
pyparsing (2.0.3)
pypyodbc (1.3.3)
python-dateutil (2.4.2)
pytz (2015.6)
pyzmq (14.7.0)
qtconsole (4.1.0)
scipy (0.16.1)
setuptools (3.6)
simplegeneric (0.8.1)
six (1.9.0)
terminado (0.5)
tornado (4.2.1)
traitlets (4.0.0)
xlrd (0.9.3)
From within virtual environment
import pandas as pd
pd.show_versions(as_json=False)
INSTALLED VERSIONS
commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8

pandas: 0.17.0
nose: 1.3.7
pip: 1.5.6
setuptools: 3.6
Cython: None
numpy: 1.10.1
scipy: 0.16.1
statsmodels: None
IPython: 4.0.0
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.6
blosc: None
bottleneck: None
tables: None
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: 2.2.4
xlrd: 0.9.3
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
(Since then, I've installed sqlalchemy 1.0.10 but I'm still working on trying to connect using SQLAlchemy.)
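If I have understood the suggestion correctly, the engine-based approach would look something like the sketch below ('myDSNName' and the credentials are placeholders, and it assumes pyodbc can be installed to act as the DBAPI):

from sqlalchemy import create_engine
import pandas as pd

# Hypothetical engine built from the DSN configured in ODBC Manager
engine = create_engine('mssql+pyodbc://myUserName:myPassword@myDSNName')
retrievedDataDF = pd.read_sql('SELECT ID, CategoryName FROM Category', con=engine)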
EDIT 2
I have failed to connect using SQLAlchemy's create_engine because I couldn't get pyodbc to install on a Mac running El Capitan (pip install fails with a fatal error caused by a missing sql.h header file), and SQLAlchemy's SQL Server dialect seems to require pyodbc. I generally use pypyodbc instead, but SQLAlchemy can't use pypyodbc in place of pyodbc. I have, however, successfully connected to the database using the following:
phjConnection = pypyodbc.connect(driver="{Actual SQL Server}",
                                 server="myServerName",
                                 uid="myUserName",
                                 pwd="myPassword",
                                 db="myDBName",
                                 port="1433")
phjQuery = '''SELECT ID, CategoryName FROM Category'''
phjLatestData = pd.io.sql.read_sql(phjQuery, con=phjConnection)
I'm not sure if that achieves the same goal as Joris suggested, but the problem still exists, namely:
print(phjLatestData.head())
b'i' b'c'
0 1 missing
1 2 blue
2 3 red
3 4 green
4 5 yellow
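So the column labels returned via this connection are still bytes objects rather than strings:

print(type(phjLatestData.columns[0]))    # <class 'bytes'> rather than <class 'str'>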