I am quite new to Python.
I am trying to automate some analysis of building energy consumption data using Python.
I am using Python 2.7.3, pandas 0.12, and Canopy with qtconsole.
These are the steps I am following:
- Paste the data from my simulation software into Excel
- Export to csv from Excel
- Import the csv into a pandas dataframe
- Perform my analysis
In the interactive console I write the following code:
import pandas as pd
rooms = pd.read_csv('IES Results - Rooms.csv', index_col='Room # (Real)')
systems = pd.read_csv('IES Results - Systems.csv',index_col='Room #')
all_values = pd.concat([rooms,systems],axis=1)
all_values = all_values.T.drop_duplicates().T
columns = [u'Room ID',u'Room Name',u'Floor Area (m²) (Real)',u'Volume (m³) (Real)']
selected_values = all_values[columns]
Unfortunately, I get the following error:
KeyError: "[u'Floor Area (m\\xb2) (Real)' u'Volume (m\\xb3) (Real)'] not in index"
As you can see, the column names containing a superscript are not interpreted correctly and cannot be found in the dataframe.
When I write
all_values.columns
The column headers are displayed correctly in the IPython console. I then copy and paste the ones I am interested in to create the 'columns' list that I pass to 'selected_values = all_values[columns]'.
I have done quite a bit of research, but I cannot get my head around it.
I have tried specifying various encodings, but I do not really understand what is happening.
I have been stuck for more than a day.
Can you please help?
See the encoding keyword argument to pd.read_csv: pandas.pydata.org/pandas-docs/stable/generated/…
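A minimal sketch of what passing the encoding keyword might look like. The actual encoding of the exported files is not stated in the question. cp1252 is only an assumption here (a common choice for Excel exports on Windows), and the in-memory bytes below are hypothetical sample data standing in for 'IES Results - Rooms.csv'.

```python
# -*- coding: utf-8 -*-
import io

import pandas as pd

# Hypothetical stand-in for the Excel export: a small CSV encoded as cp1252.
# The real encoding of the file is an assumption and should be checked.
raw = (
    u"Room #,Room ID,Floor Area (m\u00b2) (Real)\n"
    u"1,R1,12.5\n"
    u"2,R2,20.0\n"
).encode("cp1252")

# encoding= tells pandas how to decode the bytes, so the header becomes
# proper unicode column names instead of undecoded byte strings.
rooms = pd.read_csv(io.BytesIO(raw), index_col="Room #", encoding="cp1252")

# The unicode labels now match the decoded column names.
columns = [u"Room ID", u"Floor Area (m\u00b2) (Real)"]
selected_values = rooms[columns]
```

With a real file, the same call would take the file path in place of the BytesIO object. If cp1252 is the wrong guess, the superscripts will either come out as different characters or raise a decoding error, which is a hint to try another encoding such as 'utf-8' or 'latin-1'.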