I am using pandas to read .csv data files. For one of my files I am able to index using the column title. For the other I get error messages
File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py",
line 1023, in _check_have
raise KeyError('no item named %s' % com.pprint_thing(item))
KeyError: u'no item named State'
The code I used is:
filename = "PovertyEstimates.csv"
#filename = "nm.csv"
f = open(filename)
import pandas as pd
data = pd.read_csv(f)#, index_col=0)
print data['State']
Even when I use index_col I get the same error(unless it is 0). I have found that when I print the csv file that isn't working in my terminal it is not separated into columns like the one that is. Rather the items in each row are printed consecutively separated by spaces. I believe this incorrect separation is the problem.
I am using LibreOffice Calc on Ubuntu Linux. For the improperly formatted file (which appears in perfect format in LibreOffice) the terminal output is:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3194 entries, 0 to 3193
Data columns:
FIPStxt State Area_name Rural-urban_Continuum Code_2003 Urban_Influence_Code_2003 Rural-urban_Continuum Code_20013 Urban_Influence_Code_20013 POVALL_2011 CI90LBAll_2011 CI90UBALL_2011 PCTPOVALL_2011 CI90LBALLP_2011 CI90UBALLP_2011 POV017_2011 CI90LB017_2011 CI90UB017_2011 PCTPOV017_2011 CI90LB017P_2011 CI90UB017P_2011 POV517_2011 CI90LB517_2011 CI90UB517_2011 PCTPOV517_2011 CI90LB517P_2011 CI90UB517P_2011 MEDHHINC_2011 CI90LBINC_2011 CI90UBINC_2011 POV05_2011 CI90LB05_2011 CI90UB05_2011 PCTPOV05_2011 CI90LB05P_2011 CI90UB05P_2011 3194 non-null values
dtypes: object(1)
The first few lines of the csv file are:
FIPStxt State Area_name Rural-urban_Continuum Code_2003
01000 AL Alabama
01001 AL Autauga County 2 2
01003 AL Baldwin County 4 5
data = pd.read_csv(f)#, index_col=0)missing a left paren?