Why do I keep getting a key error?

[edit] Here is the data:

GEO,LAT,LON
AALBORG   DENMARK,57.0482206,9.9193939
AARHUS   DENMARK,56.1496278,10.2134046
ABBOTSFORD BC  CANADA,49.0519047,-122.3290473
ABEOKUTA   NIGERIA,7.161,3.348
ABERDEEN   SCOTLAND,57.1452452,-2.0913745

[end edit] I can't find a row by its index label, but it's clearly there:

geocache = pd.read_csv('geolog.csv',index_col=['GEO']) # index_col=['GEO']
geocache.head()

Shows

                      LAT           LON
GEO     
AALBORG DENMARK       57.048221     9.919394
AARHUS DENMARK        56.149628    10.213405
ABBOTSFORD BC CANADA  49.051905  -122.329047
ABEOKUTA NIGERIA       7.161000     3.348000
ABERDEEN SCOTLAND     57.145245    -2.091374

So then I test it:

x = 'AARHUS DENMARK'
print(x)
geocache[x]

And this is what I get:

AARHUS DENMARK


KeyError                                  Traceback (most recent call last)
 in ()
      2 x = u'AARHUS DENMARK'
      3 print(x)
----> 4 geocache[x]

C:\Users\g\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   1785             return self._getitem_multilevel(key)
   1786         else:
-> 1787             return self._getitem_column(key)
   1788 
   1789     def _getitem_column(self, key):

C:\Users\g\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   1792         # get column
   1793         if self.columns.is_unique:
-> 1794             return self._get_item_cache(key)
   1795 
   1796         # duplicate columns & possible reduce dimensionaility

C:\Users\g\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   1077         res = cache.get(item)
   1078         if res is None:
-> 1079             values = self._data.get(item)
   1080             res = self._box_item_values(item, values)
   1081             cache[item] = res

C:\Users\g\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   2841 
   2842             if not isnull(item):
-> 2843                 loc = self.items.get_loc(item)
   2844             else:
   2845                 indexer = np.arange(len(self.items))[isnull(self.items)]

C:\Users\g\Anaconda3\lib\site-packages\pandas\core\index.py in get_loc(self, key, method)
   1435         """
   1436         if method is None:
-> 1437             return self._engine.get_loc(_values_from_object(key))
   1438 
   1439         indexer = self.get_indexer([key], method=method)

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3824)()

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3704)()

pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12349)()

pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12300)()

KeyError: 'AARHUS DENMARK'

There are no extra spaces or non-visible characters that I can see. I tried putting r and u before the string assignment, with no change in behavior.

Ok, what am I missing?

  • geocache.loc(x) results in "<pandas.core.indexing._LocIndexer at 0x88a8588>" Commented Apr 12, 2015 at 17:02
  • Can you show geocache.index.values? Commented Apr 12, 2015 at 17:06
  • @joris array(['AALBORG DENMARK', 'AARHUS DENMARK', 'ABBOTSFORD BC CANADA', 'ABEOKUTA NIGERIA', ... Interesting... it shows 3 spaces between words, and that works. How the heck did those spaces appear? Commented Apr 12, 2015 at 17:10
  • You didn't pass a separator arg to read_csv, so the default is comma separated. However, if your csv file has spaces between the commas, these will be parsed as part of the data. Please post some raw input from your csv. You could also try pd.read_csv('geolog.csv',index_col=['GEO'], sep=',\s+', engine='python') Commented Apr 12, 2015 at 17:41
  • @EdChum AH! Once again Ed, you are a lifesaver. Commented Apr 12, 2015 at 17:50

1 Answer

Because you didn't pass a sep (separator) argument to read_csv, the default comma separator is used. If your CSV contains spaces or tabs after the commas, they are treated as part of the data, so your index labels end up containing embedded whitespace.
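One way to check what the index labels really contain is to print them with repr(), which shows every space that a rendered table may hide. This is a minimal sketch, assuming two of the sample rows from the question are inlined through io.StringIO instead of being read from geolog.csv:

```python
import io
import pandas as pd

# Sample rows copied from the question; the labels contain runs of spaces.
raw = (
    "GEO,LAT,LON\n"
    "AALBORG   DENMARK,57.0482206,9.9193939\n"
    "AARHUS   DENMARK,56.1496278,10.2134046\n"
)
geocache = pd.read_csv(io.StringIO(raw), index_col="GEO")

# repr() makes each space in a label visible.
labels = [repr(label) for label in geocache.index]
print(labels)

# Membership test against the row labels, using the single-spaced key.
found = "AARHUS DENMARK" in geocache.index
print(found)
```

If the key you type and the label stored in the index differ only in whitespace, the membership test fails even though the two look identical in the head() output.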

So you need to pass additional params to read_csv:

pd.read_csv('geolog.csv', index_col=['GEO'], sep=r',\s*', engine='python')

The sep regex matches a comma followed by optional whitespace, so that whitespace is consumed as part of the separator instead of the data. We pass engine='python' because the C engine does not accept a regex separator.
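Two related points are worth sketching. First, geocache[x] looks up a column, so selecting a row by its label needs geocache.loc[x] (square brackets, not parentheses). Second, a separator regex only removes whitespace next to the commas; runs of spaces inside a label, like the three spaces between words in the sample data, have to be collapsed separately. The sketch below is hedged: the input is hypothetical (the question's rows with padding added after the commas), it uses `,\s*` so the whitespace is optional, and collapsing the index whitespace is one possible approach, not necessarily what the asker ended up doing.

```python
import io
import pandas as pd

# Hypothetical input: padding after the commas and runs of spaces inside labels.
raw = (
    "GEO,  LAT,  LON\n"
    "AALBORG   DENMARK,  57.0482206,  9.9193939\n"
    "AARHUS   DENMARK,  56.1496278,  10.2134046\n"
)

# Regex separator: a comma plus any whitespace that follows it.
geocache = pd.read_csv(
    io.StringIO(raw), index_col="GEO", sep=r",\s*", engine="python"
)

# Collapse runs of whitespace inside each label and trim the ends.
geocache.index = geocache.index.str.replace(r"\s+", " ", regex=True).str.strip()

# Row lookup by label goes through .loc with square brackets.
row = geocache.loc["AARHUS DENMARK"]
print(row["LAT"], row["LON"])
```

If the only problem is spaces right after the commas, read_csv also has a skipinitialspace=True option that works with the default C engine.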
