I have a file with 3'502'379 rows and 3 columns. The following script should run, but it raises an error on the date-handling line:

import matplotlib.pyplot as plt
import numpy as np
import csv
import pandas

path = 'data_prices.csv'
data = pandas.read_csv(path, sep=';')
data['DATE'] = pandas.to_datetime(data['DATE'], format='%Y%m%d')

This is the error that occurs:

Traceback (most recent call last):
  File "C:\Program Files\Python35\lib\site-packages\pandas\indexes\base.py", line 1945, in get_loc
    return self._engine.get_loc(key)
  File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066)
  File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930)
  File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
  File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
KeyError: 'DATE'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\data\script.py", line 15, in <module>
    data['DATE'] = pandas.to_datetime(data['DATE'], format='%Y%m%d')
  File "C:\Program Files\Python35\lib\site-packages\pandas\core\frame.py", line 1997, in __getitem__
    return self._getitem_column(key)
  File "C:\Program Files\Python35\lib\site-packages\pandas\core\frame.py", line 2004, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Program Files\Python35\lib\site-packages\pandas\core\generic.py", line 1350, in _get_item_cache
    values = self._data.get(item)
  File "C:\Program Files\Python35\lib\site-packages\pandas\core\internals.py", line 3290, in get
    loc = self.items.get_loc(item)
  File "C:\Program Files\Python35\lib\site-packages\pandas\indexes\base.py", line 1947, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066)
  File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930)
  File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
  File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
KeyError: 'DATE'

Comments

  • Still a problem if you use data = pandas.read_csv(path, sep=';', nrows=10000)? Commented Aug 9, 2016 at 9:14
  • Your data df doesn't have a DATE column. Please post the output of data.columns.tolist(). Commented Aug 9, 2016 at 9:15
  • It's a UTF BOM signature. Commented Aug 9, 2016 at 9:22
  • @Spurious this is worth a read: stackoverflow.com/questions/21806496/… Commented Aug 9, 2016 at 9:22
  • You have a UTF-16 big-endian BOM, see related: stackoverflow.com/questions/38774705/… Try: data = pandas.read_csv(path, sep=';', encoding='utf-16') Commented Aug 9, 2016 at 9:22

1 Answer

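The '\ufeffDATE' in the first column name shows that your CSV file starts with a Byte Order Mark (BOM). The BOM was decoded as the character U+FEFF and became part of the first column name, so the column is called '\ufeffDATE' rather than 'DATE', which is why data['DATE'] raises a KeyError. Because the rest of the name decoded cleanly under the default encoding, the file is most likely UTF-8 with a BOM signature.

So try this when reading your CSV:

data = pandas.read_csv(path, sep=';', encoding='utf-8-sig')

or, as @EdChum suggested, if the file is actually UTF-16:

data = pandas.read_csv(path, sep=';', encoding='utf-16')

Which variant works depends on the file's actual encoding: 'utf-8-sig' for UTF-8 with a BOM, 'utf-16' for UTF-16.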

PS this answer shows how to deal with BOMs
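
If you are not sure which encoding applies, a minimal sketch like the following may help. It uses only the standard library to look at the first bytes of the data and pick a codec that strips the BOM. The helper name detect_bom_encoding and the in-memory sample are hypothetical and only for illustration; the sketch also ignores UTF-32, whose little-endian BOM begins with the same two bytes as the UTF-16 one.

```python
import codecs

# Known BOM byte sequences, each paired with a codec that consumes the BOM on decode.
# Assumption: only UTF-8 and UTF-16 are considered; UTF-32 is deliberately left out.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_bom_encoding(raw, default="utf-8"):
    """Return a codec name based on the BOM at the start of raw, or default if none."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default


# Hypothetical sample standing in for the first bytes of the CSV file (UTF-8 with BOM).
sample = "DATE;PRICE\n20160809;1.5\n".encode("utf-8-sig")

# Decoding as plain UTF-8 keeps the BOM as U+FEFF in the first column name.
header_plain = sample.decode("utf-8").splitlines()[0].split(";")

# Decoding with the detected codec strips it.
header_sig = sample.decode(detect_bom_encoding(sample)).splitlines()[0].split(";")

print(header_plain)
print(header_sig)
```

For a real file you could read the first few bytes with open(path, 'rb').read(4), pass them to the helper, and hand the result to the encoding argument of pandas.read_csv.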

2 Comments

Wrong, this is utf-16 big endian: en.wikipedia.org/wiki/… FE FF is utf-16 Big endian
@EdChum, I've corrected my answer. Actually both variants will work properly
