
I'm downloading stock prices from Yahoo for the S&P 500, whose daily volume is too large for a 32-bit integer.

def yahoo_prices(ticker, start_date=None, end_date=None, data='d'):

    csv = yahoo_historical_data(ticker, start_date, end_date, data)

    d = [('date',      np.datetime64),
         ('open',      np.float64),
         ('high',      np.float64),
         ('low',       np.float64),
         ('close',     np.float64),
         ('volume',    np.int64),
         ('adj_close', np.float64)]

    return np.recfromcsv(csv, dtype=d)

Here's the error:

>>> sp500 = yahoo_prices('^GSPC')
Traceback (most recent call last):
  File "<stdin>", line 108, in <module>
  File "<stdin>", line 74, in yahoo_prices
  File "/usr/local/lib/python2.6/dist-packages/numpy/lib/npyio.py", line 1812, in recfromcsv
    output = genfromtxt(fname, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/numpy/lib/npyio.py", line 1646, in genfromtxt
    output = np.array(data, dtype=ddtype)
OverflowError: long int too large to convert to int

Why am I still getting this error when I declared the dtype to use int64? Does this indicate that the I/O function isn't really using my dtype sequence d?

===Edit ... example csv added===

Date,Open,High,Low,Close,Volume,Adj Close
2012-06-15,1329.19,1343.32,1329.19,1342.84,4401570000,1342.84
2012-06-14,1314.88,1333.68,1314.14,1329.10,3687720000,1329.10
2012-06-13,1324.02,1327.28,1310.51,1314.88,3506510000,1314.88
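
For reference, a quick check on the volume in the first sample row shows why a 32-bit integer cannot hold it while int64 can. This is only a sketch of the arithmetic; the variable names are made up for the illustration.

```python
import numpy as np

# Volume taken from the first sample row above.
volume = 4401570000

# Largest values representable by signed 32-bit and 64-bit integers.
int32_max = np.iinfo(np.int32).max
int64_max = np.iinfo(np.int64).max

fits_in_int32 = volume <= int32_max
fits_in_int64 = volume <= int64_max

print(int32_max, fits_in_int32)
print(int64_max, fits_in_int64)
```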
  • Could you show a line or two of the sample CSV input? Commented Jun 16, 2012 at 15:51

2 Answers


I'm not sure, but I think you found a bug in numpy. I filed it here.

As I said there, if you open npyio.py and edit this line within recfromcsv:

kwargs.update(dtype=kwargs.get('update', None),

to this:

kwargs.update(dtype=kwargs.get('dtype', None),

With that change it works for me, with no problem for the long integer. (I didn't check the datetime correctness that Joe describes in his answer.) You may notice that your dates weren't being converted either. Here is the specific code that works. The contents of "test.csv" are copied and pasted from your example csv data.

import numpy as np
d = [('date',      np.datetime64),
     ('open',      np.float64),
     ('high',      np.float64),
     ('low',       np.float64),
     ('close',     np.float64),
     ('volume',    np.int64),
     ('adj_close', np.float64)]
a = np.recfromcsv("test.csv", dtype=d)
print(a)

[ (datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 1329.19, 1343.32, 1329.19, 1342.84, 4401570000, 1342.84)
 (datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 1314.88, 1333.68, 1314.14, 1329.1, 3687720000, 1329.1)
 (datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 1324.02, 1327.28, 1310.51, 1314.88, 3506510000, 1314.88)]
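
To illustrate why the one-word change matters, here is a minimal sketch using plain dictionaries. The function names `buggy_defaults` and `fixed_defaults` are made up for this example and are not part of numpy; they only mimic the keyword lookup on the line quoted above.

```python
def buggy_defaults(**kwargs):
    # Mimics the mistyped lookup: callers never pass a key named 'update',
    # so the dtype they did pass is overwritten with None.
    kwargs.update(dtype=kwargs.get('update', None))
    return kwargs


def fixed_defaults(**kwargs):
    # Looks up the key the caller actually supplied.
    kwargs.update(dtype=kwargs.get('dtype', None))
    return kwargs


user_dtype = [('volume', 'i8')]

print(buggy_defaults(dtype=user_dtype)['dtype'])
print(fixed_defaults(dtype=user_dtype)['dtype'])
```

With the lookup on the wrong key, the caller's dtype is replaced by None, which presumably leaves the parser to guess the column types on its own.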

Update: If you don't want to modify numpy, you can copy the relevant code from recfromcsv and call genfromtxt yourself.

I've also "fixed" the datetime issue by using a native python object in the datetime field. I don't know if that will work for you.

import datetime
import numpy as np

d = [('date',      datetime.datetime),
     ('open',      np.float64),
     ('high',      np.float64),
     ('low',       np.float64),
     ('close',     np.float64),
     ('volume',    np.int64),
     ('adj_close', np.float64)]

#a = np.recfromcsv("test.csv", dtype=d)
kwargs = {"dtype": d}
case_sensitive = kwargs.get('case_sensitive', "lower") or "lower"
names = kwargs.get('names', True)
kwargs.update(
    delimiter=kwargs.get('delimiter', ",") or ",",
    names=names,
    case_sensitive=case_sensitive)
output = np.genfromtxt("test.csv", **kwargs)
output = output.view(np.recarray)

print(output)
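
As a narrower check, the following sketch calls genfromtxt directly with an explicit int64 dtype on the volume column only. It assumes a recent numpy on Python 3 and reads the sample rows from an in-memory string instead of "test.csv"; the name `SAMPLE` is just for this illustration.

```python
import io

import numpy as np

# Sample rows copied from the question.
SAMPLE = """Date,Open,High,Low,Close,Volume,Adj Close
2012-06-15,1329.19,1343.32,1329.19,1342.84,4401570000,1342.84
2012-06-14,1314.88,1333.68,1314.14,1329.10,3687720000,1329.10
2012-06-13,1324.02,1327.28,1310.51,1314.88,3506510000,1314.88"""

# Skip the header row and read only column 5 (Volume) as 64-bit integers.
volume = np.genfromtxt(io.StringIO(SAMPLE), delimiter=",",
                       skip_header=1, usecols=(5,), dtype=np.int64)

print(volume.dtype)
print(volume)
```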

4 Comments

I think you're correct about the bug, however your issue is much more likely to get attention on github (They're in the process of moving away from the old bug tracker).
On a side note, I'm not sure why the datetimes aren't converted correctly, but they're not correct using this method. np.datetime64('2012-06-15') works perfectly, but it doesn't work as a part of a dtype... (This appears to be a bug that's been fixed, though... Seems to work when I build from the git tip?)
Yikes. I just saw "datetime" and long integers and said "good". I think np.datetime64() returns that zero date so the parser must have had a bug?
Tweaked my npyio.py for the kwarg. Works like a charm. Thanks much.

You need to convert your date strings to actual dates. The formats in your dtype are being ignored because the first column can't be directly converted to a datetime.

numpy expects you to be fairly explicit and refuses to guess date formats.

(Edit: This used to be the case, but isn't anymore.)

It expects datetime objects. See dateutil.parser if you want to guess date/time formats from strings.

At any rate, you'll want something like the following:

from cStringIO import StringIO
import datetime as dt
import numpy as np

dat = """Date,Open,High,Low,Close,Volume,Adj Close
2012-06-15,1329.19,1343.32,1329.19,1342.84,4401570000,1342.84
2012-06-14,1314.88,1333.68,1314.14,1329.10,3687720000,1329.10
2012-06-13,1324.02,1327.28,1310.51,1314.88,3506510000,1314.88"""
infile = StringIO(dat)

d = [('date',      np.datetime64),
     ('open',      np.float64),
     ('high',      np.float64),
     ('low',       np.float64),
     ('close',     np.float64),
     ('volume',    np.int64),
     ('adj_close', np.float64)]


def parse_date(item):
    # %m is the month; %M would be parsed as minutes
    return dt.datetime.strptime(item, '%Y-%m-%d')

data = np.recfromcsv(infile, converters={0:parse_date}, dtype=d)
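
If the converter route misbehaves on your numpy version, a version-independent alternative is to parse the rows by hand and build the structured array yourself. This is only a sketch of a different technique (the standard csv module plus np.array, not recfromcsv), written for Python 3; `load_prices`, `SAMPLE`, and `DTYPE` are names invented for the example. Note that `%m` is the month directive, while `%M` would be read as minutes.

```python
import csv
import datetime as dt
import io

import numpy as np

SAMPLE = """Date,Open,High,Low,Close,Volume,Adj Close
2012-06-15,1329.19,1343.32,1329.19,1342.84,4401570000,1342.84
2012-06-14,1314.88,1333.68,1314.14,1329.10,3687720000,1329.10
2012-06-13,1324.02,1327.28,1310.51,1314.88,3506510000,1314.88"""

DTYPE = [('date', 'M8[D]'), ('open', 'f8'), ('high', 'f8'), ('low', 'f8'),
         ('close', 'f8'), ('volume', 'i8'), ('adj_close', 'f8')]


def parse_date(item):
    # Parse an ISO date string into a day-resolution numpy datetime.
    return np.datetime64(dt.datetime.strptime(item, '%Y-%m-%d').date())


def load_prices(text):
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    rows = [(parse_date(r[0]), float(r[1]), float(r[2]), float(r[3]),
             float(r[4]), int(r[5]), float(r[6]))
            for r in reader if r]
    # Convert explicitly, so the int64 volume column is never guessed.
    return np.array(rows, dtype=DTYPE).view(np.recarray)


prices = load_prices(SAMPLE)
print(prices['date'][0], prices['volume'][0])
```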

However, things like this are where pandas shines. Consider using something like the following:

from cStringIO import StringIO
import pandas

dat = """Date,Open,High,Low,Close,Volume,Adj Close
2012-06-15,1329.19,1343.32,1329.19,1342.84,4401570000,1342.84
2012-06-14,1314.88,1333.68,1314.14,1329.10,3687720000,1329.10
2012-06-13,1324.02,1327.28,1310.51,1314.88,3506510000,1314.88"""

infile = StringIO(dat)
data = pandas.read_csv(infile, index_col=0, parse_dates=True)

1 Comment

Could you check my answer? If I filed a specious bug, I'd like to go delete it. I seemed to get a working datetime with whatever assumptions it made.
