In a CSV file I have data representing the date, open, close, high, low, and volume for a particular stock. The data is stored in the following format:

20150601 000000;1.094990;1.095010;1.094990;1.094990;0

I am attempting to use the following code to extract the data into NumPy arrays so I can analyze it using algorithms. However, when converting the date I do not get the correct date.

Can anyone identify the error that I am making?

datefunc = lambda x: mdates.date2num(datetime.strptime(x, '%y%m%d%H%M %f'))
date,high,low,open,close,volume = np.loadtxt('DAT_ASCII_EURUSD_M1_201506.csv',unpack=True, 
                              delimiter=';',
                              converters={0:datefunc})

Any help is much appreciated.

  • Is your sample line incorrect? Also what is mdates.date2num? Commented Jun 30, 2015 at 21:53
  • I suspect he has done import matplotlib.dates as mdates. Commented Jun 30, 2015 at 21:59
  • Your date format is also incorrect Commented Jun 30, 2015 at 21:59
  • what would be the correct date format? Commented Jun 30, 2015 at 22:00
  • it would be '%Y%m%d' but you cannot have datetimes and floats in the same array. I think pandas would be pretty useful Commented Jun 30, 2015 at 22:01

1 Answer

Your date format is incorrect: it needs to be year, month and day, "%Y%m%d". You also cannot have datetime objects and floats in the same plain array, but a structured array allows mixed types.
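
To see why the format string from the question yields a wrong date instead of an error, here is a minimal sketch using only the standard library and the timestamp from the question's sample line. With %y only two digits are read as the year, so each later directive is matched against the wrong digits.

```python
from datetime import datetime

stamp = "20150601 000000"  # first field of the sample line in the question

# Format from the question: %y takes only "20" as the year, so %m, %d,
# %H and %M are matched against the digits that follow it.
wrong = datetime.strptime(stamp, "%y%m%d%H%M %f")
print(wrong)  # 2020-01-05 06:01:00

# With %Y the four-digit year is consumed and the date part lines up.
right = datetime.strptime(stamp.split()[0], "%Y%m%d")
print(right)  # 2015-06-01 00:00:00
```

The second call parses only the date part of the field; the digits after the space are left out here on purpose.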

If mdates.date2num returns a float, then using the correct format with loadtxt should work, provided your lines are delimited by ; throughout:

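from datetime import datetime
import matplotlib.dates as mdates  # provides date2num
import numpy as np

# Convert the first column to matplotlib's float date representation.
datefunc = lambda x: mdates.date2num(datetime.strptime(x, '%Y%m%d'))

a = np.loadtxt('in.csv', delimiter=';',
               converters={0: datefunc})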

Which would output:

[  7.35750000e+05   0.00000000e+00   1.09499000e+00   1.09501000e+00
1.09499000e+00   1.09499000e+00   0.00000000e+00]

Your example input line has seven elements, so unpacking into six names will raise an error. If that is a typo it will be fine; if not, you will need to fix it.
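
The unpacking problem does not depend on NumPy: six names cannot receive seven values. Below is a minimal sketch in plain Python, assuming a seven-field line of the kind the output above suggests. The line is reconstructed for illustration and is not copied from the real file.

```python
# Hypothetical seven-field line: date and time separated by ';' instead of a space.
line = "20150601;000000;1.094990;1.095010;1.094990;1.094990;0"
fields = line.split(";")
print(len(fields))  # 7

try:
    # Six names, seven values: Python refuses to unpack.
    date, high, low, open_, close, volume = fields
except ValueError as exc:
    error = str(exc)
    print(error)
```

np.loadtxt(..., unpack=True) hands back one array per column, so the same mismatch applies when the file has one column more than there are names on the left-hand side.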

If you have mixed types you could use a structured array with genfromtxt:

from datetime import datetime
import numpy as np
datefunc = lambda x: datetime.strptime(x, '%Y%m%d')
a = np.genfromtxt('in.csv', delimiter=';',
                  converters={0: datefunc},
                  dtype='object, float, float, float, float, float',
                  names=["date", "high", "low", "open", "close", "volume"])

print(a["date"])
print(a["high"])
print(a["low"])
print(a["open"])
print(a["close"])
print(a["volume"])

2015-06-01 00:00:00
0.0
1.09499
1.09501
1.09499
1.09499

This presumes your input is actually delimited by ; and does not have spaces like you have in your sample line.
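
If the first field really is a date, a space, and six more digits, as in the sample line, those six digits look like HHMMSS rather than a fraction of a second: the file name contains M1 and the rows quoted in the comments step by one minute. That reading is an assumption about the file, not something the question states. Under it, a sketch of a format string that keeps the time of day, next to what %f does with the same digits:

```python
from datetime import datetime

stamp = "20150601 000100"  # second row quoted in the comments

# Assumption: the digits after the space are hours, minutes, seconds.
as_time = datetime.strptime(stamp, "%Y%m%d %H%M%S")
print(as_time)      # 2015-06-01 00:01:00

# %f reads the same digits as microseconds instead.
as_fraction = datetime.strptime(stamp, "%Y%m%d %f")
print(as_fraction)  # 2015-06-01 00:00:00.000100
```

The same format string can be used inside the datefunc converter in place of '%Y%m%d'.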


6 Comments

Thank you for correcting my format and for fixing the mixed data type issue. However, when I use this method of conversion I get the following error: names=["date", "high", "low", "open", "close", "volume"]) TypeError: loadtxt() got an unexpected keyword argument 'names'
@Jerryberry123, you need to use genfromtxt for that, my mistake
Thank you! However the date is formatted as year month day followed by a space then the millisecond value {20150601 000000;1.094990;1.095010;1.094990;1.094990;0} {20150601 000100;1.094990;1.094990;1.094920;1.094940;0} {20150601 000200;1.094940;1.095060;1.094890;1.095050;0} {20150601 000300;1.095090;1.095130;1.095050;1.095060;0}
Ah ok now it all makes sense, change to '%Y%m%d %f'
Thanks all working now. Although the data is represented as datetime.datetime(2015, 6, 1, 0, 0, 0, 100). Should it be in that format or in the 2015-06-01 000000 format?