Before asking my question, here's a little background so you understand what I'm doing. I'm looking to analyze a very large data set (a little less than 2,000,000 rows). I've parsed the data set into Matlab and built a structure array from it, giving names, dates, returns, etc. for each asset i. Now I would like to restrict my data set to the rows that fall between two dates, and Matlab doesn't seem to be particularly amenable to that kind of approach. One suggestion I was given was to take the dates, which are of the form MM/DD/YYYY, and use the delimiter '/' to build three integer arrays in my data structure (which I'd call stock(i).month, stock(i).day, and stock(i).year). However, nothing I've tried is working, and I'm very much stuck.

What I have been trying to do is something like the following:

%% Dates
fid = fopen('52c6d3831952b24a.csv');
C = textscan(fid, [repmat('%*s ',1,0),'%s %*[^\n]'], 'delimiter',',');
date = C{1}(2:end,1);
fclose(fid);

for i=1:numStock
    locate = strcmp(uniquePermno{i},permno);
    stock(i+1).date = date(locate);
end;

for i = 1:numStock
    stock(i+1).date = char(stock(i+1).date);
    D = textscan(stock(i+1).date, '%s %s %s', 'delimiter','/');
    stock(i+1).month = D{1}(1:end);
    stock(i+1).day = D{2}(1:end);
    stock(i+1).year = D{3}(1:end);
end

I initially wanted to save them as integers (and was using %u instead), but then most of my entries came out as 0 and the non-zero ones were very large (obviously not what I expected). The version above, however, returns the following error:

Error using textscan
Buffer overflow (bufsize = 4095) while reading string from
file (row 1 u, field 1 u).  Use 'bufsize' option. See HELP TEXTSCAN.
44444444444444444444455555555555555555555566666666666666666666677777777777777777777778888888888888888888889999999999999999999990000000000000000000000011111111111111111112222222222222222222222111111111

Error in makeData_CRSP (line 87)
D = textscan(stock(i+1).date, '%s %s %s', 'delimiter','/');

So I'm honestly at a loss for how to approach this. What am I doing wrong? Given how I saved the date vectors in my data structure, is this the best way to approach the problem?

1 Answer

You can use the datenum function to convert dates into serial date numbers. The syntax is datenum(dateString, format). For example, if your dates are in the format YYYY MM DD, then that would be

datenum('2012 12 04', 'yyyy mm dd')
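
For the MM/DD/YYYY dates described in the question, the format string would presumably be 'mm/dd/yyyy'. Here is a minimal sketch, assuming the dates sit in a cell array of strings. The sample dates and the variable names (dates, serial, parts, yr, mo, dy) are made up for illustration. It also uses datevec, a related built-in function, to get the separate month, day and year numbers the question asked about:

```matlab
% Hypothetical sample data in the question's MM/DD/YYYY format.
dates = {'12/03/2012'; '12/04/2012'; '01/15/2013'};

% datenum accepts a cell array of date strings and returns one
% serial date number per entry.
serial = datenum(dates, 'mm/dd/yyyy');

% datevec splits the same strings into numeric columns:
% [year month day hour minute second].
parts = datevec(dates, 'mm/dd/yyyy');
yr = parts(:, 1);
mo = parts(:, 2);
dy = parts(:, 3);
```

The names yr, mo and dy are used instead of year, month and day so that they cannot shadow any toolbox functions of the same name.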

Once you have converted all your dates like that, you can simply compare the resulting numbers using > and <:

>> datenum('2012 12 04', 'yyyy mm dd') > datenum('2012 12 03', 'yyyy mm dd')

ans =

     1

>> datenum('2012 12 04', 'yyyy mm dd') > datenum('2012 12 05', 'yyyy mm dd')

ans =

     0
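
The same comparison works element-wise on a whole vector of date numbers, which gives a logical mask for restricting the data to a window between two dates. This is only a sketch under assumptions: the struct below stands in for one element of the question's stock array, the field name ret and all values are invented, and the dates are assumed to be MM/DD/YYYY strings in a cell array.

```matlab
% Hypothetical stand-in for one element of the question's struct array.
stock = struct();
stock.date = {'11/30/2012'; '12/03/2012'; '12/04/2012'; '01/15/2013'};
stock.ret  = [0.010; -0.020; 0.005; 0.030];   % made-up returns

% Convert the dates once, along with the two window bounds.
serial    = datenum(stock.date, 'mm/dd/yyyy');
startDate = datenum('12/01/2012', 'mm/dd/yyyy');
endDate   = datenum('12/31/2012', 'mm/dd/yyyy');

% Logical mask: true for rows whose date lies inside the window.
inRange = serial >= startDate & serial <= endDate;

% Apply the same mask to every per-row field.
stock.date = stock.date(inRange);
stock.ret  = stock.ret(inRange);
```

For a struct array, the same steps could go inside the existing loop over the stocks, applying the mask to each per-row field of stock(i).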

2 Comments

Yeah that could work. Thanks. My data is currently saved in a cell array (I put the char in there to convert it to string but it seems to be causing more trouble than anything else). Will datenum work on elements in a cell array?
Follow-up: Yes, it does. Thank you. :)