I have a large data set (~1 million entries), stored as a cell array with several columns and a very large number of rows. I need to identify entries that occur at the same time, combine the values in their other columns, and then remove the rows with repeated timestamps without losing the information they hold.
An example of a subset of such data could be initialized like so:
data = {'10:30', 100; '10:30', 110; '10:31', 115;'10:32', 110}
That is, I have a cell array with one column of strings (representing time) and another column (many in the real data) of doubles.
My code should notice the repeated 10:30 (there could be many such repeats), pass the corresponding doubles (100 and 110) as inputs to some function, f(100,110), and then remove the repeated row from the data.
For example, if the function were an average, the output should look something like
data =
'10:30' [105]
'10:31' [115]
'10:32' [110]
This would be fairly simple if loops were fast enough, but with a data set of this size I don't expect a solution that loops over every row to be practical.
I have gotten as far as
[uniqueElements, firstUniquePosition, commonSets] = unique(data(:,1));
after much fiddling around, which yields some information that appears useful,
uniqueElements =
'10:30'
'10:31'
'10:32'
firstUniquePosition =
1
3
4
commonSets =
1
1
2
3
but I can't quite figure out how to write a vectorised statement that lets me combine the rows that share a timestamp.
I imagine it will involve cellfun at some point, but I don't know enough of MATLAB's functionality to implement it yet without a push in the right direction.
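One direction that seems to fit the outputs above is to treat the third output of unique as a group index and hand it to accumarray. The following is only a minimal sketch on the four-row example, assuming the combining function is a mean and that the value column holds scalar doubles; the names values, f, combined and result are made up for illustration, and it has not been timed on the full data set.

```matlab
% Small example from above: one column of time strings, one of doubles.
data = {'10:30', 100; '10:30', 110; '10:31', 115; '10:32', 110};

% The third output maps every row to the index of its unique time string.
[uniqueElements, ~, commonSets] = unique(data(:,1));

% Assumption: the second column contains only scalar doubles.
values = cell2mat(data(:,2));

% Assumption: the combining function is the mean; any function that
% takes a column vector and returns a scalar could be swapped in.
f = @mean;

% Apply f to the values of each group of rows sharing a time string.
combined = accumarray(commonSets(:), values, [], f);

% Reassemble a cell array with one row per unique time string.
% Note that unique sorts its output, so rows come back in sorted order.
result = [uniqueElements(:), num2cell(combined)];
```

With several numeric columns, the accumarray call would presumably have to be repeated once per column.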
