I have a very large CSV file containing three columns, and I want to load these columns into a MATLAB matrix as fast as possible.
Currently what I do is this:
fid = fopen(inputfile, 'rt');
g = textscan(fid, '%s', 'delimiter', '\r\n');
tdata = g{1};
fclose(fid);
results = zeros(numel(tdata)-3, 3);  % rows 4..end, i.e. numel(tdata)-3 data rows
tic
display('start reading data...');
for r = 4:numel(tdata)
    if ~mod(r, 100)
        display(['data row: ' num2str(r) ' / ' num2str(numel(tdata))]);
    end
    entries = strsplit(tdata{r}, ',');
    results(r-3,1) = str2double(strrep(entries{1}, ',', '.'));
    results(r-3,2) = str2double(strrep(entries{2}, ',', '.'));
    results(r-3,3) = str2double(strrep(entries{3}, ',', '.'));
end
This, however, takes ~30 seconds for 200,000 lines, i.e. about 150 µs per line, which is really slow. parfor also rejects the loop.
Now I would like to know what causes the bottleneck in the for loop and how I can speed it up.
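The loop can most likely be replaced entirely. As a sketch (assuming the file really has three header lines, as the loop starting at row 4 suggests, and that each data line holds three comma-separated numbers), a single numeric textscan call parses the whole file at once and avoids the per-line strsplit/str2double calls:

```matlab
fid = fopen(inputfile, 'rt');
% Parse all three numeric columns in one pass; 'HeaderLines' skips the
% three non-data lines at the top of the file.
g = textscan(fid, '%f%f%f', 'Delimiter', ',', 'HeaderLines', 3);
fclose(fid);
results = [g{1} g{2} g{3}];  % N-by-3 double matrix, same shape as before
```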
Here are the measured times from the profiler:

str2double    578253 calls    29.631 s
strsplit      192750 calls    13.388 s
EDIT: The file content has this structure:
0.000000, -0.00271, 5394147
0.000667, -0.00271, 5394148
0.001333, -0.00271, 5394149
0.002000, -0.00271, 5394150
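Given the purely numeric structure shown above, a one-liner such as dlmread may also work (again assuming exactly three header lines; the 3 and 0 are the number of rows and columns to skip before reading):

```matlab
% Read the comma-delimited file, skipping 3 header rows and 0 columns.
results = dlmread(inputfile, ',', 3, 0);
```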
Have you tried code profiling? You can also use the "Run and Time" option in the GUI to determine the slow step.
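Programmatic profiling of the snippet would look roughly like this (import_csv_loop is a hypothetical function wrapping the loop in the question):

```matlab
profile on          % start collecting timing data
import_csv_loop();  % hypothetical wrapper around the code in question
profile viewer      % open the report showing time spent per function/line
```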