I have a CSV file containing 2D arrays, each with 4 columns but a varying number of rows, with a blank line between arrays. E.g.:

2, 354, 23, 101
3, 1023, 43, 454
1, 5463, 45, 7657

4, 543, 543, 654
3, 56, 7654, 344

...

I need to import the data in a way that lets me run operations on each block separately. However, csvread, dlmread and textscan all ignore the blank lines, so the block boundaries are lost.

I can't seem to find a solution anywhere. How can this be done?

PS:

It may be worth noting that files in the format above are actually the concatenation of many files, each containing only one block of data (I don't want to have to read from thousands of files every time). The blank line between blocks can therefore be changed to any other delimiter or marker; the concatenation is just done with a Python script.

EDIT: My solution, based upon / inspired by petrichor's answer below

I replaced csvread with textscan, which is faster. Then I realised that if I replaced the blank lines with lines of nan instead (by modifying my Python script), I could remove the need for the second textscan, which was the slow point. My code is:

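filename = 'data.csv';
fid = fopen(filename);
% Read everything in one pass; the nan separator lines come through as NaN rows
allData = cell2mat(textscan(fid,'%f %f %f %f','delimiter',','));
fclose(fid);

% Row indices of the NaN separator lines (one after each block, including the last)
nanLines = find(isnan(allData(:,1)))';

% First and last row of each block, numbered as if the NaN rows were already removed
iEnd = (nanLines - (1:length(nanLines)));
iStart = [1 (nanLines(1:end-1) - (0:length(nanLines)-2))];
nRows = iEnd - iStart + 1;

% Drop the separator rows
allData(nanLines,:)=[];

% Split the matrix into one cell per block
data = mat2cell(allData, nRows);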

This runs in 0.28 s on a file of roughly 103,000 lines. I've accepted petrichor's solution as it best solves my initial problem.
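
For anyone who wants to try the index arithmetic without the real file, here is a minimal sketch on a toy matrix. It assumes, as in the code above, that every block (including the last) is followed by a NaN row; the hard-coded matrix is only a stand-in for the textscan output. It gets the block lengths from the gaps between separator rows using diff, which should give the same nRows as the iStart / iEnd version.

```matlab
% Toy stand-in for the textscan output: two blocks, each closed by a NaN row
allData = [2  354   23  101;
           3 1023   43  454;
           1 5463   45 7657;
           NaN NaN NaN NaN;
           4  543  543  654;
           3   56 7654  344;
           NaN NaN NaN NaN];

% Logical mask and row indices of the separator rows
isSep = isnan(allData(:,1));
nanLines = find(isSep)';

% Block length = gap between consecutive separators, minus the separator itself
nRows = diff([0 nanLines]) - 1;

% Drop the separator rows and split into one cell per block
data = mat2cell(allData(~isSep,:), nRows, 4);
```

With this toy input, data should be a 2x1 cell holding a 3x4 and a 2x4 matrix.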

  • I suppose one way would be to replace the blank lines with something like NaN, NaN, NaN, NaN. After loading the data with csvread or something similar, you could then loop through it and extract the blocks in MATLAB quite easily. Commented May 14, 2012 at 9:27
  • I was hoping to avoid having to loop back through the data after import as (I'm assuming) this would just add more time to the whole process. On another note, I've found so far that textscan is the fastest way of importing? Commented May 14, 2012 at 9:34
  • What about leaving no delimiter line at all but rather creating a second file that is just the row indices of when the new blocks start and then use this file to define the row range to work on rather than creating separate matrices for each? Commented May 14, 2012 at 9:52
  • I don't know how to sort my colours out in the code block?! Commented May 15, 2012 at 18:10
  • Haha - I'd also like to know how to do the colours correctly. Out of interest what are you planning on doing with this data, can you provide an example of how you are going to use it? I ask because storing in a cell array might not be the most efficient way... Commented May 16, 2012 at 6:25
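
The row-index idea from the comments above could look roughly like the sketch below. It is only an illustration under assumptions: blockStart stands in for the contents of a hypothetical second file listing the first row of each block, the matrix is toy data with no separator rows, and the column sum is just a placeholder for whatever operation is run per block.

```matlab
% Toy stand-in for the imported matrix (no separator rows at all)
allData = [2  354   23  101;
           3 1023   43  454;
           1 5463   45 7657;
           4  543  543  654;
           3   56 7654  344];

% Hypothetical contents of the index file: first row of each block
blockStart = [1; 4];

% Each block ends just before the next one starts; the last ends at the final row
blockEnd = [blockStart(2:end) - 1; size(allData, 1)];

% Run a placeholder operation (column sums) on each row range of the single matrix
colSums = zeros(numel(blockStart), size(allData, 2));
for k = 1:numel(blockStart)
    block = allData(blockStart(k):blockEnd(k), :);
    colSums(k, :) = sum(block, 1);
end
```

This keeps everything in one numeric matrix and avoids building a cell array, at the cost of maintaining the extra index file.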

1 Answer

filename = 'data.txt';

%# Read all the data
allData = csvread(filename);

%# Compute the empty line indices
fid = fopen(filename);
lines = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
blankLines = find(cellfun('isempty', lines{1}))';

%# Find the indices to separate data into cells from the whole matrix
iEnd = [blankLines - (1:length(blankLines)) size(allData,1)];
iStart = [1 (blankLines - (0:length(blankLines)-1))];
nRows = iEnd - iStart + 1;

%# Put the data into cells
data = mat2cell(allData, nRows)

That gives the following for your data:

data = 

    [3x4 double]
    [2x4 double]

1 Comment

Thanks petrichor, a nice solution. As I mentioned in my comment above I had hoped to avoid looping back through the data but your solution provides a good middle ground. Timings on my actual data: I've found that csvread takes 0.46 seconds, whilst using textscan only takes 0.26. Accordingly your solution when I replace the csvread with textscan takes a total of only 0.49 seconds, which should be fast enough for me to be getting on with. I am still curious as to whether there's a 'one step' way though... Many thanks
