Parsing part of a CSV file in Java

Question

I need to deal with a CSV file that actually contains several tables, like this:

"-------------------- Section 1 --------------------"

"Identity:","ABC123"
"Initials:","XY"
"Full Name:","Roger"
"Street Address:","Foo St"


"-------------------- Section 2 --------------------"

"Line","Date","Time","Status",

"1","30/01/2013","10:49:00 PM","ON",
"2","31/01/2013","8:04:00 AM","OFF",
"3","31/01/2013","11:54:00 PM","OFF",


"-------------------- Section 3 --------------------"

I'd like to parse the blocks in each section with something like commons-csv, but it would be helpful to handle each section individually, stopping at the double newline as if it were the end of the file. Has anyone tackled this problem already?

NOTE: Files can be arbitrarily long, and can contain any number of sections, so I'm after a single pass if possible. Each section appears to start with a titled heading (------- title ------\n\n) and end with two empty lines.

It's very simple: if reading and splitting a line gives four values, it's from Section 2; if it gives two values, it's from Section 1.
– KhAn SaAb, Commented Jan 7, 2016 at 21:58
I suspect you will need to write code to pre-process the file, either into memory or into temporary files.
– Eric J., Commented Jan 7, 2016 at 22:00
Files can be arbitrarily long, and can contain any number of sections, so I'm after a single pass if possible. I'll add these details.
– beldaz, Commented Jan 7, 2016 at 23:14

K.Nicholas · Accepted Answer · 2016-01-08 04:07:30Z

3

How about using java.io.FilterReader? You can figure out which Reader methods you need to override by trial and error (in practice both read() and the bulk read(char[], int, int), because FilterReader's bulk read goes straight to the wrapped reader). Your custom class will have to read ahead an entire line and check whether it is a 'Section' line. If it is, return EOF (-1) to stop the commons-csv parser. You can then read the next section from your custom class. Not elegant, but it would probably work. Here is an example:

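import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;

class MyReader extends FilterReader {
    private String line;        // current line plus a re-appended "\r\n"
    private int pos;            // index of the next char to return from line
    private boolean pendingEof; // section boundary still owed after a bulk read

    public MyReader(BufferedReader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        if (pendingEof) { pendingEof = false; return -1; }
        if (line == null || pos >= line.length()) {
            // Read ahead a whole line, skipping blank ones
            do {
                line = ((BufferedReader) in).readLine();
            } while (line != null && line.isEmpty());
            if (line == null) return -1; // real end of file
            if (line.contains("-------------------- Section ")) {
                line = null;
                return -1; // a heading ends the current section
            }
            line = line + "\r\n";
            pos = 0;
        }
        return line.charAt(pos++);
    }

    // FilterReader's bulk read goes straight to the wrapped reader and would
    // bypass read() above; buffered consumers call it, so route it through read()
    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) return 0;
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) {
                pendingEof = n > 0; // report the boundary on the next call instead
                break;
            }
            cbuf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }
}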

You would use it like so:

public void run() throws Exception {
    // ReadRecords is assumed to be the enclosing class, with records.txt on the classpath
    BufferedReader in = new BufferedReader(new FileReader(ReadRecords.class.getResource("/records.txt").getFile()));
    try (MyReader reader = new MyReader(in)) {
        // Each pass stops at the next section heading. The file opens with a
        // heading, so the first pass prints nothing and the next two print
        // Section 1 and Section 2.
        for (int pass = 0; pass < 3; pass++) {
            int c;
            while ((c = reader.read()) != -1) {
                System.out.print((char) c);
            }
        }
    }
}

answered Jan 8, 2016 at 4:07

K.Nicholas

2 Comments

beldaz Over a year ago

Thanks, I think this is most like what I was after.

K.Nicholas Over a year ago

@beldaz no problem. You probably want to return -2 for the actual EOF and stop parsing sections. E.g. if ( line == null ) return -2;
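A caveat on the -2 idea: java.io.Reader documents read() as returning a character in the range 0 to 65535, or -1 at end of stream, so a buffered consumer such as BufferedReader (or a CSV parser built on one) has no reason to pass -2 through. Below is a sketch of one contract-friendly alternative; the class name SectionReader and its atEndOfFile() method are hypothetical. It extends Reader directly, so the bulk read(char[], int, int) is the only read path (Reader's own read() delegates to it). It returns -1 at every section heading and records the real end of file in a separate flag.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical variant of MyReader: -1 marks both a section heading and the
// real end of file (as Reader requires); atEndOfFile() tells them apart.
class SectionReader extends Reader {
    private final BufferedReader in;
    private String line; // current line plus "\n", or null if one must be read
    private int pos;     // next index in 'line' to hand out
    private boolean eof; // set once the underlying reader is exhausted

    SectionReader(BufferedReader in) {
        this.in = in;
    }

    boolean atEndOfFile() {
        return eof;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (line == null || pos >= line.length()) {
            line = in.readLine();
            pos = 0;
            if (line == null) {
                eof = true;
                return -1; // real end of file
            }
            if (line.contains("-------------------- Section ")) {
                line = null;
                return -1; // end of the current section only
            }
            line = line + "\n";
        }
        // Hand out as much of the current line as fits
        int n = Math.min(len, line.length() - pos);
        line.getChars(pos, pos + n, cbuf, off);
        pos += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    public static void main(String[] args) throws IOException {
        String text = "\"-------------------- Section 1 --------------------\"\n"
                + "\"Identity:\",\"ABC123\"\n"
                + "\"-------------------- Section 2 --------------------\"\n"
                + "\"1\",\"30/01/2013\",\"10:49:00 PM\",\"ON\",\n";
        try (SectionReader reader = new SectionReader(new BufferedReader(new StringReader(text)))) {
            int section = 0;
            while (!reader.atEndOfFile()) {
                StringBuilder sb = new StringBuilder();
                int c;
                while ((c = reader.read()) != -1) sb.append((char) c);
                System.out.println("section " + section++ + ": " + sb);
            }
        }
    }
}
```

The first pass yields an empty section because the file opens with a heading. If each section is handed to a wrapper such as a new BufferedReader, note that closing that wrapper also closes the shared reader, so only the outer reader should be closed.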

Lukas Eder · Answer · 2016-01-07 22:15:45Z

1

You can use String.split() to access the individual CSV sections:

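// content holds the whole file as a single String
for (String csv : content.split("\"----+ Section \\d+ ----+\"")) {

    // Skip empty sections (e.g. the piece before the first heading, or a lone newline after the last one)
    if (csv.trim().isEmpty()) continue;

    // parse and process each individual "csv" section here
}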
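The loop above expects content to hold the whole file as one String. Here is a minimal self-contained sketch of the same split, run on a shortened copy of the sample from the question. The class name SplitSections is made up for illustration, and each piece is trimmed of the blank lines around it before use.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSections {
    // The pattern from the answer: a quoted heading of 4+ dashes, " Section ",
    // a number, and 4+ dashes
    static final String HEADING = "\"----+ Section \\d+ ----+\"";

    // Splits the whole text at each heading and keeps the non-blank pieces,
    // trimmed of the blank lines around them
    static List<String> split(String content) {
        List<String> sections = new ArrayList<>();
        for (String csv : content.split(HEADING)) {
            String body = csv.trim();
            if (body.isEmpty()) continue; // e.g. the piece before the first heading
            sections.add(body);
        }
        return sections;
    }

    public static void main(String[] args) {
        String content = "\"-------------------- Section 1 --------------------\"\n\n"
                + "\"Identity:\",\"ABC123\"\n\"Initials:\",\"XY\"\n\n\n"
                + "\"-------------------- Section 2 --------------------\"\n\n"
                + "\"Line\",\"Date\",\"Time\",\"Status\",\n\n"
                + "\"1\",\"30/01/2013\",\"10:49:00 PM\",\"ON\",\n\n\n"
                + "\"-------------------- Section 3 --------------------\"\n";
        for (String section : split(content)) {
            System.out.println("[section]");
            System.out.println(section);
        }
    }
}
```

Each returned string could then be given to a CSV parser through a StringReader. Because String.split works on the entire text, this suits files that comfortably fit in memory.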

edited Jan 7, 2016 at 22:15

answered Jan 7, 2016 at 22:09

Lukas Eder

2 Comments

beldaz Over a year ago

So does that mean the whole file needs to be parsed as a String, and then the different strings processed by a CSV parser? The files can be arbitrarily large.

Lukas Eder Over a year ago

@beldaz: It's one way to solve this problem. The simplest approach in terms of lines of code. Not necessarily the fastest or most suitable. Rather than using a regex on the whole String, you could read line by line and process the CSV content as soon as you encounter a section.
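The line-by-line variant suggested here might look like the following sketch. It assumes headings look like the ones in the sample (a quoted run of dashes around a title); StreamSections, forEachSection and the handler signature are hypothetical names. Only the lines of one section are held in memory at a time, which addresses the concern about arbitrarily large files as long as no single section is huge.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

class StreamSections {
    // Assumed heading shape, based on the sample: "---- <title> ----" in quotes
    static boolean isHeading(String line) {
        return line.matches("\"----+ .+ ----+\"");
    }

    // Reads one line at a time and, at each heading (and at end of input),
    // passes the finished section to handler as (heading line, non-blank lines).
    // Lines before the first heading are discarded.
    static void forEachSection(BufferedReader in, BiConsumer<String, List<String>> handler)
            throws IOException {
        String title = null;
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (isHeading(line)) {
                if (title != null) handler.accept(title, lines);
                title = line;
                lines = new ArrayList<>(); // start buffering the next section
            } else if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        if (title != null) handler.accept(title, lines); // last section
    }

    public static void main(String[] args) throws IOException {
        String text = "\"---- Section 1 ----\"\n\n\"Identity:\",\"ABC123\"\n\n\n"
                + "\"---- Section 2 ----\"\n\n\"Line\",\"Date\",\"Time\",\"Status\",\n";
        forEachSection(new BufferedReader(new StringReader(text)),
                (title, lines) -> System.out.println(title + " -> " + lines));
    }
}
```

Inside the handler, the collected lines could be joined and wrapped in a StringReader for whichever CSV parser is in use.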

PNS · Answer · 2016-01-07 22:49:34Z

0

Assuming that the file contains text in 2 sections, delineated as per the example, its processing is straightforward, e.g.:

1. Create a Java BufferedReader object to read the file line by line
2. Read Section 1 and extract the key-value pairs
3. Read and ignore the remaining lines, until the CSV header (Section 2)
4. Initialize a CSV parser (commons-csv or other) using the header and the other parameters (comma separator, quotes etc.)
5. Process every subsequent line with the parser

The parser will provide some iterator-like API to read each line into a Java object, from which reading the fields will be trivial. This approach is vastly superior to pre-loading everything in memory, because it can accommodate any file size.
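A dependency-free sketch of steps 1 to 5 for the two-section layout in the question follows. To keep it runnable on its own, a small hand-written splitter (splitFields, which handles quoted fields and doubled quotes but nothing more) stands in where the answer suggests commons-csv. TwoSectionReader and its fields are hypothetical names, and the heading test assumes the sample's format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader for the two-section layout in the question.
class TwoSectionReader {
    final Map<String, String> identity = new LinkedHashMap<>(); // Section 1 pairs
    final List<Map<String, String>> rows = new ArrayList<>();   // Section 2 records

    // Assumes headings look like the sample: "----... Section N ...----" in quotes
    static boolean isHeading(String line) {
        return line.startsWith("\"----") && line.contains(" Section ");
    }

    // Minimal stand-in for a CSV parser: splits on commas outside double quotes,
    // treats "" inside quotes as a literal quote, and drops one trailing comma
    // (the sample's table lines end with ",").
    static List<String> splitFields(String line) {
        if (line.endsWith(",")) line = line.substring(0, line.length() - 1);
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (inQuotes && ch == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                cur.append('"');
                i++; // skip the second quote of the pair
            } else if (ch == '"') {
                inQuotes = !inQuotes;
            } else if (ch == ',' && !inQuotes) {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(ch);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    // Step 1: the caller supplies a BufferedReader positioned at the start of the file.
    void read(BufferedReader in) throws IOException {
        int section = 0;            // number of headings seen so far
        List<String> header = null; // Section 2 column names, once read
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) continue;
            if (isHeading(line)) {
                section++;
                continue;
            }
            List<String> fields = splitFields(line);
            if (section == 1 && fields.size() == 2) {
                identity.put(fields.get(0), fields.get(1)); // step 2: key-value pairs
            } else if (section == 2 && header == null) {
                header = fields; // steps 3-4: first line of Section 2 is the header
            } else if (section == 2) {
                Map<String, String> row = new LinkedHashMap<>(); // step 5: one record
                for (int i = 0; i < header.size() && i < fields.size(); i++) {
                    row.put(header.get(i), fields.get(i));
                }
                rows.add(row);
            }
            // Lines in later sections are ignored in this sketch
        }
    }
}
```

The rows are collected in a list only to keep the example short; for arbitrarily large files each record would be processed as soon as it is read, and in real code the header and remaining lines would go to the CSV library instead of splitFields.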

edited Jan 7, 2016 at 22:49

answered Jan 7, 2016 at 22:28

PNS

6 Comments

beldaz Over a year ago

Yes, I think this is the best approach, but steps 4-5 are non-trivial without knowing how commons-csv uses the Reader API. Plus, how do we handle the end-of-table issue?

PNS Over a year ago

Do you mean the "Section 1" and "Section 2" areas are repeated throughout the file, or just Section 2? Please clarify the data in the question by adding multiple tables. The problem would still be trivial; we can come up with the solution easily. :-)

beldaz Over a year ago

There can be multiple sections, some like the example Section 1 and some like the example Section 2. Their content and structure isn't very specific. The main thing is that each region appears to start with a title "------- Name -----\n\n" and end with two empty lines.

PNS Over a year ago

OK, do you know which section has data like "Identity:" and which data like "Line","Date","Time","Status"? You need to be more clear in your example.

beldaz Over a year ago

Thanks but that's getting more complicated than I'm after. Just want to terminate current csv block on double new line and iterate to next block starting with section heading.
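The behaviour described in this comment (end the current block at two consecutive empty lines, then continue with the next block, which starts with its heading) can be sketched without any CSV library. BlankLineBlocks and nextBlock() are hypothetical names; each block comes back as a String with the heading as its first line and the blank lines dropped, so only one block is in memory at a time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical helper: hands out one block at a time, where a block ends at two
// consecutive empty lines or at end of file. Blank lines are not included.
class BlankLineBlocks {
    private final BufferedReader in;

    BlankLineBlocks(BufferedReader in) {
        this.in = in;
    }

    // Returns the next non-empty block, or null when the input is exhausted.
    String nextBlock() throws IOException {
        StringBuilder block = new StringBuilder();
        int blankRun = 0; // consecutive empty lines seen since the last text line
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                blankRun++;
                if (blankRun >= 2 && block.length() > 0) break; // two empty lines: block ends
                continue;
            }
            blankRun = 0;
            block.append(line).append('\n');
        }
        return block.length() > 0 ? block.toString() : null;
    }

    public static void main(String[] args) throws IOException {
        String text = "\"---- Section 1 ----\"\n\n\"Identity:\",\"ABC123\"\n\n\n"
                + "\"---- Section 2 ----\"\n\n\"Line\",\"Date\"\n\n\"1\",\"30/01/2013\"\n\n\n";
        BlankLineBlocks blocks = new BlankLineBlocks(new BufferedReader(new StringReader(text)));
        String block;
        while ((block = blocks.nextBlock()) != null) {
            System.out.print("[block]\n" + block);
        }
    }
}
```

Each block could then be parsed on its own, for example through a StringReader with the heading line skipped first. Unlike splitting on headings, this relies on every section really ending with two empty lines; a section missing them would be merged with the next one.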
