0

I need to deal with a CSV file that actually contains several tables, like this:

"-------------------- Section 1 --------------------"

"Identity:","ABC123"
"Initials:","XY"
"Full Name:","Roger"
"Street Address:","Foo St"


"-------------------- Section 2 --------------------"

"Line","Date","Time","Status",

"1","30/01/2013","10:49:00 PM","ON",
"2","31/01/2013","8:04:00 AM","OFF",
"3","31/01/2013","11:54:00 PM","OFF",


"-------------------- Section 3 --------------------"

I'd like to parse the blocks in each section with something like commons-csv, but it would be helpful to handle each section individually, stopping at the double newline as if it were the end of the file. Has anyone tackled this problem already?

NOTE: Files can be arbitrarily long, and can contain any number of sections, so I'm after a single pass if possible. Each section appears to start with a titled heading (------- title ------\n\n) and end with two empty lines.

3
  • It's very simple: read each line and split it. If you get four values, the line is from Section 2; if you get two values, it is from Section 1. Commented Jan 7, 2016 at 21:58
  • I suspect you will need to write code to pre-process the file, either into memory or into temporary files. Commented Jan 7, 2016 at 22:00
  • Files can be arbitrarily long, and can contain any number of sections, so I'm after a single pass if possible. I'll add these details. Commented Jan 7, 2016 at 23:14

3 Answers

3

How about using java.io.FilterReader? You can work out which Reader methods you need to override by trial and error. Your custom class will have to read ahead an entire line and check whether it is a 'Section' line. If it is, return EOF to stop the commons-csv parser. You can then read the next section from your custom class. Not elegant, but it would probably work. For example:

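import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;

class MyReader extends FilterReader {
    private String line;   // current line, with "\r\n" appended
    private int pos;       // index of the next character to return from line

    public MyReader(BufferedReader in) {
        super(in);
        line = null;
        pos = 0;
    }

    // Returns the next character, or -1 at a section heading or at the real end of file.
    @Override
    public int read() throws IOException {
        if ( line == null || pos >= line.length() ) {
            // Fetch the next non-empty line
            do {
                line = ((BufferedReader)in).readLine();
            } while ( line != null && line.length() == 0 );
            if ( line == null ) return -1;
            line = line + "\r\n";
            pos = 0;
        }
        if ( line.contains("-------------------- Section ") ) {
            // Consume the heading and report EOF for the current section
            line = null;
            return -1;
        }
        return line.charAt(pos++);
    }

    // FilterReader's bulk read delegates straight to the wrapped reader, so it has to be
    // routed through read() as well. Returns at most the rest of the current line.
    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if ( len == 0 ) return 0;
        int c = read();
        if ( c == -1 ) return -1;
        int n = 0;
        cbuf[off + n++] = (char) c;
        while ( n < len && pos < line.length() ) {
            cbuf[off + n++] = line.charAt(pos++);
        }
        return n;
    }
}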

You would use it like so:

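public void run() throws Exception {
    BufferedReader in = new BufferedReader(new FileReader(ReadRecords.class.getResource("/records.txt").getFile()));
    // try-with-resources closes the reader even if reading fails
    try (MyReader reader = new MyReader(in)) {
        int c;
        // Each loop runs until read() returns -1, i.e. until the next section heading or end of file.
        // If the file starts with a heading, as in the example, this first loop ends immediately.
        while( (c=reader.read()) != -1 ) {
            System.out.print((char)c);
        }
        // Prints the contents of Section 1
        while( (c=reader.read()) != -1 ) {
            System.out.print((char)c);
        }
        // Prints the contents of Section 2
        while( (c=reader.read()) != -1 ) {
            System.out.print((char)c);
        }
    }
}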
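
The three hard-coded loops above only work when the number of sections is known. As a rough sketch of how the same idea could handle any number of sections, the class below keeps reporting EOF at a heading until the caller explicitly asks for the next section, and tells the caller when the input is really exhausted. The names SectionReader, nextSection, readAll and MARKER are made up for this illustration, and the heading text is assumed to match the example. I have not tested it against commons-csv.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: serves one section at a time, in a single pass over the input. */
class SectionReader extends Reader {
    static final String MARKER = "-------------------- Section "; // assumed heading text
    private final BufferedReader in;
    private String line;          // current line, with "\n" appended
    private int pos;              // next character to serve from line
    private boolean inSection;    // false until nextSection() succeeds
    private boolean headingSeen;  // read() stopped at a heading it already consumed
    private boolean eof;          // the underlying input is exhausted

    SectionReader(Reader source) {
        this.in = new BufferedReader(source);
    }

    /** Moves to the next section body; returns false once the input is exhausted. */
    boolean nextSection() throws IOException {
        line = null;
        if (!headingSeen) {
            // Skip anything left over until the next heading line.
            String next;
            do {
                next = eof ? null : in.readLine();
                if (next == null) { eof = true; inSection = false; return false; }
            } while (!next.contains(MARKER));
        }
        headingSeen = false;
        inSection = true;
        return true;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (!inSection) return -1;   // keep reporting EOF until nextSection() is called
        if (len == 0) return 0;
        while (line == null || pos >= line.length()) {
            String next = in.readLine();
            if (next == null) { eof = true; inSection = false; return -1; }
            if (next.contains(MARKER)) { headingSeen = true; inSection = false; return -1; }
            if (next.isEmpty()) continue;   // blank lines carry no data
            line = next + "\n";
            pos = 0;
        }
        int n = Math.min(len, line.length() - pos);
        line.getChars(pos, pos + n, cbuf, off);
        pos += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Collects each section body as a String; a CSV parser could consume the reader instead. */
    static List<String> readAll(Reader source) throws IOException {
        List<String> bodies = new ArrayList<>();
        try (SectionReader sections = new SectionReader(source)) {
            while (sections.nextSection()) {
                StringBuilder body = new StringBuilder();
                int c;
                while ((c = sections.read()) != -1) body.append((char) c);
                bodies.add(body.toString());
            }
        }
        return bodies;
    }

    public static void main(String[] args) throws IOException {
        String sample = "\"" + MARKER + "1 --------------------\"\n\n"
                + "\"Identity:\",\"ABC123\"\n\n\n"
                + "\"" + MARKER + "2 --------------------\"\n\n"
                + "\"Line\",\"Status\",\n\n\"1\",\"ON\",\n";
        for (String body : readAll(new StringReader(sample))) {
            System.out.println("--- section ---");
            System.out.print(body);
        }
    }
}
```

If the CSV parser closes the reader it is given when it finishes a section, close() would need to become a no-op, with the underlying file closed separately. Like the original, this drops blank lines, so it assumes no quoted field contains one.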

2 Comments

Thanks, I think this is most like what I was after.
@beldaz no problem. You probably want to return -2 for the actual EOF and stop parsing sections. E.g. if ( line == null ) return -2;
1

You can use String.split() to access the individual CSV sections, where content holds the whole file as a String:

for (String csv : content.split("\"----+ Section \\d+ ----+\"")) {

    // Skip empty sections (the text between headings keeps its line breaks, so trim first)
    if (csv.trim().isEmpty()) continue;

    // parse and process each individual "csv" section here
}
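
To see what the split produces on input shaped like the example, here is a small self-contained sketch. The class name SplitSections and the helper sections() are made up for illustration; the pattern is the one used above. Each piece is trimmed before the emptiness check, because the text between two headings still contains its line breaks.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSections {
    // A quoted heading line such as "---------- Section 1 ----------"
    static final String HEADING = "\"----+ Section \\d+ ----+\"";

    /** Returns the non-blank section bodies, in file order. */
    static List<String> sections(String content) {
        List<String> result = new ArrayList<>();
        for (String csv : content.split(HEADING)) {
            String body = csv.trim();
            if (body.isEmpty()) continue;   // text before the first heading, or an empty section
            result.add(body);
        }
        return result;
    }

    public static void main(String[] args) {
        String content = String.join("\n",
                "\"---------- Section 1 ----------\"",
                "",
                "\"Identity:\",\"ABC123\"",
                "\"Initials:\",\"XY\"",
                "",
                "",
                "\"---------- Section 2 ----------\"",
                "",
                "\"Line\",\"Date\",\"Time\",\"Status\",",
                "",
                "\"1\",\"30/01/2013\",\"10:49:00 PM\",\"ON\",",
                "",
                "",
                "\"---------- Section 3 ----------\"",
                "");
        for (String body : sections(content)) {
            System.out.println("--- section ---");
            System.out.println(body);
        }
    }
}
```

This still requires the whole file in memory, so it only suits files of modest size.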

2 Comments

So does that mean the whole file needs to be parsed as a String, and then the different strings processed by a CSV parser? The files can be arbitrarily large.
@beldaz: It's one way to solve this problem. The simplest approach in terms of lines of code. Not necessarily the fastest or most suitable. Rather than using a regex on the whole String, you could read line by line and process the CSV content as soon as you encounter a section.
0

Assuming that the file contains text in 2 sections, delineated as per the example, its processing is straightforward, e.g.:

  1. Create a Java BufferedReader object to read the file line-by-line
  2. Read Section 1 and extract the key-value pairs
  3. Read and ignore the remaining lines, until the CSV header (Section 2)
  4. Initialize a CSV parser (commons-csv or other) using the header and the other parameters (comma separator, quotes etc.)
  5. Process every subsequent line with the parser

The parser will provide some iterator-like API to read each line into a Java object, from which reading the fields will be trivial. This approach is vastly superior to pre-loading everything in memory, because it can accommodate any file size.
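
The steps above can be sketched roughly as follows, for the two-section case only. To keep the example free of external dependencies, a naive splitLine() helper stands in for commons-csv; it, the class name TwoSectionParser, the MARKER constant and the callback-based parse() signature are all assumptions made for this illustration. Rows are handed to a callback one at a time rather than collected, so memory use does not grow with file size.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class TwoSectionParser {
    static final String MARKER = "---- Section ";   // assumed shape of a heading line

    /** Naive stand-in for a real CSV parser: no escaped quotes, no multi-line fields. */
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (char ch : line.toCharArray()) {
            if (ch == '"') {
                inQuotes = !inQuotes;
            } else if (ch == ',' && !inQuotes) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(ch);
            }
        }
        // Drops the empty field after a trailing comma (and any genuinely empty last field).
        if (field.length() > 0) fields.add(field.toString());
        return fields;
    }

    /** Fills keyValues from Section 1 and passes each Section 2 row to onRow. */
    static void parse(Reader source, Map<String, String> keyValues,
                      Consumer<Map<String, String>> onRow) throws IOException {
        int section = 0;
        List<String> header = null;
        try (BufferedReader in = new BufferedReader(source)) {          // step 1
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) continue;                           // step 3: ignore blank lines
                if (line.contains(MARKER)) { section++; continue; }
                List<String> fields = splitLine(line);
                if (section == 1) {                                     // step 2
                    if (fields.size() == 2) keyValues.put(fields.get(0), fields.get(1));
                } else if (section == 2) {
                    if (header == null) { header = fields; continue; }  // step 4
                    Map<String, String> row = new LinkedHashMap<>();    // step 5
                    for (int i = 0; i < header.size() && i < fields.size(); i++) {
                        row.put(header.get(i), fields.get(i));
                    }
                    onRow.accept(row);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String sample = "\"" + MARKER + "1 ----\"\n\n"
                + "\"Identity:\",\"ABC123\"\n\"Initials:\",\"XY\"\n\n\n"
                + "\"" + MARKER + "2 ----\"\n\n"
                + "\"Line\",\"Date\",\"Time\",\"Status\",\n\n"
                + "\"1\",\"30/01/2013\",\"10:49:00 PM\",\"ON\",\n";
        Map<String, String> keyValues = new LinkedHashMap<>();
        parse(new StringReader(sample), keyValues, row -> System.out.println(row));
        System.out.println(keyValues);
    }
}
```

A file with more sections, or with sections in a different order, would need a way to tell the two layouts apart, which this sketch does not attempt.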

6 Comments

Yes, I think this is the best approach, but steps 4-5 are non-trivial without knowing how commons-csv uses the Reader API. Plus, how do we handle the end-of-table issue?
You mean the "Section 1" and "Section 2" areas are repeated throughout the file, or just Section 2? Please clarify the data in the question, by adding multiple tables. The problem would still be trivial, we can come up with the solution easily. :-)
There can be multiple sections, some like the example Section 1 and some like the example Section 2. Their content and structure isn't very specific. The main thing is that each region appears to start with a title "------- Name -----\n\n" and end with two empty lines.
OK, do you know which section has data like "Identity:" and which data like "Line","Date","Time","Status"? You need to be more clear in your example.
Thanks but that's getting more complicated than I'm after. Just want to terminate current csv block on double new line and iterate to next block starting with section heading.