
Sample Data -

Header1, full_name, header3, header4

  1. 20, "bob, XXX", "test", 30
  2. 20, "evan"s,YYY ", "test", 30
  3. 20, "Tom, ZZZ", "test", 30

    CSVReader csvReader = new CSVReader(reader, ',', '"');
    

The second row isn't read as expected because there is a double quote inside the full_name column value.

I want to ignore such cases. Any suggestions would be appreciated.

I am using the OpenCSV Java API for parsing.

Edit:

I am getting the data from a database. One of the database columns has a single double quote in its value, and because of that the CSV data looks malformed.

  • Possible duplicate of CSV parser in JAVA, double quotes in string (SuperCSV, OpenCSV) Commented Sep 15, 2016 at 18:16
  • The CSV is malformed. See tools.ietf.org/html/rfc4180, Rule 7. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. Commented Sep 15, 2016 at 18:54
  • It is not malformed; the original data has a double quote inside it. @Guenther Commented Sep 15, 2016 at 22:12
  • Maybe Java Parser with regex can help you Commented Sep 15, 2016 at 22:25
  • I need to rewrite the logic to parse the CSV file. For now I intend to use an existing CSV reader framework. If nothing works out, I might write custom parsing for it. Thanks for the suggestion @pilkington Commented Sep 16, 2016 at 15:48
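
The RFC 4180 rule quoted in the comments above can also be applied on the writing side, if the step that exports the database values to CSV is under your control. The following is only a minimal sketch under that assumption; the class and method names (`CsvEscape`, `quoteField`, `toCsvLine`) are hypothetical and not part of OpenCSV or any other library.

```java
// Hypothetical helper, not part of any CSV library: shows RFC 4180 style escaping.
class CsvEscape {

    // Wraps a value in double quotes and doubles every embedded double quote.
    // Assumption for this sketch: a null value is written as an empty field.
    public static String quoteField(String value) {
        if (value == null) {
            return "";
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Joins the quoted fields with commas to form one CSV line.
    public static String toCsvLine(String... fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(quoteField(fields[i]));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // The problematic second row from the sample data.
        // prints: "20","evan""s,YYY ","test","30"
        System.out.println(toCsvLine("20", "evan\"s,YYY ", "test", "30"));
    }
}
```

A row written this way should be readable by a standards-following CSV parser, since the embedded quote is no longer ambiguous. It does not help when the file is produced by a process you cannot change.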

1 Answer

univocity-parsers can handle unescaped quotes and is also 4x faster than opencsv. Try this code:

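import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

import java.io.StringReader;
import java.util.List;

public class Example {
    public static void main(String... args) {
        // the three sample rows; the second one contains an unescaped quote
        String input = "" +
                "20, \"bob, XXX\", \"test\", 30\n" +
                "20, \"evan\"s,YYY \", \"test\", 30\n" +
                "20, \"Tom, ZZZ\", \"test\", 30 ";

        // default settings are enough to parse this input
        CsvParserSettings settings = new CsvParserSettings();

        CsvParser parser = new CsvParser(settings);
        List<String[]> rows = parser.parseAll(new StringReader(input));

        // printing values enclosed in [ ] to make sure you are getting the expected result
        for (String[] row : rows) {
            for (String value : row) {
                System.out.print("[" + value + "],");
            }
            System.out.println();
        }
    }
}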

This will produce:

[20],[bob, XXX],[test],[30],
[20],["evan"s],[YYY "],[test],[30],
[20],[Tom, ZZZ],[test],[30],

Additionally, you can control how to handle unescaped quotes with one of:

settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_DELIMITER);
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE);
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.RAISE_ERROR);
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.SKIP_VALUE);

When reading large files, you can use a RowProcessor or iterate over each row like this:

parser.beginParsing(new File("/path/to/your.csv"));

String[] row;
while ((row = parser.parseNext()) != null) {
    // process row
}

Disclaimer: I'm the author of this library. It's open source and free (Apache 2.0 license).

3 Comments

Your solution works well for small data. I am dealing with many thousands of rows and hundreds of columns, so doing this might add more time. Thanks for the suggestion.
There are many ways to read the data. I just posted an example. You can read files with trillions of rows and hundreds of gigabytes with it. Read the tutorial to learn more.
I've updated my answer to show how you can use the library to process large files. A 100 MB file with 3 million rows takes about 700 ms to be fully parsed on my MacBook Pro. Hope this helps.
