4

I am trying to parse a CSV file using Jackson's CSV data format module.

I tried the sample code given on the project homepage (https://github.com/FasterXML/jackson-dataformat-csv):

CsvMapper mapper = new CsvMapper();
mapper.enable(CsvParser.Feature.WRAP_AS_ARRAY);
File csvFile = new File("input.csv");
MappingIterator<String[]> it =  mapper.reader(String[].class).readValues(csvFile);
while (it.hasNext()) {
    String[] row = it.next();
    System.out.println(Arrays.toString(row)); // needs java.util.Arrays; prints the cells, not the array reference
}

This small piece of code gives me the following error:

Exception in thread "main" java.io.CharConversionException: Invalid UTF-8 start byte 0x92 (at char #269, byte #-1)
at com.fasterxml.jackson.dataformat.csv.impl.UTF8Reader.reportInvalidInitial(UTF8Reader.java:393)
at com.fasterxml.jackson.dataformat.csv.impl.UTF8Reader.read(UTF8Reader.java:245)
at com.fasterxml.jackson.dataformat.csv.impl.CsvReader.loadMore(CsvReader.java:438)
at com.fasterxml.jackson.dataformat.csv.impl.CsvReader.hasMoreInput(CsvReader.java:475)
at com.fasterxml.jackson.dataformat.csv.CsvParser._handleStartDoc(CsvParser.java:461)
at com.fasterxml.jackson.dataformat.csv.CsvParser.nextToken(CsvParser.java:414)
at com.fasterxml.jackson.databind.ObjectReader._bindAndReadValues(ObjectReader.java:1492)
at com.fasterxml.jackson.databind.ObjectReader.readValues(ObjectReader.java:1335)
at com.til.etwealth.etmoney.util.alok.main(alok.java:18)  

I am able to read the same file using openCSV.
I searched for this error online but could not find anything useful. Can someone tell me what I am missing?

2 Answers

4

Most likely you are reading content that is not UTF-8 encoded but uses some other encoding, such as Latin-1 (ISO-8859-1). The error message you get is not very helpful; it could probably be improved to suggest the likely cause, as this is a relatively common problem.

To read non-Unicode encodings, you need to construct the Reader yourself, since the difference cannot be reliably auto-detected (although there may be Java libraries that use heuristics to try to determine it automatically):

mapper.reader(String[].class).readValues(new InputStreamReader(new FileInputStream(csvFile), "ISO-8859-1"));

Alternatively, whatever produces the file could be configured to write it as UTF-8.

There are other possible causes (such as file truncation), but a mismatched character encoding is a common one. The main oddity here is the particular byte value, 0x92, which is not a printable character in (most?) ISO-8859-x encodings.
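
As a rough way to check the encoding theory, the sketch below uses only the JDK (no Jackson) to decode a few bytes strictly. It assumes the file was saved as windows-1252, where byte 0x92 is the right single quotation mark (a curly apostrophe); that is a guess about the asker's file, not something the stack trace confirms. The class name EncodingCheck, the helper names, and the sample row are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class EncodingCheck {
    // Returns true if the bytes decode without errors in the given charset.
    // Strict mode: malformed input is reported instead of silently replaced.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Reads the first line of a file through a Reader bound to an explicit charset.
    // A Reader built this way could also be handed to ObjectReader.readValues(Reader).
    static String firstLine(Path file, Charset charset) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample row: "it's" with a curly apostrophe, stored as byte 0x92.
        byte[] row = {'i', 't', (byte) 0x92, 's', ',', '1'};
        // Assumption: windows-1252 is available (it is not in the JDK's guaranteed set).
        Charset cp1252 = Charset.forName("windows-1252");

        // 0x92 cannot start a UTF-8 sequence, so strict UTF-8 decoding rejects the row.
        System.out.println("valid UTF-8: " + decodesCleanly(row, StandardCharsets.UTF_8));
        System.out.println("valid windows-1252: " + decodesCleanly(row, cp1252));

        Path tmp = Files.createTempFile("sample", ".csv");
        try {
            Files.write(tmp, row);
            System.out.println(firstLine(tmp, cp1252));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

If a check like this fails for UTF-8 but passes for some other charset, passing a Reader created with that charset to Jackson, as in the one-liner above, is the likely fix. It would also explain why another library that decodes with the platform default charset reads the same file without complaint.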


2 Comments

I am sure there are no non-printable or special characters in my file, and I am able to read it using openCSV.
If you have a sample file that triggers this, it would be good to file a bug report at (github.com/FasterXML/jackson-dataformat-csv/issues).
1

A workaround which will work in most cases is to import Apache Tika and use the AutoDetectReader (see https://tika.apache.org/1.2/api/org/apache/tika/detect/AutoDetectReader.html)

Try this:

   //get a file stream in utf format for this file (since they are often not in utf by 
   Charset charset = new AutoDetectReader(new FileInputStream(file)).getCharset();
   String f = FileUtils.readFileToString(file, charset);
   CsvMapper mapper = new CsvMapper();
   CsvSchema schema = CsvSchema.emptySchema().withHeader();
   // pass the decoded String directly; f.getBytes() would re-encode it with the platform default charset
   MappingIterator<Map<String, String>> it = mapper.reader(Map.class).with(schema).readValues(f);

Here I also used Apache Commons IO (FileUtils) to read the file into a String. The same can be done without Apache Commons, using only the JDK.
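
For the JDK-only variant, a minimal sketch could look like the following. It assumes the charset is already known (for example, the value returned by AutoDetectReader.getCharset() above); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadWholeFile {
    // Reads the whole file into memory and decodes it with the given charset.
    // Note: new String(...) silently replaces malformed bytes instead of failing.
    static String readFileToString(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sample", ".csv");
        try {
            // Hypothetical two-line CSV written as ISO-8859-1 (0xE9 is 'é' there).
            Files.write(tmp, new byte[] {'n', 'a', 'm', 'e', '\n', 'R', 'e', 'n', (byte) 0xE9, '\n'});
            String content = readFileToString(tmp, StandardCharsets.ISO_8859_1);
            System.out.println(content);
        } finally {
            Files.delete(tmp);
        }
    }
}
```

This is only suitable for files that fit comfortably in memory; for large inputs, a Reader created with the detected charset avoids loading everything at once.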
