
I would like to parse a Google eBook transaction report. I opened it in Notepad++ to see precisely the field and record delimiters. It is a tab-delimited file; every header and data field is surrounded by quotation marks. The first two lines of the CSV file are:

"Transaction Date" "Id"    "Product"   "Type"  "Preorder"  "Qty"   "Primary ISBN"  "Imprint Name"  "Title" "Author"    "Original List Price Currency"  "Original List Price"   "List Price Currency"   "List Price [tax inclusive]"    "List Price [tax exclusive]"    "Country of Sale"   "Publisher Revenue %"   "Publisher Revenue" "Payment Currency"  "Payment Amount"    "Currency Conversion Rate"
"2016. 09. 01." "ID:1166315449551685"   "Single Purchase"   "Sale"  "None"  "1" "9789633780664" "Book and Walk Kft" "Bánk bán"  "József Katona" "HUF"   "0,00"  "HUF"   "0,00"  "0,00"  "HU"    "52,0%" "0,00"  ""  ""  ""

I use the following code to parse the CSV file:

private List<Sales> parseCsv(File csv) {
    Calendar max = Calendar.getInstance();
    Calendar current = Calendar.getInstance();
    boolean firstRound = true;

    List<Sales> sales = new ArrayList<>();
    Sales currentRecord;
    Reader in;
    try {
        in = new FileReader(csv);
        Iterable<CSVRecord> records;

        try {

            records = CSVFormat.TDF.withQuote('\"').withFirstRecordAsHeader().parse(in);
            for (CSVRecord record : records) {
                currentRecord = new Sales();
                currentRecord.setAuthor(record.get("Author"));
                currentRecord.setTitle(record.get("Title"));
                currentRecord.setPublisher(record.get("Imprint Name"));
                currentRecord.setIsbn(record.get("Primary ISBN"));
                currentRecord.setChannel("Google");
                currentRecord.setBookId(record.get("Id"));
                currentRecord.setCountry(record.get("Country of Sale"));
                currentRecord.setUnits(Integer.parseInt(record.get("Qty")));
                currentRecord.setUnitPrice(Float.parseFloat(record.get("List Price [tax exclusive]")));

                Date transDate;
                try {
                    // sourceDateFormat is a class field, presumably a SimpleDateFormat like new SimpleDateFormat("yyyy. MM. dd.")
                    transDate = sourceDateFormat.parse(record.get("Transaction Date"));
                    if (firstRound) {
                        max.setTime(transDate);
                    }
                    current.setTime(transDate);
                    if (current.after(max)) {
                        max.setTime(current.getTime());
                    }
                    currentRecord.setDatum(transDate);
                } catch (ParseException e) {
                    LOG.log(Level.SEVERE, "The date in file {0} has an invalid format", csv.getAbsolutePath());
                }

                currentRecord.setCurrencyCustomer(record.get("List Price Currency"));
                currentRecord.setCurrencyProceeds(record.get("Payment Amount"));
                currentRecord.setCurrencyProceeds(record.get("Payment Currency"));
                sales.add(currentRecord);
            }
            LOG.log(Level.INFO, "Daily sales transactions were successfully parsed from {0}",
                    csv.getAbsolutePath());
            return sales;
        } catch (IOException e1) {
            LOG.log(Level.SEVERE, "Something is wrong with the structure of file {0}", csv.getAbsolutePath());
        }
    } catch (FileNotFoundException e1) {
        LOG.log(Level.SEVERE, "Cannot find file {0}.", csv.getAbsolutePath());
    }
    return null;
}

When I debug the parsing process I can see that record.get("Author") throws a runtime exception:

java.lang.IllegalArgumentException: Mapping for Author not found, expected one of [��"

Obviously I have a column named Author. Any idea what goes wrong?

  • try to supply a Charset to parse: .parse(in, StandardCharsets.UTF_8). Commented Oct 20, 2016 at 8:36
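One concrete way to follow that suggestion with commons-csv 1.4 is the CSVParser.parse(File, Charset, CSVFormat) overload, so the file is opened with an explicit encoding instead of FileReader's platform default. A minimal sketch (the path "report.tsv" and the UTF-8 assumption are placeholders, not from the original post):

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class CharsetParseSketch {
    public static void main(String[] args) throws IOException {
        // Open the file with an explicit charset rather than the platform default
        CSVFormat format = CSVFormat.TDF.withQuote('"').withFirstRecordAsHeader();
        try (CSVParser parser = CSVParser.parse(new File("report.tsv"), StandardCharsets.UTF_8, format)) {
            for (CSVRecord record : parser) {
                System.out.println(record.get("Author"));
            }
        }
    }
}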

2 Answers

1

When I convert this to a unit test and run it with the current commons-csv version 1.4, it works fine for me; therefore:

  • Check with the latest version of commons-csv
  • Make sure that there are actually TABs in the file, not blanks around the author-entries for some reason
  • Specify the actual encoding of the file when calling parse() to correctly handle non-ASCII characters (thanks to comments from @tonakai)

The following unit-test works fine with commons-csv 1.4

private final static String DATA = "\"Transaction Date\"\t\"Id\"\t\"Product\"\t\"Type\"\t\"Preorder\"\t\"Qty\"\t\"Primary ISBN\"\t\"Imprint Name\"\t\"Title\"\t\"Author\"\t\"Original List Price Currency\"\t\"Original List Price\"\t\"List Price Currency\"\t\"List Price [tax inclusive]\"\t\"List Price [tax exclusive]\"\t\"Country of Sale\"\t\"Publisher Revenue %\"\t\"Publisher Revenue\"\t\"Payment Currency\"\t\"Payment Amount\"\t\"Currency Conversion Rate\"\n" +
        "\"2016. 09. 01.\"\t\"ID:1166315449551685\"\t\"Single Purchase\"\t\"Sale\"\t\"None\"\t\"1\"\t\"9789633780664\"\t\"Book and Walk Kft\"\t\"Bánk bán\"\t\"József Katona\"\t\"HUF\"\t\"0,00\"\t\"HUF\"\t\"0,00\"\t\"0,00\"\t\"HU\"\t\"52,0%\"\t\"0,00\"\t\"\"\t\"\"\t\"\"";

@Test
public void parseCsv() throws IOException {
    final CSVFormat format = CSVFormat.TDF.withQuote('\"').withFirstRecordAsHeader();
    Iterable<CSVRecord> records = format.parse(new StringReader(DATA));

    System.out.println("Headers: " + Arrays.toString(format.getHeader()));

    for (CSVRecord record : records) {
        assertNotNull(record.get("Author"));
        assertNotNull(record.get("Title"));
        assertNotNull(record.get("Imprint Name"));
        assertNotNull(record.get("Primary ISBN"));
        assertNotNull(record.get("Id"));
        assertNotNull(record.get("Country of Sale"));
        assertNotNull(record.get("Qty"));
        assertNotNull(record.get("List Price [tax exclusive]"));

        assertNotNull(record.get("Transaction Date"));

        assertNotNull(record.get("List Price Currency"));
        assertNotNull(record.get("Payment Amount"));
        assertNotNull(record.get("Payment Currency"));

        System.out.println("Record: " + record.toString());
    }
}
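The original answer does not list the test's imports; it presumably relies on something like the following (JUnit 4 plus commons-csv):

import static org.junit.Assert.assertNotNull;

import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;
import org.junit.Test;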

4 Comments

I checked and I use version 1.4. What is strange is that when I look at my Maven dependencies in my Eclipse IDE and drill down into commons-csv-1.4.jar, I see the parse method with three different method signatures, but the autocomplete of my IDE offers only the basic one.
Might be because you are passing a Reader in; the other signatures don't use a Reader. If you want to use a Reader, then try to construct it without FileReader, as mentioned in docs.oracle.com/javase/7/docs/api/java/io/FileReader.html, using an InputStreamReader on a FileInputStream.
I use version 1.4, and I also checked that all the fields are delimited by tabs; I verified it with Excel and Notepad++. I also implemented @tonakai's comment to handle non-ASCII characters correctly: 'CSVFormat csvFormat = CSVFormat.TDF.withQuote('\"').withFirstRecordAsHeader(); records = CSVParser.parse(csv, StandardCharsets.UTF_8, csvFormat);'. I still get the same error as before.
Then you will need to narrow down the problem: first run the unit test that I posted and see if that works, then reduce your non-working code so that it becomes more and more like the unit test; this way you will see at which step it breaks.
0

It turned out that the encoding was the source of the problem. Based on @tonakai's comment I started to analyze the encoding of the Google CSV report. It was UTF-16 Little Endian. As my file included a byte-order mark, I had to use BOMInputStream (from Apache Commons IO) and refactor my code a little bit.

Reader r = newReader(csv);
CSVParser csvParser = CSVFormat.TDF.withFirstRecordAsHeader().withQuoteMode(QuoteMode.ALL).parse(r);

.....

private InputStreamReader newReader(final File csv) throws FileNotFoundException {
    return new InputStreamReader(new BOMInputStream(new FileInputStream(csv), ByteOrderMark.UTF_16LE),
            StandardCharsets.UTF_16LE);
}

It is working now.
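If future reports are not guaranteed to be UTF-16LE, a slightly more general variant (a sketch, not part of the original answer; the class and method names here are made up) lets BOMInputStream detect whichever byte-order mark is present and derives the charset from it, falling back to UTF-8 when there is none:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

import org.apache.commons.io.ByteOrderMark;
import org.apache.commons.io.input.BOMInputStream;

public class BomAwareReaderFactory {

    // Detect which BOM (if any) the file starts with and open a Reader with the
    // matching charset; BOMInputStream skips the BOM bytes themselves by default.
    public static Reader newBomAwareReader(final File csv) throws IOException {
        BOMInputStream bomIn = new BOMInputStream(new FileInputStream(csv),
                ByteOrderMark.UTF_8, ByteOrderMark.UTF_16LE, ByteOrderMark.UTF_16BE);
        String charsetName = bomIn.hasBOM() ? bomIn.getBOMCharsetName() : StandardCharsets.UTF_8.name();
        return new InputStreamReader(bomIn, Charset.forName(charsetName));
    }
}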
