I am writing a program to parse a key-value based log like this:

dstcountry="United States" date=2018-12-13 time=23:47:32

I am using Univocity parser to do that. Here is my code.

CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.getFormat().setDelimiter(' ');
parserSettings.getFormat().setQuote('"');
parserSettings.getFormat().setQuoteEscape('"');
parserSettings.getFormat().setCharToEscapeQuoteEscaping('"');
CsvParser keyValueParser = new CsvParser(parserSettings);
String line = "dstcountry=\"United States\" date=2018-12-13 time=23:47:32";
String[] resp = keyValueParser.parseLine(line);

But the parser gives me this output:

dstcountry="United, 
States", 
date=2018-12-13, 
time=23:47:32

where the expected output was

dstcountry="United States", 
date=2018-12-13, 
time=23:47:32

Is there any problem with the code or is this a parser bug?

Regards,
Hari

2 Answers

Author of the lib here. This is not a parser bug. The problem you have here is that you are NOT parsing a CSV file.

When the parser sees: dstcountry="United, followed by a space (which is your delimiter), it will consider that as a value.

The quote setting only applies to fields that start with a quote character. As your input is not "dstcountry=""United States""", the parser won't be able to process this as you want. There is no CSV parser that can do that for you.

Again, you are not processing a CSV. The only thing you could do here is to use 2 parser instances: one to break down the row around the = and another one to break down values separated by whitespace in the result of the first parser. For example:

    CsvParserSettings parserSettings = new CsvParserSettings();
    //break down the rows around the `=` character
    parserSettings.getFormat().setDelimiter('=');

    CsvParser keyValueParser = new CsvParser(parserSettings);
    String line = "dstcountry=\"United States\" date=2018-12-13 time=23:47:32";
    String[] keyPairs = keyValueParser.parseLine(line);

    //break down each value around the whitespace.
    parserSettings.getFormat().setDelimiter(' ');
    CsvParser valueParser = new CsvParser(parserSettings);

    //add all values to a list
    List<String> row = new ArrayList<String>();

    for(String value : keyPairs){
        //if a value has a whitespace, break it down using the other parser instance
        String[] values = valueParser.parseLine(value);

        Collections.addAll(row, values);
    }

    //here is your result
    System.out.println(row);

This will print out:

[dstcountry, United States, date, 2018-12-13, time, 23:47:32]

You now have the key values. The following code will print this out as you want:

    for (int i = 0; i < row.size(); i += 2) {
        System.out.println(row.get(i) + " = " + row.get(i + 1));
    }

Output:

dstcountry = United States
date = 2018-12-13
time = 23:47:32
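
If looking values up by key is more convenient than walking a flat list, the pairs can be collected into a map. This is only a minimal sketch using the standard library: the class `KeyValuePairs` and the helper `toMap` are hypothetical names (not part of univocity), and it assumes the list always alternates key, value, key, value as in the output above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KeyValuePairs {

    // Hypothetical helper, not a univocity API: pairs up a flat
    // [key, value, key, value, ...] list into a map that keeps insertion order.
    static Map<String, String> toMap(List<String> row) {
        if (row.size() % 2 != 0) {
            throw new IllegalArgumentException(
                    "Expected an even number of elements but got " + row.size());
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < row.size(); i += 2) {
            result.put(row.get(i), row.get(i + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Same content as the list printed by the two-parser example
        List<String> row = Arrays.asList(
                "dstcountry", "United States", "date", "2018-12-13", "time", "23:47:32");
        Map<String, String> pairs = toMap(row);
        System.out.println(pairs.get("dstcountry")); // United States
        System.out.println(pairs.get("time"));       // 23:47:32
    }
}
```

A `LinkedHashMap` is used so the keys come back in the order they appeared in the log line. If a key can repeat within a line, later values overwrite earlier ones in this sketch.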

Hope this helps and thank you for using our parsers!

1 Comment

Thanks for the reply. I will try this.

I ended up writing my own parser. I am pasting it here for future reference in case anybody needs it. Suggestions and comments are welcome.

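// imports needed by the enclosing class
import java.util.ArrayList;
import java.util.List;

private static final int INSIDE_QT = 1;
private static final int OUTSIDE_QT = 0;

// Not declared in the snippet as originally posted; assumed here to be a flag of the
// enclosing class. false keeps the quote characters in the returned tokens.
private static final boolean removeQuotes = false;

public String[] parseLine(char delimiter, char quote, char quoteEscape, char charToEscapeQuoteEscaping, String logLine) {
    char[] line = logLine.toCharArray();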
    List<String> strList = new ArrayList<>();
    int state = OUTSIDE_QT;
    char lastChar = '\0';
    StringBuilder currentToken = new StringBuilder();
    for (int i = 0; i < line.length; i++) {
        if (state == OUTSIDE_QT) {
            if (line[i] == delimiter) {
                strList.add(currentToken.toString());
                currentToken.setLength(0);
            } else if (line[i] == quote) {
                if (lastChar == quoteEscape) {
                    currentToken.deleteCharAt(currentToken.length() - 1);
                    currentToken.append(line[i]);
                } else {
                    if (removeQuotes == false) {
                        currentToken.append(line[i]);
                    }
                    state = INSIDE_QT;
                }
            } else if (line[i] == quoteEscape) {
                if (lastChar == charToEscapeQuoteEscaping) {
                    currentToken.deleteCharAt(currentToken.length() - 1);
                    currentToken.append(line[i]);
                    continue;
                } else {
                    currentToken.append(line[i]);
                }
            } else {
                currentToken.append(line[i]);
            }
        } else if (state == INSIDE_QT) {
            if (line[i] == quote) {
                if (lastChar != quoteEscape) {
                    if (removeQuotes == false) {
                        currentToken.append(line[i]);
                    }
                    if (currentToken.length() == 0) {
                        currentToken.append('\0');
                    }
                    state = OUTSIDE_QT;
                } else {
                    currentToken.append(line[i]);
                }
            } else if (line[i] == quoteEscape) {
                if (lastChar == charToEscapeQuoteEscaping) {
                    currentToken.deleteCharAt(currentToken.length() - 1);
                    currentToken.append(line[i]);
                    continue;
                } else {
                    currentToken.append(line[i]);
                }
            } else {
                currentToken.append(line[i]);
            }
        }
        lastChar = line[i];
    }
    if (lastChar == delimiter) {
        strList.add("");
    }
    if (currentToken.length() > 0) {
        strList.add(currentToken.toString());
    }
    return strList.toArray(new String[strList.size()]);
}
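
For log lines where quoted values never contain an escaped quote, a regular expression can cover similar ground with less code. The following is a minimal sketch using `java.util.regex` instead of the state machine above, so it is a different technique and not a drop-in replacement: the class name `KeyValueRegex` and the pattern are illustrative assumptions, and unlike the parser above it strips the surrounding quotes and ignores the escape settings entirely.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueRegex {

    // A key (no whitespace, no '='), then '=', then either a double-quoted
    // value (group 2) or a run of non-whitespace characters (group 3).
    // Assumption: quoted values do not contain embedded or escaped quotes.
    private static final Pattern PAIR =
            Pattern.compile("([^\\s=]+)=(?:\"([^\"]*)\"|(\\S*))");

    static Map<String, String> parse(String line) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            // Exactly one of the two value groups participates in a match
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            result.put(m.group(1), value);
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "dstcountry=\"United States\" date=2018-12-13 time=23:47:32";
        Map<String, String> pairs = parse(line);
        System.out.println(pairs.get("dstcountry")); // United States
        System.out.println(pairs.get("date"));       // 2018-12-13
    }
}
```

If values can contain escaped quotes, the pattern would need to be extended or the hand-written parser kept, since this sketch stops a quoted value at the first `"` it sees.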
