
I have a MySQL dump CSV file containing two columns, json1 and json2; both columns are string representations of JSON objects. So a CSV row looks like the following:

"{"field1":"value","field2":4}","{"field1":"value","field2":4}"

I need to deserialize those two strings and then unmarshal the JSON into Go values. I'm stuck at the first step: the comma delimiter. The JSON strings themselves contain commas, so the reader splits each line into the wrong number of fields, never the two I need.

Here is my full code:

reader := csv.NewReader(csvFile)
reader.LazyQuotes = true //allows non-doubled quotes to appear in quoted fields

for {

    record, err := reader.Read()
    if err == io.EOF {
        break
    }
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("json1: %s json2 %s\n", record[0], record[1])
}

What I've tried

I've tried setting the delimiter to `}","{` and then appending the corresponding `}` and `{` to the resulting strings, but besides being error-prone, some of the rows have a NULL json1 or json2.

Observations

I'm using Go 1.12.1

  • This is neither proper CSV nor JSON. You could read the line with a string reader, consume the first quote, then parse the JSON using a decoder, then consume `","`, and then parse the rest again using a decoder. Commented Mar 5, 2020 at 23:23
  • This is basically impossible. Either the format of the file is specified, and then you have to write a parser for it (you cannot use encoding/{csv,json} unless it actually is CSV and JSON), or the format is not specified, and then you need heuristics that seem to work. Sometimes it is easier to change things upstream. Commented Mar 6, 2020 at 4:38
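The decoder approach from the first comment can be sketched as follows. This is a minimal sketch under assumptions: `parseRow` and the `map[string]interface{}` targets are illustrative, the line format is taken from the question's example, and `Decoder.InputOffset` requires Go 1.14+ (the question mentions 1.12):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// parseRow parses one line of the dump, assumed to be: an opening
// quote, a JSON object, the literal `","`, a second JSON object,
// and a closing quote.
func parseRow(line string) (map[string]interface{}, map[string]interface{}, error) {
	if len(line) < 2 || line[0] != '"' || line[len(line)-1] != '"' {
		return nil, nil, fmt.Errorf("unexpected line format: %q", line)
	}
	inner := line[1 : len(line)-1] // drop the outer quotes

	// Decode the first JSON object; the decoder stops after one value.
	dec := json.NewDecoder(strings.NewReader(inner))
	var first map[string]interface{}
	if err := dec.Decode(&first); err != nil {
		return nil, nil, err
	}

	// Skip the `","` separator, then unmarshal the remainder.
	rest := strings.TrimPrefix(inner[dec.InputOffset():], `","`)
	var second map[string]interface{}
	if err := json.Unmarshal([]byte(rest), &second); err != nil {
		return nil, nil, err
	}
	return first, second, nil
}

func main() {
	line := `"{"field1":"value","field2":4}","{"field1":"value","field2":5}"`
	a, b, err := parseRow(line)
	fmt.Println(a, b, err)
}
```

Note this does not handle rows where one column is NULL; those would need an extra check before decoding.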

1 Answer

I would just use strings.Split() to split on `}","{` (if you are sure that will always work), then unmarshal the JSON strings as you say. Can you get the dump file to delimit the nested quotes somehow?

columns := strings.Split(`"{"field1":"value","field2":4}","{"field1":"value","field2":5}"`, `}","{`)
for i, s := range columns {
    if i == 0 {
        s = s[1:] // remove leading quote
    }
    if i == len(columns)-1 {
        s = s[:len(s)-1] // remove trailing quote
    }
    if i > 0 {
        s = "{" + s // restore the { consumed by the split
    }
    if i < len(columns)-1 {
        s += "}" // restore the } consumed by the split
    }
    // unmarshal JSON ...
}

This is a bit of a kludge but should work even if some fields are NULL.


2 Comments

Splitting on `}","{` won't work, since sometimes one JSON or the other is NULL. I'm not sure whether I can get the nested quotes delimited; I'll find out. If it's possible, what would you suggest?
Thanks for your answer; I ended up implementing something very similar to what you propose. Since I don't need to process the rows lacking one of the JSONs, I use the length of the slice returned by Split as a check: if it isn't 2, I skip that row. I'm not completely satisfied with the approach, but since I can't change the input file, this will do for now.
