
I have a project with lots of incoming data, about 15 sources in total, and of course there are inconsistencies in how each labels the data available in its REST API. I need to change some of their field names to be consistent with the others, but I am at a loss on how to do this when the data sources are JSON object arrays. A working example of what I am trying to do is on the playground and below.

However, I seem to lack the knowledge to make this work when the data is not a single JSON object, but instead an array of objects that I am unmarshaling.

Another approach is using maps, like in this example, but the result is the same: it works great as is for single objects, but I cannot seem to get it to work with JSON object arrays. Iterating through the arrays is not really an option, as I am collecting about 8,000 records every few minutes. A rough sketch of the map approach for a single object follows.
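For context, the map approach looks roughly like this for a single object (just a sketch, using the field names from my example below):

package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    data := []byte(`{"key": "foo", "cacheAge": 1234, "cacheValue": {"nested": true}}`)

    // Decode into a generic map so keys can be renamed dynamically.
    m := map[string]interface{}{}
    if err := json.Unmarshal(data, &m); err != nil {
        panic(err)
    }

    // Move each inconsistent key to its normalized name.
    m["max_age"] = m["cacheAge"]
    m["value"] = m["cacheValue"]
    delete(m, "cacheAge")
    delete(m, "cacheValue")

    out, _ := json.Marshal(m)
    fmt.Println(string(out)) // {"key":"foo","max_age":1234,"value":{"nested":true}}
}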

package main

import (
    "encoding/json"
    "os"
)

type omit bool

type Value interface{}

type CacheItem struct {
    Key    string `json:"key"`
    MaxAge int    `json:"cacheAge"`
    Value  Value  `json:"cacheValue"`
}

func NewCacheItem() (*CacheItem, error) {
    i := &CacheItem{}
    return i, json.Unmarshal([]byte(`{
      "key": "foo",
      "cacheAge": 1234,
      "cacheValue": {
        "nested": true
      }
    }`), i)
}

func main() {
    item, _ := NewCacheItem()

    json.NewEncoder(os.Stdout).Encode(struct {
        *CacheItem

        // Omit bad keys
        OmitMaxAge omit `json:"cacheAge,omitempty"`
        OmitValue  omit `json:"cacheValue,omitempty"`

        // Add nice keys
        MaxAge int    `json:"max_age"`
        Value  *Value `json:"value"`
    }{
        CacheItem: item,

        // Set the int by value:
        MaxAge: item.MaxAge,

        // Set the nested struct by reference, avoid making a copy:
        Value: &item.Value,
    })
}
  • I don't think what you want is possible. Iterating through all the items seems to be the only option, and you are already doing that during the unmarshaling (though implicitly); going from X to 2X is not too bad. Also, 8k records every few minutes is not a lot; I would be worried if it were per second. Commented Feb 24, 2018 at 1:13
  • Well, it's unmarshaling, then iterating while appending back, times 15 goroutines, all for bulk inserts of about 8k every 5 minutes. Commented Feb 24, 2018 at 1:40
  • I think I get your question, but I still think it would be easier to understand if you added some sample array input of the kind you want to be able to process, and if possible some code you've tried to use to process it, even if it's broken or unfinished or doesn't perform the way you want. Commented Feb 24, 2018 at 5:34
  • It sounds like you have a solution in mind but are concerned about its performance, even though you haven't actually tested it. I have often heard the advice that performance problems are difficult to predict accurately, so just get it to work, and then worry about performance if it turns out to be a problem... Commented Feb 24, 2018 at 5:37
  • So your ultimate goal is to output normalized JSON, right? That is, you're not using the JSON as an intermediary step on the way to Go logic, you're using Go as an intermediary step from JSON to JSON -- yes? Commented Feb 24, 2018 at 8:50

1 Answer


It appears your desired output is JSON. You can accomplish the conversion by unmarshaling into a slice of structs, then iterating through each of those to convert it to the second struct type (your anonymous struct above), appending each into a slice, and finally marshaling the slice back to JSON:

package main

import (
    "encoding/json"
    "fmt"
)

type omit bool

type Value interface{}

type CacheItem struct {
    Key    string `json:"key"`
    MaxAge int    `json:"cacheAge"`
    Value  Value  `json:"cacheValue"`
}

type OutGoing struct {
    // Omit bad keys
    OmitMaxAge omit `json:"cacheAge,omitempty"`
    OmitValue  omit `json:"cacheValue,omitempty"`

    // Add nice keys
    Key    string `json:"key"`
    MaxAge int    `json:"max_age"`
    Value  *Value `json:"value"`
}

func main() {
    objects := make([]CacheItem, 0)
    sample := []byte(`[
    {
      "key": "foo",
      "cacheAge": 1234,
      "cacheValue": {
        "nested": true
      }},
    {
      "key": "baz",
      "cacheAge": 123,
      "cacheValue": {
        "nested": true
    }}]`)

    json.Unmarshal(sample, &objects)

    out := make([]OutGoing, 0, len(objects))
    for i := range objects {
        // Index into the slice rather than using a "for _, o := range"
        // loop variable: before Go 1.22, taking &o.Value would point every
        // element at the same reused variable.
        out = append(out, OutGoing{Key: objects[i].Key, MaxAge: objects[i].MaxAge, Value: &objects[i].Value})
    }
    s, _ := json.Marshal(out)
    fmt.Println(string(s))
}

This outputs

[{"key":"foo","max_age":1234,"value":{"nested":true}},{"key":"baz","max_age":123,"value":{"nested":true}}]

You could probably skip this iteration and conversion code if you wrote custom MarshalJSON and UnmarshalJSON methods for your CacheItem type, instead of relying on struct field tags. Then you could pass the same slice to both Unmarshal and Marshal.
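A minimal sketch of that idea, using the field names from your example (unmarshaling would still rely on the existing cacheAge/cacheValue tags; only the output side changes):

func (c CacheItem) MarshalJSON() ([]byte, error) {
    // Marshal through an anonymous struct that carries the "nice" key
    // names. The anonymous struct is a distinct type, so this does not
    // recurse back into CacheItem.MarshalJSON.
    return json.Marshal(struct {
        Key    string `json:"key"`
        MaxAge int    `json:"max_age"`
        Value  Value  `json:"value"`
    }{c.Key, c.MaxAge, c.Value})
}

With that method defined, json.Marshal(objects) on the []CacheItem slice would produce the normalized output directly, with no second slice.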

To me there's no obvious performance mistake in these approaches (contrast with, say, building a string in a loop using the + operator), and when that's the case it's often best to just get the software working and then test its performance, rather than ruling out a solution based on fears of performance issues you haven't actually measured.

If there is a performance problem with the above approaches and you really want to avoid marshal and unmarshal completely, you could look into byte replacement in the raw JSON data (e.g. regexp or bytes.ReplaceAll). I'm not recommending this, but if your changes are very simple and the inputs are very consistent it could work, and it would give you another candidate whose performance you could test and compare.
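As a rough illustration only (this sketch assumes the quoted key strings never appear inside any values, which is exactly what makes the approach fragile):

package main

import (
    "bytes"
    "fmt"
)

func main() {
    sample := []byte(`[{"key":"foo","cacheAge":1234,"cacheValue":{"nested":true}}]`)

    // Blind byte replacement on the raw JSON: no decoding at all, but it
    // breaks if "cacheAge" or "cacheValue" can occur inside a value.
    out := bytes.ReplaceAll(sample, []byte(`"cacheAge":`), []byte(`"max_age":`))
    out = bytes.ReplaceAll(out, []byte(`"cacheValue":`), []byte(`"value":`))

    fmt.Println(string(out))
}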


1 Comment

You get the check, as this was exactly how I was handling other normalization issues and preparing for bulk inserts with timestamps, but as expected (I was missing something easy) it never occurred to me to append to a second struct. While waiting for someone to figure out how silly I was, I came up with a dirty hack method, but I think your solution is the obvious way to deal with this. Thanks for your time and help, and sorry I didn't answer your comments above first; I was sleeping :-)
