
I'm hoping someone can show me a less verbose and more efficient way to achieve the following:


I have some JSON data (via PapaParse) which contains an array of objects. It looks something like this:

const myJSON = [
    {subscriber_id: "1", segment: "something", status: "subscribed", created_at: "2019-01-16 05:55:20"},
    {subscriber_id: "1", segment: "another thing", status: "subscribed", created_at: "2019-04-02 23:06:54"},
    {subscriber_id: "1", segment: "something else", status: "subscribed", created_at: "2019-04-03 03:55:16"}, 
];

My goal is to iterate through the data and merge all objects with the same value for subscriber_id into a single object with all the segment values combined into an array, so that the result will look like this:

[
    {subscriber_id: "1", segment: ["something", "another thing", "something else"], status: "subscribed", created_at: "2019-01-16 05:55:20"}
];

Below is my current code, which works. But I'm interested in ways to improve it.

Note: In my actual project, I allow the user to choose which column is used to identify duplicate rows and which columns to combine, which is why my mergeCSV function takes 3 parameters.

const myJSON = [{
      subscriber_id: "1",
      segment: "something",
      status: "subscribed",
      created_at: "2019-01-16 05:55:20"
    },
    {
      subscriber_id: "1",
      segment: "another thing",
      status: "subscribed",
      created_at: "2019-04-02 23:06:54"
    },
    {
      subscriber_id: "1",
      segment: "something else",
      status: "subscribed",
      created_at: "2019-04-03 03:55:16"
    },
  ],
  myKey = "subscriber_id",
  myColumns = ["segment"];


const mergeCSV = (theData, theKey, theColumns) => {

  const l = theData.length;
  let theOutput = [];

  // add the first row
  theOutput.push(theData[0]);

  // convert columns to be combined into arrays    
  theColumns.forEach(col => theOutput[0][col] = [theOutput[0][col]]);

  // loop through the main file from beginning to end
  for (let a = 1; a < l; a++) {

    // reset duplicate flag
    let duplicate = false;

    // loop through theOutput from end to beginning
    for (let b = theOutput.length - 1; b >= 0; b--) {

      // if theKey matches an existing output row
      if (theData[a][theKey] === theOutput[b][theKey]) {

        duplicate = true;

        // add each combinable column's data to the existing output row
        for (let i = 0; i < theColumns.length; i++) {
          theOutput[b][theColumns[i]].push(theData[a][theColumns[i]]);
        }
        break;
      }
    }

    // if theKey doesn't match any rows in theOutput
    if (!duplicate) {
      // add the row
      theOutput.push(theData[a]);
      // convert columns to be combined into arrays
      theColumns.forEach(col => theOutput[theOutput.length - 1][col] = [theOutput[theOutput.length - 1][col]]);
    }

  }
  return theOutput;
}

console.log( mergeCSV(myJSON, myKey, myColumns) );

5 Answers


You could reduce the array into a hash table keyed by the chosen column, then take the values of the result.

const
    // group rows into a hash table keyed by the chosen column,
    // then return just the merged objects
    mergeCSV = (data, key, columns) => Object.values(data.reduce((r, o) => {
        // first row for this key: copy it and reset each merge column to an empty array
        if (!r[o[key]]) r[o[key]] = { ...o, ...Object.fromEntries(columns.map(k => [k, []])) };
        // append this row's value for every merge column
        columns.forEach(k => r[o[key]][k].push(o[k]));
        return r;
    }, {})),
    data = [{ subscriber_id: "1", segment: "something", status: "subscribed", created_at: "2019-01-16 05:55:20" }, { subscriber_id: "1", segment: "another thing", status: "subscribed", created_at: "2019-04-02 23:06:54" }, { subscriber_id: "1", segment: "something else", status: "subscribed", created_at: "2019-04-03 03:55:16" }];

console.log( mergeCSV(data, "subscriber_id", ["segment"]));
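
For illustration, the reducer builds an intermediate hash table keyed by the chosen column before Object.values turns it back into an array. With the sample data above, that intermediate object looks roughly like this (the variable name is only illustrative):

// intermediate accumulator keyed by subscriber_id; each bucket starts as a copy
// of the first row for that key, with every merge column replaced by an array
const intermediate = {
  "1": {
    subscriber_id: "1",
    segment: ["something", "another thing", "something else"],
    status: "subscribed",
    created_at: "2019-01-16 05:55:20"
  }
};

// Object.values() drops the keys and returns just the merged rows
console.log(Object.values(intermediate));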


1 Comment

Thank you, Nina! This is exactly what I was looking for. It took me quite some time to figure out what each part of your code is doing, so I definitely learned a lot from this. BTW, when tested on a large file, my code took an average of 7740ms. Yours brought it down to 31ms.
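
For anyone curious how such timings might be reproduced, here is a rough sketch of one way to benchmark either mergeCSV implementation against a synthetic dataset (the row count and labels are arbitrary, and absolute numbers will vary by machine):

// build a synthetic dataset with many repeated subscriber_ids
const bigData = Array.from({ length: 100000 }, (_, i) => ({
  subscriber_id: String(i % 5000),
  segment: "segment " + i,
  status: "subscribed",
  created_at: "2019-01-16 05:55:20"
}));

// console.time/timeEnd print the elapsed milliseconds for a single run
console.time("mergeCSV");
mergeCSV(bigData, "subscriber_id", ["segment"]);
console.timeEnd("mergeCSV");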

You can use Array.prototype.reduce for cleaner code.

const myJSON = [{
    subscriber_id: "1",
    segment: "something",
    status: "subscribed",
    created_at: "2019-01-16 05:55:20"
  },
  {
    subscriber_id: "1",
    segment: "another thing",
    status: "subscribed",
    created_at: "2019-04-02 23:06:54"
  },
  {
    subscriber_id: "1",
    segment: "something else",
    status: "subscribed",
    created_at: "2019-04-03 03:55:16"
  },
];
// inside the reduce callback, use findIndex to check if the accumulator
// array already contains an object with the same `subscriber_id`
let newJSON = myJSON.reduce((acc, curr) => {
  let findIndex = acc.findIndex(item => item.subscriber_id === curr.subscriber_id);
  // if the accumulator does not contain an object with this subscriber_id,
  // push a new object into the accumulator
  if (findIndex === -1) {
    acc.push({
      subscriber_id: curr.subscriber_id,
      status: curr.status,
      segment: [curr.segment],
      created_at: curr.created_at
    });
  } else {
    // update the existing object with the same subscriber_id
    acc[findIndex].segment.push(curr.segment)
  }


  return acc;
}, []);

console.log(newJSON)

4 Comments

O(n²) complexity
Reduce once, collect once... check my answer.
@xdeepakv You can do it in O(n) only when there is a single column to be merged, whereas the format of the OP's code suggests there can be more than one column to merge, in which case this can't be used.
This code takes a bit longer than some of the other solutions to execute. On my test file, it clocks in at an average of 2548ms. However, that's still much faster than my 7740ms. The only problem is that the keys are hard coded so I'd need to update this every time I'm dealing with different data. Still it's nice to see various ideas on how to accomplish a task, so thank you!
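
For reference, the findIndex lookup could be swapped for a Map so the approach stays O(n) and the key and merge columns become parameters. This is only a sketch combining the ideas from the comments above, not part of the original answer (mergeRows is an illustrative name):

const mergeRows = (data, key, columns) => {
  const buckets = new Map(); // key value -> merged output object
  const out = [];

  for (const row of data) {
    const k = row[key];
    let bucket = buckets.get(k);
    if (!bucket) {
      // first time this key is seen: copy the row and wrap merge columns in arrays
      bucket = { ...row };
      columns.forEach(col => bucket[col] = [row[col]]);
      buckets.set(k, bucket);
      out.push(bucket);
    } else {
      // key already seen: append each merge column's value
      columns.forEach(col => bucket[col].push(row[col]));
    }
  }
  return out;
};

console.log(mergeRows(myJSON, "subscriber_id", ["segment"]));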

You can use reduce, filter out the keys that don't need to be merged, take the value for those keys from the first element, and for the keys that do need merging collect a value from each element.

const myJSON = [{subscriber_id: "1",segment: "something",status: "subscribed",created_at: "2019-01-16 05:55:20"},{subscriber_id: "1",segment: "another thing",status: "subscribed",created_at: "2019-04-02 23:06:54"},{subscriber_id: "1",segment: "something else",status: "subscribed",created_at: "2019-04-03 03:55:16"}];
let myKey = "subscriber_id";
let myColumns = ["segment"];

const final = myJSON.reduce((op, inp, index) => {
  let key = inp[myKey]
  if (key) {
    let columnsNotToBeMerged = index === 0 && Object.keys(inp).filter(key => !myColumns.includes(key))
    myColumns.forEach(column => {
      op[key] = op[key] || {}
      op[key][column] = op[key][column] || []
      op[key][column].push(inp[column])
    })
    index === 0 && columnsNotToBeMerged.forEach(columnNotMerge => {
      op[key] = op[key] || {}
      if (!op[key][columnNotMerge]) {
        op[key][columnNotMerge] = inp[columnNotMerge]
      }
    })
  }
  return op
}, {})

console.log(Object.values(final))

1 Comment

The code, as written, has a problem. In the resulting array, every object after the first contains only the keys in myColumns. But it was easily fixed by removing index === 0 &&. After that, the average execution time for my test file was 60ms. Then I moved the columnsNotToBeMerged declaration outside the reduce function and got it down to 42ms. Thank you for sharing this. All the answers were very helpful.
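
For illustration, the adjustment described in that comment might look roughly like this, with the non-merged columns computed once outside the reduce and applied to every new bucket. This is a reconstruction of the described fix, not the answer author's code:

const mergeCSV = (data, key, columns) => {
  // columns that keep their value from the first row seen for each key
  const columnsNotToBeMerged = Object.keys(data[0] || {}).filter(k => !columns.includes(k));

  const grouped = data.reduce((op, inp) => {
    const k = inp[key];
    if (k) {
      op[k] = op[k] || {};
      columns.forEach(column => {
        op[k][column] = op[k][column] || [];
        op[k][column].push(inp[column]);
      });
      columnsNotToBeMerged.forEach(col => {
        if (!(col in op[k])) op[k][col] = inp[col];
      });
    }
    return op;
  }, {});

  return Object.values(grouped);
};

console.log(mergeCSV(myJSON, myKey, myColumns));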

You can use Array.prototype.reduce for such a problem. Very useful.

First reduce to group, then iterate to collect. Only O(n) complexity.

const myJSON = [
  {
    subscriber_id: "1",
    segment: "something",
    status: "subscribed",
    created_at: "2019-01-16 05:55:20"
  },
  {
    subscriber_id: "1",
    segment: "another thing",
    status: "subscribed",
    created_at: "2019-04-02 23:06:54"
  },
  {
    subscriber_id: "1",
    segment: "something else",
    status: "subscribed",
    created_at: "2019-04-03 03:55:16"
  }
];

const groupBy = (arr, fn) =>
  arr.reduce((acc, item, i) => {
    const val = fn(item);
    if (!acc[val]) acc[val] = { ...item, segment: [item.segment] };
    else {
      acc[val].segment.push(item.segment);
    }
    return acc;
  }, {});
const map = groupBy(myJSON, x => x.subscriber_id);

// collect now
let result = [];
for (let i in map) {
  result.push(map[i]);
}
console.log(result);

1 Comment

This works with the sample data, but the user can no longer choose which columns to merge, etc. However, I was able to adapt the code and add the extra functionality I needed. So I really appreciate this answer because it forced me to learn about time complexity and review some ES6 features. After adapting the code and testing with a large file, the average execution time was 37ms as opposed to my 7740ms.
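
For illustration, one way the groupBy helper might be parameterized so the key and merge columns stay user-selectable; a sketch under that assumption (groupByColumns is an illustrative name), not the answer's original code:

const groupByColumns = (arr, key, columns) =>
  arr.reduce((acc, item) => {
    const val = item[key];
    if (!acc[val]) {
      // first row for this key: copy it and wrap each merge column in an array
      acc[val] = { ...item };
      columns.forEach(col => acc[val][col] = [item[col]]);
    } else {
      // later rows: append each merge column's value
      columns.forEach(col => acc[val][col].push(item[col]));
    }
    return acc;
  }, {});

console.log(Object.values(groupByColumns(myJSON, "subscriber_id", ["segment"])));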

You could use the reduce method and, inside it, loop over Object.entries of the current object, checking whether each key is included in the columns parameter to decide whether to push its value into an array or simply assign it.

const myJSON = [
    {subscriber_id: "1", segment: "something", status: "subscribed", created_at: "2019-01-16 05:55:20"},
    {subscriber_id: "1", segment: "another thing", status: "subscribed", created_at: "2019-04-02 23:06:54"},
    {subscriber_id: "1", segment: "something else", status: "subscribed", created_at: "2019-04-03 03:55:16"}, 
];

const myKey = "subscriber_id";
const myColumns = ["segment"];

const mergeCSV = (data, key, columns) => {
  const obj = data.reduce((r, e) => {
    if (!r[e[key]]) r[e[key]] = {}

    Object.entries(e).forEach(([k, v]) => {
      if (columns.includes(k)) r[e[key]][k] = (r[e[key]][k] || []).concat(v)
      else r[e[key]][k] = v
    })

    return r;
  }, {})

  return Object.values(obj)
}

const result = mergeCSV(myJSON, myKey, myColumns)
console.log(result)

3 Comments

On a side note: for keys that don't need to be merged, it should pick the values from the first element.
@CodeManiac Good point. While I didn't specifically state that in my question, that would be preferable.
Thanks for this answer. Very concise! With my large file, it took an average of 74 ms to execute. I then modified it to keep the first value for keys not specified in myColumns, and somehow I got the average time down to 47 ms. All of these answers have helped me a lot.
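
For illustration, the tweak mentioned in the comments (keeping the first row's value for keys not listed in the columns parameter) could look something like this; a sketch with an illustrative name, not the answer author's code:

const mergeCSVFirstWins = (data, key, columns) => {
  const obj = data.reduce((r, e) => {
    const isNewKey = !r[e[key]];
    if (isNewKey) r[e[key]] = {};

    Object.entries(e).forEach(([k, v]) => {
      if (columns.includes(k)) r[e[key]][k] = (r[e[key]][k] || []).concat(v);
      else if (isNewKey) r[e[key]][k] = v; // only the first row sets non-merged keys
    });

    return r;
  }, {});

  return Object.values(obj);
};

console.log(mergeCSVFirstWins(myJSON, myKey, myColumns));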
