
I'm hoping someone can show me a less verbose and more efficient way to achieve the following:


I have some JSON data (via PapaParse) which contains an array of objects. It looks something like this:

const myJSON = [
    {subscriber_id: "1", segment: "something", status: "subscribed", created_at: "2019-01-16 05:55:20"},
    {subscriber_id: "1", segment: "another thing", status: "subscribed", created_at: "2019-04-02 23:06:54"},
    {subscriber_id: "1", segment: "something else", status: "subscribed", created_at: "2019-04-03 03:55:16"}, 
];

My goal is to iterate through the data and merge all objects with the same value for subscriber_id into a single object with all the segment values combined into an array, so that the result will look like this:

[
    {subscriber_id: "1", segment: ["something", "another thing", "something else"], status: "subscribed", created_at: "2019-01-16 05:55:20"}
];

Below is my current code, which works. But I'm interested in ways to improve it.

Note: In my actual project, I allow the user to choose which column is used to identify duplicate rows and which columns to combine, which is why my mergeCSV function takes 3 parameters.

const myJSON = [{
      subscriber_id: "1",
      segment: "something",
      status: "subscribed",
      created_at: "2019-01-16 05:55:20"
    },
    {
      subscriber_id: "1",
      segment: "another thing",
      status: "subscribed",
      created_at: "2019-04-02 23:06:54"
    },
    {
      subscriber_id: "1",
      segment: "something else",
      status: "subscribed",
      created_at: "2019-04-03 03:55:16"
    },
  ],
  myKey = "subscriber_id",
  myColumns = ["segment"];


const mergeCSV = (theData, theKey, theColumns) => {

  const l = theData.length;
  let theOutput = [];

  // add the first row
  theOutput.push(theData[0]);

  // convert columns to be combined into arrays    
  theColumns.forEach(col => theOutput[0][col] = [theOutput[0][col]]);

  // loop through the main file from beginning to end
  for (let a = 1; a < l; a++) {

    // reset duplicate flag
    let duplicate = false;

    // loop through theOutput from end to beginning
    for (let b = theOutput.length - 1; b >= 0; b--) {

      // if theKey matches an existing output row
      if (theData[a][theKey] === theOutput[b][theKey]) {

        duplicate = true;

        // add each combinable column's data to the existing output row
        for (let i = 0; i < theColumns.length; i++) {
          theOutput[b][theColumns[i]].push(theData[a][theColumns[i]]);
        }
        break;
      }
    }

    // if theKey doesn't match any rows in theOutput
    if (!duplicate) {
      // add the row
      theOutput.push(theData[a]);
      // convert columns to be combined into arrays
      theColumns.forEach(col => theOutput[theOutput.length - 1][col] = [theOutput[theOutput.length - 1][col]]);
    }

  }
  return theOutput;
}

console.log( mergeCSV(myJSON, myKey, myColumns) );

5 Answers


You could reduce the array into a hash table keyed by the chosen column, then take the values of the result.

const
    // group rows into a hash table keyed by the chosen column,
    // then return just the merged objects
    mergeCSV = (data, key, columns) => Object.values(data.reduce((r, o) => {
        // first row for this key: copy it and reset each merge column to an empty array
        if (!r[o[key]]) r[o[key]] = { ...o, ...Object.fromEntries(columns.map(k => [k, []])) };
        // append this row's value for every merge column
        columns.forEach(k => r[o[key]][k].push(o[k]));
        return r;
    }, {})),
    data = [{ subscriber_id: "1", segment: "something", status: "subscribed", created_at: "2019-01-16 05:55:20" }, { subscriber_id: "1", segment: "another thing", status: "subscribed", created_at: "2019-04-02 23:06:54" }, { subscriber_id: "1", segment: "something else", status: "subscribed", created_at: "2019-04-03 03:55:16" }];

console.log( mergeCSV(data, "subscriber_id", ["segment"]));
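
For illustration, the reducer builds an intermediate hash table keyed by the chosen column before Object.values turns it back into an array. With the sample data above, that intermediate object looks roughly like this (the variable name is only illustrative):

// intermediate accumulator keyed by subscriber_id; each bucket starts as a copy
// of the first row for that key, with every merge column replaced by an array
const intermediate = {
  "1": {
    subscriber_id: "1",
    segment: ["something", "another thing", "something else"],
    status: "subscribed",
    created_at: "2019-01-16 05:55:20"
  }
};

// Object.values() drops the keys and returns just the merged rows
console.log(Object.values(intermediate));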


1 Comment

Thank you, Nina! This is exactly what I was looking for. It took me quite some time to figure out what each part of your code is doing, so I definitely learned a lot from this. BTW, when tested on a large file, my code took an average of 7740ms. Yours brought it down to 31ms.
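
For anyone curious how such timings might be reproduced, here is a rough sketch of one way to benchmark either mergeCSV implementation against a synthetic dataset (the row count and labels are arbitrary, and absolute numbers will vary by machine):

// build a synthetic dataset with many repeated subscriber_ids
const bigData = Array.from({ length: 100000 }, (_, i) => ({
  subscriber_id: String(i % 5000),
  segment: "segment " + i,
  status: "subscribed",
  created_at: "2019-01-16 05:55:20"
}));

// console.time/timeEnd print the elapsed milliseconds for a single run
console.time("mergeCSV");
mergeCSV(bigData, "subscriber_id", ["segment"]);
console.timeEnd("mergeCSV");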

You can use Array.prototype.reduce for cleaner code.

const myJSON = [{
    subscriber_id: "1",
    segment: "something",
    status: "subscribed",
    created_at: "2019-01-16 05:55:20"
  },
  {
    subscriber_id: "1",
    segment: "another thing",
    status: "subscribed",
    created_at: "2019-04-02 23:06:54"
  },
  {
    subscriber_id: "1",
    segment: "something else",
    status: "subscribed",
    created_at: "2019-04-03 03:55:16"
  },
];
// inside the reduce callback, use findIndex to check if the accumulator
// array already contains an object with the same `subscriber_id`
let newJSON = myJSON.reduce((acc, curr) => {
  let findIndex = acc.findIndex(item => item.subscriber_id === curr.subscriber_id);
  // if the accumulator does not contain an object with this subscriber_id,
  // push a new object into the accumulator
  if (findIndex === -1) {
    acc.push({
      subscriber_id: curr.subscriber_id,
      status: curr.status,
      segment: [curr.segment],
      created_at: curr.created_at
    });
  } else {
    // update the existing object with the same subscriber_id
    acc[findIndex].segment.push(curr.segment)
  }


  return acc;
}, []);

console.log(newJSON)

4 Comments

O(n²) complexity
Reduce once, collect once... check my answer.
@xdeepakv You can do it in O(n) only when there is a single column to be merged, whereas the format of the OP's code suggests there can be more than one column to merge, in which case this can't be used.
This code takes a bit longer than some of the other solutions to execute. On my test file, it clocks in at an average of 2548ms. However, that's still much faster than my 7740ms. The only problem is that the keys are hard coded so I'd need to update this every time I'm dealing with different data. Still it's nice to see various ideas on how to accomplish a task, so thank you!
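
For reference, the findIndex lookup could be swapped for a Map so the approach stays O(n) and the key and merge columns become parameters. This is only a sketch combining the ideas from the comments above, not part of the original answer (mergeRows is an illustrative name):

const mergeRows = (data, key, columns) => {
  const buckets = new Map(); // key value -> merged output object
  const out = [];

  for (const row of data) {
    const k = row[key];
    let bucket = buckets.get(k);
    if (!bucket) {
      // first time this key is seen: copy the row and wrap merge columns in arrays
      bucket = { ...row };
      columns.forEach(col => bucket[col] = [row[col]]);
      buckets.set(k, bucket);
      out.push(bucket);
    } else {
      // key already seen: append each merge column's value
      columns.forEach(col => bucket[col].push(row[col]));
    }
  }
  return out;
};

console.log(mergeRows(myJSON, "subscriber_id", ["segment"]));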

You can use reduce, filter out the keys that don't need to be merged, take the value for those keys from the first element, and for the keys that do need merging collect a value from each element.

const myJSON = [{subscriber_id: "1",segment: "something",status: "subscribed",created_at: "2019-01-16 05:55:20"},{subscriber_id: "1",segment: "another thing",status: "subscribed",created_at: "2019-04-02 23:06:54"},{subscriber_id: "1",segment: "something else",status: "subscribed",created_at: "2019-04-03 03:55:16"}];
let myKey = "subscriber_id";
let myColumns = ["segment"];

const final = myJSON.reduce((op, inp, index) => {
  let key = inp[myKey]
  if (key) {
    let columnsNotToBeMerged = index === 0 && Object.keys(inp).filter(key => !myColumns.includes(key))
    myColumns.forEach(column => {
      op[key] = op[key] || {}
      op[key][column] = op[key][column] || []
      op[key][column].push(inp[column])
    })
    index === 0 && columnsNotToBeMerged.forEach(columnNotMerge => {
      op[key] = op[key] || {}
      if (!op[key][columnNotMerge]) {
        op[key][columnNotMerge] = inp[columnNotMerge]
      }
    })
  }
  return op
}, {})

console.log(Object.values(final))

1 Comment

The code, as written, has a problem. In the resulting array, every object after the first contains only the keys in myColumns. But it was easily fixed by removing index === 0 &&. After that, the average execution time for my test file was 60ms. Then I moved the columnsNotToBeMerged declaration outside the reduce function and got it down to 42ms. Thank you for sharing this. All the answers were very helpful.
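
For illustration, the adjustment described in that comment might look roughly like this, with the non-merged columns computed once outside the reduce and applied to every new bucket. This is a reconstruction of the described fix, not the answer author's code:

const mergeCSV = (data, key, columns) => {
  // columns that keep their value from the first row seen for each key
  const columnsNotToBeMerged = Object.keys(data[0] || {}).filter(k => !columns.includes(k));

  const grouped = data.reduce((op, inp) => {
    const k = inp[key];
    if (k) {
      op[k] = op[k] || {};
      columns.forEach(column => {
        op[k][column] = op[k][column] || [];
        op[k][column].push(inp[column]);
      });
      columnsNotToBeMerged.forEach(col => {
        if (!(col in op[k])) op[k][col] = inp[col];
      });
    }
    return op;
  }, {});

  return Object.values(grouped);
};

console.log(mergeCSV(myJSON, myKey, myColumns));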

You can use Array.prototype.reduce for such a problem. Very useful.

First reduce to group, then iterate to collect. Only O(n) complexity.

const myJSON = [
  {
    subscriber_id: "1",
    segment: "something",
    status: "subscribed",
    created_at: "2019-01-16 05:55:20"
  },
  {
    subscriber_id: "1",
    segment: "another thing",
    status: "subscribed",
    created_at: "2019-04-02 23:06:54"
  },
  {
    subscriber_id: "1",
    segment: "something else",
    status: "subscribed",
    created_at: "2019-04-03 03:55:16"
  }
];

const groupBy = (arr, fn) =>
  arr.reduce((acc, item, i) => {
    const val = fn(item);
    if (!acc[val]) acc[val] = { ...item, segment: [item.segment] };
    else {
      acc[val].segment.push(item.segment);
    }
    return acc;
  }, {});
const map = groupBy(myJSON, x => x.subscriber_id);

// collect now
let result = [];
for (let i in map) {
  result.push(map[i]);
}
console.log(result);

1 Comment

This works with the sample data, but the user can no longer choose which columns to merge, etc. However, I was able to adapt the code and add the extra functionality I needed. So I really appreciate this answer because it forced me to learn about time complexity and review some ES6 features. After adapting the code and testing with a large file, the average execution time was 37ms as opposed to my 7740ms.
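
For illustration, one way the groupBy helper might be parameterized so the key and merge columns stay user-selectable; a sketch under that assumption (groupByColumns is an illustrative name), not the answer's original code:

const groupByColumns = (arr, key, columns) =>
  arr.reduce((acc, item) => {
    const val = item[key];
    if (!acc[val]) {
      // first row for this key: copy it and wrap each merge column in an array
      acc[val] = { ...item };
      columns.forEach(col => acc[val][col] = [item[col]]);
    } else {
      // later rows: append each merge column's value
      columns.forEach(col => acc[val][col].push(item[col]));
    }
    return acc;
  }, {});

console.log(Object.values(groupByColumns(myJSON, "subscriber_id", ["segment"])));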

You could use the reduce method and, inside it, loop over Object.entries of the current object, checking whether each key is included in the columns parameter to decide whether to push its value into an array or simply assign it.

const myJSON = [
    {subscriber_id: "1", segment: "something", status: "subscribed", created_at: "2019-01-16 05:55:20"},
    {subscriber_id: "1", segment: "another thing", status: "subscribed", created_at: "2019-04-02 23:06:54"},
    {subscriber_id: "1", segment: "something else", status: "subscribed", created_at: "2019-04-03 03:55:16"}, 
];

const myKey = "subscriber_id";
const myColumns = ["segment"];

const mergeCSV = (data, key, columns) => {
  const obj = data.reduce((r, e) => {
    if (!r[e[key]]) r[e[key]] = {}

    Object.entries(e).forEach(([k, v]) => {
      if (columns.includes(k)) r[e[key]][k] = (r[e[key]][k] || []).concat(v)
      else r[e[key]][k] = v
    })

    return r;
  }, {})

  return Object.values(obj)
}

const result = mergeCSV(myJSON, myKey, myColumns)
console.log(result)

3 Comments

On a side note: for keys that don't need to be merged, it should pick the values from the first element.
@CodeManiac Good point. While I didn't specifically state that in my question, that would be preferable.
Thanks for this answer. Very concise! With my large file, it took an average of 74 ms to execute. I then modified it to keep the first value for keys not specified in myColumns, and somehow I got the average time down to 47 ms. All of these answers have helped me a lot.
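
For illustration, the tweak mentioned in the comments (keeping the first row's value for keys not listed in the columns parameter) could look something like this; a sketch with an illustrative name, not the answer author's code:

const mergeCSVFirstWins = (data, key, columns) => {
  const obj = data.reduce((r, e) => {
    const isNewKey = !r[e[key]];
    if (isNewKey) r[e[key]] = {};

    Object.entries(e).forEach(([k, v]) => {
      if (columns.includes(k)) r[e[key]][k] = (r[e[key]][k] || []).concat(v);
      else if (isNewKey) r[e[key]][k] = v; // only the first row sets non-merged keys
    });

    return r;
  }, {});

  return Object.values(obj);
};

console.log(mergeCSVFirstWins(myJSON, myKey, myColumns));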
