5

Is there an efficient way to remove objects with a duplicate 'obj_id' field from this data with Python? In this case, just keep the first occurrence.

{
  {obj_id: 123,
    location: {
      x: 123,
      y: 323,
  },
  {obj_id: 13,
    location: {
      x: 23,
      y: 333,
  },
 {obj_id: 123,
    location: {
      x: 122,
      y: 133,
  },
}

Should become:

{
  {obj_id: 123,
    location: {
      x: 123,
      y: 323,
  },
  {obj_id: 13,
    location: {
      x: 23,
      y: 333,
  },
}
2
  • 1
    This isn't valid json. Can you post your actual data? Commented Jun 12, 2013 at 22:24
  • 1
    Also, those aren't really duplicates... Commented Jun 12, 2013 at 22:29

4 Answers

11

Presuming your JSON is syntactically valid and you are indeed asking for help with Python, you will need to do something like this:

import json
ds = json.loads(json_data_string)  # json_data_string holds the JSON text
# Later duplicates overwrite earlier ones, so this keeps the last occurrence.
unique_stuff = list({each['obj_id']: each for each in ds}.values())

If you want to always retain the first occurrence, you will need to do something like this:

all_ids = [each['obj_id'] for each in ds]  # get 'ds' from the snippet above
unique_stuff = [ds[all_ids.index(oid)] for oid in set(all_ids)]
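
To make the difference between the two snippets concrete, here is a minimal sketch. It assumes the question's data has been rewritten as a valid JSON array (the original input was not valid JSON, so this shape is a guess). The dict comprehension keeps the last object for each obj_id, while the index-based version keeps the first.

```python
import json

# The question's data rewritten as a valid JSON array (assumed shape).
json_data_string = """
[
  {"obj_id": 123, "location": {"x": 123, "y": 323}},
  {"obj_id": 13,  "location": {"x": 23,  "y": 333}},
  {"obj_id": 123, "location": {"x": 122, "y": 133}}
]
"""
ds = json.loads(json_data_string)

# Later entries overwrite earlier ones, so the LAST occurrence survives.
keep_last = list({each['obj_id']: each for each in ds}.values())

# list.index() returns the first match, so the FIRST occurrence survives.
all_ids = [each['obj_id'] for each in ds]
keep_first = [ds[all_ids.index(oid)] for oid in set(all_ids)]
```

Note that list.index() rescans the list for every id, so the second form is quadratic in the worst case, and iterating over a set does not guarantee that the results come back in input order.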

5 Comments

Assuming ds is an array of dictionaries (as that's the option that makes most sense), this works well, but the first example keeps the last occurrence instead of the first one.
+1: elegant (the first example). Though .values()/set() may return objects in any order. Assuming the input is a JSON array, that might matter. Here's an order-preserving algorithm
micro-nitpick: obj and json_array might be better names than each and ds
What's the difference between json.loads(...) and json.load(...)?
@Sevenearths json.loads(...) parses a string (loads: load string); json.load(...) reads from a file object (or any file-like object that supports read())
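
As a small illustration of that last comment, the sketch below parses the same JSON text both ways. The sample text is made up for the example, and io.StringIO stands in for an open file.

```python
import io
import json

# Hypothetical sample text, not the asker's real data.
text = '[{"obj_id": 123}, {"obj_id": 13}]'

# json.loads parses JSON text that is already in memory.
from_string = json.loads(text)

# json.load reads from any object that has a .read() method;
# io.StringIO plays the role of an open file here.
from_file = json.load(io.StringIO(text))
```

Both calls produce the same list of dictionaries; the only difference is where the text comes from.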
5

Here's an implementation that preserves order of input json objects and keeps the first occurrence of objects with the same id:

import json
import sys
from collections import OrderedDict

L = json.load(sys.stdin, object_pairs_hook=OrderedDict)
seen = OrderedDict()
for d in L:
    oid = d["obj_id"]
    if oid not in seen:
        seen[oid] = d

# list() is needed on Python 3, where values() returns a view that json cannot serialize
json.dump(list(seen.values()), sys.stdout, indent=2)

Input

[
  {
    "obj_id": 123, 
    "location": {
      "x": 123, 
      "y": 323
    }
  }, 
  {
    "obj_id": 13, 
    "location": {
      "x": 23, 
      "y": 333
    }
  }, 
  {
    "obj_id": 123, 
    "location": {
      "x": 122, 
      "y": 133
    }
  }
]

Output

[
  {
    "obj_id": 123, 
    "location": {
      "x": 123, 
      "y": 323
    }
  }, 
  {
    "obj_id": 13, 
    "location": {
      "x": 23, 
      "y": 333
    }
  }
]
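
On Python 3.7 and later, plain dicts also preserve insertion order, so the same idea could be sketched as a small reusable function without OrderedDict. The function name dedupe_by_key and its key parameter are hypothetical, chosen only for this illustration.

```python
import json

def dedupe_by_key(objects, key="obj_id"):
    """Return the first object seen for each key value, in input order."""
    seen = {}
    for obj in objects:
        # setdefault stores obj only if this key value is not present yet.
        seen.setdefault(obj[key], obj)
    return list(seen.values())

data = json.loads("""
[
  {"obj_id": 123, "location": {"x": 123, "y": 323}},
  {"obj_id": 13,  "location": {"x": 23,  "y": 333}},
  {"obj_id": 123, "location": {"x": 122, "y": 133}}
]
""")
result = dedupe_by_key(data)
print(json.dumps(result, indent=2))
```

This makes a single pass over the input, so it stays linear in the number of objects.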


-3

(if you had valid json)

from simplejson import loads, dumps
dumps(loads(my_json))

2 Comments

How can you know it will do anything, if you don't know what the correct input looks like?
The question title is "Remove duplicates from json data". I caveat'd "if valid json" and provided an answer. This is what everyone else in the question has also done.
-4

This is not valid JSON. On a valid JSON array, you can use jQuery's $.each and look at the obj_id to find and remove duplicates.

Something like this:

$.each(myArrayOfObjects, function(i, v)
{
      // check for duplicate and add non-repeatings to a new array
});

1 Comment

You missed the python tag. Not everything in the world is jQuery.
