Load part of a json in python

Question

I have a JSON file with about 1000 data entries. For example:

{"1":"Action","2":"Adventure",....."1000":"Mystery"}

The above is just an example.

I am loading it with json.load from the json module. How do I load only the first 10 data entries from the JSON?

{"1":"Action","2":"Adventure",....."10":"Thriller"}

JSON is an object representation that resembles a dictionary. There isn't an order. – Malik Brahimi, Commented Sep 18, 2015 at 21:58
You could write a function that gets you 10 keys from your dictionary at a time, and then look up the corresponding values using the .get() method. – letsc, Commented Sep 18, 2015 at 22:03
@MalikBrahimi: well, we don't know whether he wants the first 10 in the file, or the 10 smallest keys... – Steve Jessop, Commented Sep 18, 2015 at 22:10
@MalikBrahimi: I think I do understand. JSON is a text file format, so there really are a first 10 in the file. Whether it's wise to care what order they appear in is a separate matter from whether the questioner actually does care. But for example, json.dump has a sort_keys parameter because sometimes people do care about the order of keys in the file. It's just highly unusual for this caring to extend as far as loading the file; usually it's only for human-readability. – Steve Jessop, Commented Sep 18, 2015 at 22:13
I found it absolutely clear that the OP just wants to iterate through the file because of its size, instead of loading the JSON completely into memory and then decoding it. In this context the order he expects is clear: the one from the JSON file, plain and simple. – Alfe, Commented Sep 18, 2015 at 22:29

Martijn Pieters · Accepted Answer · 2015-09-18 22:22:02Z

8

JSON objects, like Python dictionaries, have no order. You also cannot control how much of an object is loaded, at least not with the standard library json module.

After loading, you could take the ten key-value pairs with the lowest keys, compared as integers:

import heapq
import json

data = json.loads(json_string)  # json_string holds the JSON text
# heapq.nsmallest(n, iterable, key=...): the count comes first
limited = {k: data[k] for k in heapq.nsmallest(10, data, key=int)}

heapq.nsmallest() efficiently picks out the 10 smallest keys, regardless of the size of data.
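The key=int argument matters because JSON keys are always strings, and compared as strings "10" sorts before "2". A small sketch with made-up sample data (the genre names are placeholders) shows the difference:

```python
import heapq
import json

# Made-up sample: 12 entries with string keys "1" to "12"
json_string = json.dumps({str(i): f"genre{i}" for i in range(1, 13)})
data = json.loads(json_string)

# Without a key function the keys are compared as strings
by_string = heapq.nsmallest(10, data)
# With key=int they are compared as numbers
by_number = heapq.nsmallest(10, data, key=int)
limited = {k: data[k] for k in by_number}

print(by_string)  # ['1', '10', '11', '12', '2', '3', '4', '5', '6', '7']
print(by_number)  # ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
```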

Of course, if the keys are always consecutive and always start at 1, you may as well use a range() here:

data = json.loads(json_string)
limited = {str(k): data[str(k)] for k in range(1, 11)}

If you want to keep the first ten pairs in the order they are defined in the file, you could use the object_pairs_hook argument to json.load() and json.loads():

import json

class FirstTenDict(dict):
    # object_pairs_hook receives a list of (key, value) pairs in file order
    def __init__(self, pairs):
        super(FirstTenDict, self).__init__(pairs[:10])

data = json.loads(json_string, object_pairs_hook=FirstTenDict)

Demo of the latter approach:

>>> import json
>>> class FirstTenDict(dict):
...     def __init__(self, pairs):
...         super(FirstTenDict, self).__init__(pairs[:10])
... 
>>> json_data = '''\
... {"foo42": "bar", "foo31": "baz", "foo10": "spam", "foo44": "ham", "foo1": "eggs",
...  "foo24": "vikings", "foo21": "monty", "foo88": "python", "foo11": "eric", "foo65": "idle",
...  "foo13": "will", "foo31": "be", "foo76": "ignored"}
... '''
>>> json.loads(json_data)
{'foo1': 'eggs', 'foo88': 'python', 'foo44': 'ham', 'foo10': 'spam', 'foo76': 'ignored', 'foo42': 'bar', 'foo24': 'vikings', 'foo11': 'eric', 'foo31': 'be', 'foo13': 'will', 'foo21': 'monty', 'foo65': 'idle'}
>>> json.loads(json_data, object_pairs_hook=FirstTenDict)
{'foo1': 'eggs', 'foo88': 'python', 'foo44': 'ham', 'foo10': 'spam', 'foo24': 'vikings', 'foo11': 'eric', 'foo21': 'monty', 'foo42': 'bar', 'foo31': 'baz', 'foo65': 'idle'}
>>> import pprint
>>> pprint.pprint(_)
{'foo1': 'eggs',
 'foo10': 'spam',
 'foo11': 'eric',
 'foo21': 'monty',
 'foo24': 'vikings',
 'foo31': 'baz',
 'foo42': 'bar',
 'foo44': 'ham',
 'foo65': 'idle',
 'foo88': 'python'}
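One caveat worth checking before relying on FirstTenDict: object_pairs_hook is called for every JSON object that gets decoded, nested ones included, not only the top-level one. A compact sketch, using a hypothetical FirstTwoDict that keeps two pairs instead of ten so the effect is easy to see:

```python
import json

class FirstTwoDict(dict):
    # Hypothetical two-pair variant of FirstTenDict, for a compact demo
    def __init__(self, pairs):
        super().__init__(pairs[:2])

text = '{"a": 1, "b": {"x": 1, "y": 2, "z": 3}, "c": 3}'
result = json.loads(text, object_pairs_hook=FirstTwoDict)
print(result)  # {'a': 1, 'b': {'x': 1, 'y': 2}}
```

The nested object loses "z" just as the outer object loses "c", so this approach is only safe for flat data like the example in the question.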

edited Sep 18, 2015 at 22:22

answered Sep 18, 2015 at 22:02

Martijn Pieters



9 Comments

Malik Brahimi Over a year ago

I think the phrasing in the OP could be fixed. Not the first ten elements but the elements with the keys '1', '2', '3', '4', ... , '10'

Martijn Pieters Over a year ago

@MalikBrahimi: there is no '0' key. I see no indication whatsoever that they want to limit themselves to the keys from 1 through to 10. They want the first 10 keys, where we are all making the assumption they mean the first 10 in numeric order.

Malik Brahimi Over a year ago

@MartijnPieters My bad. Again you know what I meant.

cann0nextr3me Over a year ago

@MalikBrahimi I want the first 10 keys, regardless of the numbering. {"1000":"Action","999":"Adventure",....."991":"Thriller"}

Malik Brahimi Over a year ago

Again, unless you parse it yourself, loading as JSON will not preserve an order.


Steve Jessop · Accepted Answer · 2015-09-18 22:37:19Z

You can parse JSON iteratively (that is to say, not "all at once") using ijson. Assuming your input really is as simple as your example:

import heapq
import itertools

import ijson

def iter_items(parser):
    # In a flat top-level object, the prefix of a value event is its key
    for prefix, event, value in parser:
        if event == 'string':
            yield prefix, value

with open('filename.json', 'rb') as infile:
    items = iter_items(ijson.parse(infile))
    # choose one of the following (items is a generator, usable only once)
    # first 10 items from the file regardless of keys
    print(dict(itertools.islice(items, 10)))
    # least 10 keys when considered as integers
    print(dict(heapq.nsmallest(10, items, key=lambda p: int(p[0]))))

Obviously the second of these still has to read the whole file; it just doesn't have to keep the whole file in memory at once. Avoiding that is premature optimization for only 1000 small key-value pairs, but whatever. I found the question interesting enough to use a library I've never considered before, because maybe sometimes JSON files are huge, and because of the close analogy with SAX parsers (which are event-based streaming parsers for XML).

By the way, if order is important, then the producer of this JSON probably should have used an array. But perhaps as the consumer you can't do anything about that.
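If you do control the producer, here is a minimal sketch of that suggestion, assuming the entries are written as a JSON array of [key, value] pairs (the data is made up). The order is then part of the data, so "the first 10" is well defined:

```python
import json

# Producer side: store the entries as an ordered list of [key, value] pairs
entries = [(str(i), f"genre{i}") for i in range(1, 13)]
text = json.dumps(entries)

# Consumer side: JSON arrays keep their order, so slicing is well defined
pairs = json.loads(text)      # a list of two-element lists
first_ten = dict(pairs[:10])  # dict() accepts any iterable of pairs
print(first_ten)
```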

DevLounge · Accepted Answer · 2015-09-18 22:07:37Z

1

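import json

file = 'data.json'
with open(file) as f:
    content = json.load(f)  # json.load takes a file object, not a path

# JSON keys are strings, so convert them before comparing
what_you_want = {int(k): v for k, v in content.items() if int(k) in range(1, 11)}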

I don't think there is any other way. You must load the entire thing, and only then can you extract the keys you want.

edited Sep 18, 2015 at 22:07

answered Sep 18, 2015 at 22:01

DevLounge


3 Comments

Navith Over a year ago

json.load expects a file object -- not a path to one. Call open on file before passing it to json.load.

Martijn Pieters Over a year ago

Why load all items when you know the keys up front? And JSON keys are always strings, so this code doesn't actually work for JSON data.

DevLounge Over a year ago

Indeed, your dict comprehension is more efficient for sure. Didn't think of it ;-)

Leb · Accepted Answer · 2015-09-18 22:02:45Z

0

In short, you can't.

While each entry is part of the JSON, only the file as a whole is a valid JSON document.

For example:

"1":"Action" is a valid member of a JSON object, but you cannot load it on its own.

To load it as JSON, you need the full object syntax: {"1":"Action"}

What you'll need to do is still load the whole file, then take the first 10 entries.
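The difference between a fragment and a complete document is easy to check with json.loads. A short sketch (json.JSONDecodeError is available from Python 3.5 onwards):

```python
import json

fragment = '"1":"Action"'
try:
    json.loads(fragment)
    fragment_loaded = True
except json.JSONDecodeError as exc:
    # The decoder reads the string "1", then stops at ':' ("Extra data")
    print("not a JSON document:", exc)
    fragment_loaded = False

# Wrapped in braces, the same text is a complete JSON object
wrapped = json.loads("{" + fragment + "}")
print(wrapped)  # {'1': 'Action'}
```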

answered Sep 18, 2015 at 22:02

Leb


cg909 · Accepted Answer · 2015-09-18 22:39:23Z

You have two options:

If you use Python 2.7 or Python >= 3.1, you can use

import json
from collections import OrderedDict
decoder = json.JSONDecoder(object_pairs_hook=OrderedDict)
data = decoder.decode(datastring)

This will decode the whole file, but keep all key-value pairs in the same order as they were in the file.

Then you can slice the first n items with something like

from itertools import islice
result = OrderedDict(islice(data.items(), n))

This isn't efficient, since the whole file is still decoded, but you will get the first n entries in the order they were written in the JSON.
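On Python 3.7 and later, plain dicts are guaranteed to keep insertion order, and json.loads inserts the pairs in the order they appear in the text, so OrderedDict is no longer needed for this. A sketch using keys that count down, like cann0nextr3me's example in the comments (the values are placeholders):

```python
import itertools
import json

# Made-up data: keys "1000" down to "1", written in that order
text = json.dumps({str(k): f"genre{k}" for k in range(1000, 0, -1)})

data = json.loads(text)  # still decodes everything; dict keeps file order (3.7+)
first_ten = dict(itertools.islice(data.items(), 10))
print(list(first_ten))  # ['1000', '999', '998', '997', '996', '995', '994', '993', '992', '991']
```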

The second, more efficient but harder, option is to use an iterative JSON parser like ijson, as @steve-jessop mentioned.

If and only if your JSON files are always flat (they don't contain any subobjects or lists), as in the example in your question, the following code will put the first 10 elements into result. More complex files need more complex parser code.

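import ijson

result = {}
with open('filename.json', 'rb') as f:
    for prefix, event, value in ijson.parse(f):
        if event == 'map_key':
            # stop before reading the 11th key
            if len(result) >= 10:
                break
        elif prefix:
            # in a flat object, a non-empty prefix is the key of this value
            result[prefix] = value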
