9

I have a JSON file with about 1000 data entries. For example:

{"1":"Action","2":"Adventure",....."1000":"Mystery"}

The above is just an example.

I am loading the file with json.load from the json module. How do I load only the first 10 data entries from the JSON?

{"1":"Action","2":"Adventure",....."10":"Thriller"}
8
  • 8
    JSON is an object representation that resembles a dictionary. There isn't an order. Commented Sep 18, 2015 at 21:58
  • You could write a function that get you 10 keys from your dictionary at a time and you can just look up the corresponding values using the .get() method Commented Sep 18, 2015 at 22:03
  • 1
    @MalikBrahimi: well, we don't know whether he wants the first 10 in the file, or the 10 smallest keys... Commented Sep 18, 2015 at 22:10
  • 1
    @MalikBrahimi: I think I do understand. JSON is a text file format, there really are a first 10 in the file. Whether it's wise to care what order they appear is a separate matter from whether the questioner actually does care what order they appear. But for example, the fact that json.dump has a sort_keys parameter is because sometimes people do care about the order of keys in the file. It's just highly unusual for this caring to extend so far as loading the file, usually it's only for human-readability. Commented Sep 18, 2015 at 22:13
  • 1
    I found it absolutely clear that OP just wants to iterate through the file because of its size instead of loading the JSON completely into memory and then decode it. In this context the order he expects is clear: the one from the JSON file—plain and simple. Commented Sep 18, 2015 at 22:29

5 Answers

8

JSON objects are unordered by specification, and Python dictionaries did not preserve insertion order before Python 3.7. You also cannot control how much of an object is loaded, at least not with the standard library json module.

After loading, you could take the ten key-value pairs with the lowest key value:

import heapq
import json

data = json.loads(json_string)
limited = {k: data[k] for k in heapq.nsmallest(10, data, key=int)}

The heapq.nsmallest() will efficiently pick out the 10 smallest keys regardless of the size of data.

Of course, if the keys are always consecutive and always start at 1, you may as well use a range() here:

data = json.loads(json_string)
limited = {str(k): data[str(k)] for k in range(1, 11)}
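
As a self-contained illustration of the heapq approach, here is a minimal sketch. The sample data and the "genre-N" values are made up for the example; the point is that heapq.nsmallest() takes the count first and that key=int makes the string keys compare as numbers.

```python
import heapq
import json

# Hypothetical sample data: string keys, deliberately written in reverse order.
json_string = json.dumps({str(k): "genre-%d" % k for k in range(1000, 0, -1)})
data = json.loads(json_string)

# Signature is nsmallest(n, iterable, key=None). With key=int the keys are
# compared numerically, so "2" comes before "10".
smallest_keys = heapq.nsmallest(10, data, key=int)
limited = {k: data[k] for k in smallest_keys}

print(smallest_keys)
```

Without key=int the keys would compare as strings, where "10" and "100" sort before "2".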

If you want to capture the objects in file definition order you could use the object_pairs_hook argument to json.load() and json.loads():

class FirstTenDict(dict):
    def __init__(self, pairs):
        super(FirstTenDict, self).__init__(pairs[:10])

data = json.loads(json_string, object_pairs_hook=FirstTenDict)

Demo of the latter approach:

>>> import json
>>> class FirstTenDict(dict):
...     def __init__(self, pairs):
...         super(FirstTenDict, self).__init__(pairs[:10])
... 
>>> json_data = '''\
... {"foo42": "bar", "foo31": "baz", "foo10": "spam", "foo44": "ham", "foo1": "eggs",
...  "foo24": "vikings", "foo21": "monty", "foo88": "python", "foo11": "eric", "foo65": "idle",
...  "foo13": "will", "foo31": "be", "foo76": "ignored"}
... '''
>>> json.loads(json_data)
{'foo1': 'eggs', 'foo88': 'python', 'foo44': 'ham', 'foo10': 'spam', 'foo76': 'ignored', 'foo42': 'bar', 'foo24': 'vikings', 'foo11': 'eric', 'foo31': 'be', 'foo13': 'will', 'foo21': 'monty', 'foo65': 'idle'}
>>> json.loads(json_data, object_pairs_hook=FirstTenDict)
{'foo1': 'eggs', 'foo88': 'python', 'foo44': 'ham', 'foo10': 'spam', 'foo24': 'vikings', 'foo11': 'eric', 'foo21': 'monty', 'foo42': 'bar', 'foo31': 'baz', 'foo65': 'idle'}
>>> import pprint
>>> pprint.pprint(_)
{'foo1': 'eggs',
 'foo10': 'spam',
 'foo11': 'eric',
 'foo21': 'monty',
 'foo24': 'vikings',
 'foo31': 'baz',
 'foo42': 'bar',
 'foo44': 'ham',
 'foo65': 'idle',
 'foo88': 'python'}
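
On Python 3.7 and later, where plain dicts keep insertion order, the same file-order result can probably be had without a custom class. A minimal sketch, assuming that Python version and a shortened, made-up sample string:

```python
import itertools
import json

# Shortened sample input, assumed for illustration only.
json_data = '{"foo42": "bar", "foo31": "baz", "foo10": "spam", "foo44": "ham"}'

# On Python 3.7+ the dict returned by json.loads keeps the order in which
# the keys appeared in the text.
data = json.loads(json_data)

# Take the first two pairs without building a full list of items.
first_two = dict(itertools.islice(data.items(), 2))
print(first_two)
```

This still decodes the whole document first; it only limits what is kept afterwards.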

9 Comments

I think the phrasing in the OP could be fixed. Not the first ten elements but the elements with the keys '1', '2', '3', '4', ... , '10'
@MalikBrahimi: there is no '0' key. I see no indication whatsoever that they want to limit themselves to the keys from 1 through to 10. They want the first 10 keys, where we are all making the assumption they mean the first 10 in numeric order.
@MartijnPieters My bad. Again you know what I meant.
@MalikBrahimi I want the first 10 keys, regardless of the numbering. {"1000":"Action","999":"Adventure",....."991":"Thriller"}
Again, unless you parse it yourself, loading as JSON will not preserve an order.
8

You can parse JSON iteratively (that is to say, not "all at once") using ijson. Assuming your input really is as simple as your example:

import heapq
import itertools

import ijson

def iter_items(parser):
    # in a flat object of strings, each 'string' event has the key as its prefix
    for prefix, event, value in parser:
        if event == 'string':
            yield prefix, value

with open('filename.json', 'rb') as infile:
    items = iter_items(ijson.parse(infile))
    # choose one of the following
    # first 10 items from the file regardless of keys
    print(dict(itertools.islice(items, 10)))
    # least 10 keys when considered as integers
    print(dict(heapq.nsmallest(10, items, key=lambda p: int(p[0]))))

Obviously the second of these still has to read the whole file; it just doesn't have to keep the whole file in memory at once. Avoiding that is premature optimization for only 1000 small key-value pairs, but whatever. I found the question interesting enough to use a library I've never considered before, because maybe sometimes JSON files are huge, and because of the close analogy with SAX parsers (which are event-based streaming parsers for XML).

By the way, if order was important then the producer of this JSON probably should put an array in the JSON. But perhaps as consumer you can't do anything about that.
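
To illustrate the array suggestion, here is a minimal sketch of what such a producer-side layout might look like. The array-of-pairs format is an assumption for the example, not something the question's file uses.

```python
import json

# Hypothetical layout: an array of [key, value] pairs instead of an object.
json_string = '[["1000", "Action"], ["999", "Adventure"], ["998", "Thriller"]]'

# A JSON array decodes to a list, so order is part of the data itself.
pairs = json.loads(json_string)
first_two = pairs[:2]
print(first_two)
```

With this layout, "the first N entries" has a well-defined meaning for every JSON parser.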


1
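import json

path = 'data.json'
with open(path, 'rb') as f:
    content = json.load(f)  # pass the open file object, not the path

# JSON keys are strings, so convert them to int before comparing
what_you_want = {int(k): v for k, v in content.items() if int(k) in range(1, 11)}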

I don't think there is any other way. You must load the entire thing, and only then can you extract the keys you want.

3 Comments

json.load expects a file object -- not a path to one. Call open on file before passing it to json.load.
Why load all items when you know the keys up front? And JSON keys are always strings, so this code doesn't actually work for JSON data.
Indeed, your dict comprehension is more efficient for sure. Didn't think of it ;-)
0

In short, you can't.

Each entry is part of the JSON, but only the file as a whole is a valid JSON document.

For example:

"1":"Action" is a properly formed JSON entry, but you cannot load it on its own.

To load it as JSON, you need the full syntax: {"1":"Action"}

What you'll need to do is still load the whole file, then assign the first 10 entries to a variable.
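
The difference between a bare entry and a complete document can be checked with a small sketch like the following, which uses only the standard json module:

```python
import json

fragment = '"1":"Action"'      # a single entry without the enclosing braces
document = '{"1":"Action"}'    # the same entry as a complete JSON object

try:
    json.loads(fragment)
    fragment_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    fragment_ok = False

print(fragment_ok)
print(json.loads(document))
```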


0

You have two options:

If you use Python >= 3.1 you can use

import json
from collections import OrderedDict
decoder = json.JSONDecoder(object_pairs_hook=OrderedDict)
data = decoder.decode(datastring)

This will decode the whole file, but keep all key-value pairs in the same order as they were in the file.

Then you can slice the first n items with something like

result = OrderedDict((k,v) for (k,v),i in zip(data.items(), range(n)))

This isn't efficient, but you will get the first 10 entries, as they were written in the JSON.

The second option, more efficient but harder, is to use an iterative JSON parser like ijson, as @steve-jessop mentioned.

If and only if your JSON files are always flat (they don't contain any subobjects or lists), like the example in the question, the following code will put the first 10 elements into result. More complex files need more complex parser code.

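import ijson

result = {}
with open('filename.json', 'rb') as infile:  # placeholder file name
    for prefix, event, value in ijson.parse(infile):
        # stop at the next key once ten pairs have been collected
        if event == 'map_key' and len(result) >= 10:
            break
        # in a flat object only value events have a non-empty prefix (the key)
        if prefix:
            result[prefix] = value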
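
If installing ijson is not an option and the text already fits in a string, a similar early stop can be sketched with the standard library's json.JSONDecoder.raw_decode(). This is only a rough sketch under the same flat-object assumption; first_pairs and _skip_ws are hypothetical helper names, and malformed or truncated input is not handled gracefully.

```python
import json

def _skip_ws(text, pos):
    """Advance pos past any JSON whitespace."""
    while pos < len(text) and text[pos] in ' \t\r\n':
        pos += 1
    return pos

def first_pairs(text, n):
    """Return up to n (key, value) pairs of a top-level object, in file order."""
    decoder = json.JSONDecoder()
    pos = _skip_ws(text, 0)
    if text[pos] != '{':
        raise ValueError('expected a JSON object')
    pos = _skip_ws(text, pos + 1)
    pairs = []
    while len(pairs) < n and text[pos] != '}':
        # raw_decode returns the decoded value and the index where it ended.
        key, pos = decoder.raw_decode(text, pos)
        pos = _skip_ws(text, pos)
        if text[pos] != ':':
            raise ValueError('expected ":" after key')
        value, pos = decoder.raw_decode(text, _skip_ws(text, pos + 1))
        pairs.append((key, value))
        pos = _skip_ws(text, pos)
        if text[pos] == ',':
            pos = _skip_ws(text, pos + 1)
    return pairs

sample = '{"1000": "Action", "999": "Adventure", "998": "Thriller"}'
print(first_pairs(sample, 2))
```

Unlike the ijson version, this needs the whole text in memory; it only avoids decoding the entries after the first n.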

