
I need to get the main keys (devices) from a JSON formatted text with around 70,000 (sub-)keys/objects. It looks like this:

{
   "1":{...........},
   "4":{...........},
   "9":{...........}
}

And I need to get "1", "4" and "9". But the way I do it now, it takes around 2 minutes to parse the text with

data = json.loads(response.text)  # this takes so long!
devices = data.keys()

because I'm running this on a Raspberry Pi!

Is there a better way?

EDIT: I receive the data from a JSON API running on a server with:

http://.../ZWaveAPI/Run/devices #this is an array

EDIT3:

final working code (runs in 2-5 seconds! :)

import ijson.backends.python as ijson
import urllib

parser = ijson.parse(urllib.urlopen("http://.../ZWaveAPI/Run/devices"))
devices = []
for prefix, event, value in parser:
    # a top-level key arrives as a "map_key" event with an empty prefix
    if event == "map_key" and prefix == "":
        devices.append(value)
  • use a database and only query what you need when you need it? Commented Apr 4, 2013 at 20:22
  • I can't change the data I get... I receive a text with many keys and I need to get the main keys... Or is there a possibility in the way I get the data? (see EDIT) Commented Apr 4, 2013 at 20:48

2 Answers


You can do it with a stream-oriented, iterative JSON parser, but you'll need to install it separately. Try out ijson; it emits an event for each JSON structure it encounters:

import ijson

parser = ijson.parse(response)  # response: a file-like object, e.g. from urlopen()
for prefix, event, value in parser:
    if event == 'map_key':
        print value
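If installing a third-party parser isn't an option, the same idea — collecting only the top-level keys while discarding each decoded value immediately — can be sketched with just the standard library. (`top_level_keys` and its scanning loop are illustrative, not part of any library:)

```python
import json

def top_level_keys(text):
    # Sketch: walk a JSON object and collect only its top-level keys.
    # Each value is decoded with raw_decode() and dropped at once, so
    # the full 70,000-entry structure is never held in memory together.
    decoder = json.JSONDecoder()
    keys = []
    i = text.index('{') + 1
    while True:
        while text[i] in ' \t\r\n,':          # skip whitespace/commas
            i += 1
        if text[i] == '}':                    # end of the object
            return keys
        key, i = decoder.raw_decode(text, i)  # decode the key string
        keys.append(key)
        while text[i] in ' \t\r\n:':          # skip the colon
            i += 1
        _, i = decoder.raw_decode(text, i)    # decode and discard the value
```

This avoids keeping the whole decoded tree, though each individual value is still built transiently; a real streaming parser like ijson avoids even that.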

10 Comments

But this will just fire events while it parses; it isn't actually faster, is it?
It will give you access to intermediate results faster since you'll get them before the whole thing is loaded. It will also use a lot less memory since you won't be building up a giant data structure filled with things you aren't going to use. So it should be at least somewhat faster.
@TeNNoX: you have to scan over the intermediary results anyway to get to the keys you are interested in. But with a streaming parser you don't need to create python objects for the whole data set, which speeds things up.
Okay, then I will try that. But the events aren't really necessary, because I need to wait for everything to finish anyway, right?
@TeNNoX: It needs a file-like object; pass in the result of urlopen() without calling .read() yourself.

Have you tried to experiment with getting just a single device? With most RESTful web services, if you see a URL like this:

"http://.../ZWaveAPI/Run/devices"

Chances are, you can GET an individual device by:

"http://.../ZWaveAPI/Run/devices/1"

If it works, it should greatly reduce the amount of data you have to download and parse.
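A small helper could build those per-device URLs before fetching each one. (This is a sketch under the assumption above; `device_url` is a hypothetical name, and the real host is elided in the question:)

```python
def device_url(base, device_id):
    # Hypothetical helper: assumes the API exposes one device at
    # base + "/" + id, e.g. .../ZWaveAPI/Run/devices/1
    return "%s/%s" % (base.rstrip("/"), device_id)

# Each URL could then be fetched with urllib and parsed individually,
# far less data per request than the full 70,000-key document.
```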

1 Comment

Yeah, but I need a valid list of all devices first; I can't try out all the numbers... By the way, I reduced the time to 3 seconds with ijson, as seen in EDIT3.
