
I'd like to write an API that reads from a CSV on disk (with x, y coordinates) and outputs them in JSON format to be rendered by a web front end. The issue is that there are lots of data points (order of 30k) and so going from numpy arrays of x and y into JSON is really slow.

This is my current function to get the data in JSON format. Is there any way to speed this up? It seems very redundant to have such a large data structure for each 2d point.

def to_json(xdata, ydata):
    data = []
    for x, y in zip(xdata, ydata):
        data.append({"x": x, "y": y})
    return data
  • How does the web front end render? Dicts are kinda bulky, but you could pass two lists: json.dumps([xdata.tolist(), ydata.tolist()]). Whatever plotting library you use likely wants two lists anyway. Commented Oct 4, 2016 at 1:18
  • I'm using nvd3. I can't find any info about alternate data formats it supports but if it did that would make things a lot easier. stackoverflow.com/questions/23643487/… Commented Oct 4, 2016 at 1:24
  • I guess you could convert the list to a dict on the client javascript side. You'd save some data payload but in a world of streaming media, maybe it isn't worthwhile. (if nvd3 is okay with the dict, maybe you could consider them the experts!) Commented Oct 4, 2016 at 1:30
  • Have you done any profiling of your code with cProfile to determine what's actually taking the most time? You can try speeding it up in multiple ways, but any improvement that isn't addressing the bottleneck isn't going to help much. My guess would be that the speed limit here is the disk, and your best bet to improve performance would be to read and write in parallel. But again, profile before you do anything to determine what the bottleneck is, or you are likely wasting your time. Commented Oct 4, 2016 at 7:52
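For reference, the two-list approach suggested in the first comment can be sketched like this (assuming xdata and ydata are numpy arrays; to_json_fast is a hypothetical name):

```python
import json
import numpy as np

def to_json_fast(xdata, ydata):
    # .tolist() converts the whole array to native Python types in one
    # C-level pass, which json.dumps can then serialize directly
    return json.dumps([xdata.tolist(), ydata.tolist()])

xdata = np.array([1.0, 2.0, 3.0])
ydata = np.array([4.0, 5.0, 6.0])
print(to_json_fast(xdata, ydata))  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

This avoids building 30k small dicts in a Python loop, at the cost of the front end having to zip the two lists back together.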

2 Answers


You could use a list comprehension:

def to_json(xdata, ydata):
    return [{"x": x, "y": y} for x, y in zip(xdata, ydata)]

This eliminates the unnecessary intermediate variable and is cleaner.

You can also use a generator expression:

def to_json(xdata, ydata):
    return ({"x": x, "y": y} for x, y in zip(xdata, ydata))

Generators are created very quickly and use little to no memory, since values are produced lazily. That advantage only lasts until you consume the generator, e.g. by converting it to a list.

Since the objects are just x-y coordinates, I'd use a generator of (x, y) tuples, which are also created faster, like so:

def to_json(xdata, ydata):
    return ((x, y) for x, y in zip(xdata, ydata))

Edit: you could replace the tuples with lists ([]); they're valid JSON arrays.
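If the front end does want [x, y] pairs, a vectorized sketch (assuming numpy arrays; to_json_pairs is a hypothetical name) avoids the Python-level loop entirely:

```python
import json
import numpy as np

def to_json_pairs(xdata, ydata):
    # column_stack builds an (N, 2) array; tolist() then emits
    # native Python [x, y] pairs in one C-level pass
    return json.dumps(np.column_stack([xdata, ydata]).tolist())

xdata = np.array([1.0, 2.0])
ydata = np.array([3.0, 4.0])
print(to_json_pairs(xdata, ydata))  # [[1.0, 3.0], [2.0, 4.0]]
```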


4 Comments

That last one won't work in this case; the data needs to be in the specific format I used. As for the generators, do they actually provide a speedup? I'm still iterating over the entire sequence.
Generators are basically computations waiting to be run; they are very lightweight and great for handling large amounts of data. I'd advise you to read up on them, though I'm no expert.
The first function will do fine. Unless you're using a low-level server where you can construct the JSON response yourself, the result will probably be sent as a string anyway.
You could also replace the tuples with lists ([]); they're valid JSON arrays.

Your method seems reasonable enough. Here are a few changes I might make to it. The itertools module has lots of handy tools that can make your life easier. I used izip (Python 2 only), which you can read up on here

import json
from itertools import izip  # Python 2 only; in Python 3 the built-in zip is already lazy

def to_json(xdata, ydata):
    data = []
    for x, y in izip(xdata, ydata):  # izip avoids building an intermediate list
        data.append({"x": x, "y": y})
    return json.dumps(data)  # serialize the list to a JSON string
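On Python 3, itertools.izip no longer exists, but the built-in zip is already lazy, so an equivalent sketch (using plain lists for illustration) is:

```python
import json

def to_json(xdata, ydata):
    # zip is lazy in Python 3, so no itertools.izip is needed
    data = [{"x": x, "y": y} for x, y in zip(xdata, ydata)]
    return json.dumps(data)

print(to_json([1, 2], [3, 4]))  # [{"x": 1, "y": 3}, {"x": 2, "y": 4}]
```

Note that if xdata and ydata are numpy arrays, calling .tolist() on them first guarantees native Python types that json.dumps can serialize.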

