
I'd like to write an API that reads from a CSV on disk (with x, y coordinates) and outputs them in JSON format to be rendered by a web front end. The issue is that there are lots of data points (order of 30k) and so going from numpy arrays of x and y into JSON is really slow.

This is my current function to get the data in JSON format. Is there any way to speed this up? It seems very redundant to have such a large data structure for each 2d point.

def to_json(xdata, ydata):
    data = []
    for x, y in zip(xdata, ydata):
        data.append({"x": x, "y": y})
    return data
  • How does the web front end render? Dicts are kinda bulky, but you could pass two lists: json.dumps([xdata.tolist(), ydata.tolist()]). Whatever plotting library you use likely wants two lists anyway. Commented Oct 4, 2016 at 1:18
  • I'm using nvd3. I can't find any info about alternate data formats it supports but if it did that would make things a lot easier. stackoverflow.com/questions/23643487/… Commented Oct 4, 2016 at 1:24
  • I guess you could convert the list to a dict on the client javascript side. You'd save some data payload but in a world of streaming media, maybe it isn't worthwhile. (if nvd3 is okay with the dict, maybe you could consider them the experts!) Commented Oct 4, 2016 at 1:30
  • Have you done any profiling of your code with cProfile to determine what's actually taking the most time? You can try speeding it up in multiple ways, but any improvement that isn't addressing the bottleneck isn't going to help much. My guess would be that the speed limit here is the disk, and your best bet to improve performance would be to read and write in parallel. But again, profile before you do anything to determine what the bottleneck is, or you are likely wasting your time. Commented Oct 4, 2016 at 7:52
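For reference, the two-list approach suggested in the first comment can be sketched like this (assuming xdata and ydata are numpy arrays; to_json_fast is a hypothetical name):

```python
import json
import numpy as np

def to_json_fast(xdata, ydata):
    # .tolist() converts the whole array to native Python types in one
    # C-level pass, which json.dumps can then serialize directly
    return json.dumps([xdata.tolist(), ydata.tolist()])

xdata = np.array([1.0, 2.0, 3.0])
ydata = np.array([4.0, 5.0, 6.0])
print(to_json_fast(xdata, ydata))  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

This avoids building 30k small dicts in a Python loop, at the cost of the front end having to zip the two lists back together.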

2 Answers


You could use a list comprehension:

def to_json(xdata, ydata):
    return [{"x": x, "y": y} for x, y in zip(xdata, ydata)]

This eliminates the unnecessary intermediate variable and is cleaner.

You can also use a generator expression:

def to_json(xdata, ydata):
    return ({"x": x, "y": y} for x, y in zip(xdata, ydata))

Generators are created very quickly and use little to no memory, since values are produced lazily. That advantage only lasts until you consume the generator, e.g. by converting it to a list.

Since the objects are just x-y coordinates, I'd use a generator of (x, y) tuples, which are also created faster, like so:

def to_json(xdata, ydata):
    return ((x, y) for x, y in zip(xdata, ydata))

Edit: you could replace the tuples with lists ([]); they're valid JSON arrays.
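If the front end does want [x, y] pairs, a vectorized sketch (assuming numpy arrays; to_json_pairs is a hypothetical name) avoids the Python-level loop entirely:

```python
import json
import numpy as np

def to_json_pairs(xdata, ydata):
    # column_stack builds an (N, 2) array; tolist() then emits
    # native Python [x, y] pairs in one C-level pass
    return json.dumps(np.column_stack([xdata, ydata]).tolist())

xdata = np.array([1.0, 2.0])
ydata = np.array([3.0, 4.0])
print(to_json_pairs(xdata, ydata))  # [[1.0, 3.0], [2.0, 4.0]]
```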


4 Comments

That last one won't work in this case; the data needs to be in the specific format I used. As for the generators, do they actually provide a speedup? I'm still iterating over the entire sequence.
Generators are basically computations waiting to be run; they are very lightweight and great for handling large amounts of data. I'd advise you to read up on them, though I'm no expert.
The first function will do fine. Unless you're using a low-level server where you can construct the JSON response yourself, the result will probably be sent as a string anyway.
You could also replace the tuples with lists ([]); they're valid JSON arrays.

Your method seems reasonable enough. Here are a few changes I might make to it. The itertools module has lots of handy tools that can make your life easier. I used izip (Python 2 only), which you can read up on here

import json
from itertools import izip  # Python 2 only; in Python 3 the built-in zip is already lazy

def to_json(xdata, ydata):
    data = []
    for x, y in izip(xdata, ydata):  # izip avoids building an intermediate list
        data.append({"x": x, "y": y})
    return json.dumps(data)  # serialize the list to a JSON string
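On Python 3, itertools.izip no longer exists, but the built-in zip is already lazy, so an equivalent sketch (using plain lists for illustration) is:

```python
import json

def to_json(xdata, ydata):
    # zip is lazy in Python 3, so no itertools.izip is needed
    data = [{"x": x, "y": y} for x, y in zip(xdata, ydata)]
    return json.dumps(data)

print(to_json([1, 2], [3, 4]))  # [{"x": 1, "y": 3}, {"x": 2, "y": 4}]
```

Note that if xdata and ydata are numpy arrays, calling .tolist() on them first guarantees native Python types that json.dumps can serialize.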

