4

I have a numpy array that I would like to dump as JSON. The array looks like this:

array([['foo', 'bar', 'something', ...
        'more'],
       ['0.4', '0.7', '0.83', ...
        '0.3', '0.62', '0.51']])

and I would like to dump it to a JSON string as follows:

foo: 0.4
bar: 0.7
something: 0.51
...

I have tried:

import json
my_string = json.dumps(my_array)

but it complains with:

"not JSON serializable"

Any thoughts on how to dump this to a string with JSON?

Update:

Please note that I care about ordering; lines should be printed in the following order:

array[0,0] : array[0,1]
array[1,0] : array[1,1]
array[2,0] : array[2,1]
# etc ...
  • Your array indexing at the end is incorrect. For 2D arrays such as this the syntax is array[row][column] and, since you only have two rows, the maximum value for the first index would be 1. Commented Nov 19, 2012 at 22:52

5 Answers

7

What worked for me, since I have larger 1024x1002 arrays of float64, was conversion to base64.

import base64
import json
import numpy as np

def Base64Encode(ndarray):
    # store the dtype, the base64-encoded raw buffer, and the shape so the array can be rebuilt
    return json.dumps([str(ndarray.dtype), base64.b64encode(ndarray), ndarray.shape])

def Base64Decode(jsonDump):
    loaded = json.loads(jsonDump)
    dtype = np.dtype(loaded[0])
    arr = np.frombuffer(base64.decodestring(loaded[1]), dtype)
    if len(loaded) > 2:  # a shape was stored, so restore it
        return arr.reshape(loaded[2])
    return arr

# just to compare: round-trip through a plain Python list
def SimpleEncode(ndarray):
    return json.dumps(ndarray.tolist())

def SimpleDecode(jsonDump):
    return np.array(json.loads(jsonDump))

The IPython %timeit results point very clearly to base64:

arr = np.random.random_sample((1000, 1000))

print 'Simple Convert'
%timeit SimpleDecode(SimpleEncode(arr))
print 'Base64 Encoding'
%timeit Base64Decode(Base64Encode(arr))

result:

Simple Convert
1 loops, best of 3: 1.42 s per loop
Base64 Encoding
10 loops, best of 3: 171 ms per loop

2 Comments

worked for me, but had to decode to serialize: return json.dumps([str(ndarray.dtype),base64.b64encode(ndarray).decode('utf-8'),ndarray.shape]) and then convert to byte array during deserialize: arr = np.frombuffer(base64.decodestring(bytearray(loaded[1], 'utf-8')), dtype)
also - to be able to base64encode() an array it needs to be contiguous in memory, so if it is not, it needs to be converted: ndarr = np.ascontiguousarray(ndarray, dtype=ndarray.dtype)
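
Putting those two comments together, a rough Python 3-compatible sketch of the same idea (base64.decodestring is deprecated/removed there, so b64decode is used instead, and the shape is always stored):

import base64
import json
import numpy as np

def Base64Encode(ndarray):
    # base64 needs a contiguous buffer; decode the result to str so json can handle it
    ndarray = np.ascontiguousarray(ndarray)
    data = base64.b64encode(ndarray).decode('utf-8')
    return json.dumps([str(ndarray.dtype), data, ndarray.shape])

def Base64Decode(jsonDump):
    dtypeStr, data, shape = json.loads(jsonDump)
    arr = np.frombuffer(base64.b64decode(data), np.dtype(dtypeStr))
    return arr.reshape(shape)
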
1

Not sure about the JSON serializable part, but you could convert it to a dict first? That seems like a more natural format for JSON output, and would deal with any issues with the data type.

my_dict = dict(zip(my_array[0], my_array[1]))
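
For the "not JSON serializable" part: once the two rows are zipped into a plain dict of strings, json.dumps handles it directly. A minimal sketch, using a shortened version of the array from the question (labels in row 0, values in row 1):

import json
import numpy as np

my_array = np.array([['foo', 'bar', 'something', 'more'],
                     ['0.4', '0.7', '0.83', '0.3']])

# pair each label (row 0) with its value (row 1) and serialize the dict
my_dict = dict(zip(my_array[0], my_array[1]))
my_string = json.dumps(my_dict)
# e.g. '{"foo": "0.4", "bar": "0.7", ...}' -- note the values stay strings,
# and a plain dict does not guarantee any particular key order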

3 Comments

Would this preserve the order that I have in the array? (i.e. the line my_array[0,0]: my_array[0,1] should be printed before the line my_array[1,0]: my_array[1,1], and so on).
but dictionaries keep their key-values unsorted. How do you guarantee that the entries in my dictionary are printed in the right order when I pass this to JSON?
Probably not, so maybe this won't work. Python dicts are unordered by default. I think that Python 2.7 has an OrderedDict class, but whether that would work depends on whether your JSON library would respect the ordering. (Just noticed that you replied to my comment from earlier, which I deleted. I answered that before I read the comment thoroughly. Still kinda new here, sorry.)
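
Following up on the OrderedDict idea: json.dumps writes keys in the dict's own iteration order, so an OrderedDict (available since Python 2.7) does keep the pairs in insertion order. A short sketch, reusing my_array from the sketch above:

import json
from collections import OrderedDict

# build the pairs column by column, in the left-to-right order of the array
ordered = OrderedDict(zip(my_array[0], my_array[1]))
print json.dumps(ordered)  # '{"foo": "0.4", "bar": "0.7", "something": "0.83", "more": "0.3"}'
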
0

If all values are numeric, you can always do it manually if everything else fails:

my_array = [['0.4', '0.7', '0.83', '0.3', '0.62', '0.51'],
            ['foo', 'bar', 'something', 'more']]

pairs = zip(my_array[1], my_array[0])
json_values = ('"{}": {}'.format(label, value) for label, value in pairs)
my_string = '{' + ', '.join(json_values) + '}'

print my_string # '{"foo": 0.4, "bar": 0.7, "something": 0.83, "more": 0.3}'


0

If you are just trying to get a pretty string representation of your array, and using a string array type doesn't give you the representation you want, then a message serialization format is not the thing to use. Serialization formats are for saving/transmitting data. JSON is nice in that it is often human-readable too, but that is not its purpose, and forcing it into a different format would make it no longer JSON serialization. Even numpy's savetxt and loadtxt options are not going to work for the formatting you want (repeating the first row for each data row).

If it has to be in that format, you can make your own serialization with the following code:

def prettySerialize(inArray):
    ids = inArray[0]  # first row holds the labels
    strRep = ''

    for row in inArray[1:]:
        for i, item in enumerate(row):
            rowStr = ids[i] + ':' + item + '\n'
            strRep += rowStr

    return strRep

The problem with this is that it will be much slower and produce a much larger representation of the array (repeating the "id" row over and over). I would highly recommend going with a pure JSON (or msgpack) solution unless you are specifically formatting this for human reading...

This is a solution I am using for serializing with msgpack (that will also work with json)... Convert to a tuple that includes dtype and array shape:

import numpy

def arrayToTuple(arr):
    if arr is None:
        return None

    # the dtype string, shape, and raw byte buffer are enough to rebuild the array
    return (arr.dtype.str, arr.shape, arr.tostring())

def arrayFromTuple(tupl):
    if tupl is None:
        return None

    typeStr, shape, dataStr = tupl

    resultArray = numpy.fromstring(dataStr, dtype=typeStr).reshape(shape)

    return resultArray

So the dumps and loads commands would be:

strRep = json.dumps(arrayToTuple(arr))
arrayFromTuple(json.loads(strRep))

And this also works for msgpack.dumps and msgpack.loads (a faster, more compact binary representation).

A caveat that may be applicable to your array: if your numpy array has an object dtype, it will not serialize by standard methods as a full array. You would have to serialize each object individually, because it's the object id, not the data, that is stored in the array. Using a dtype such as dtype='|Sn', where n is the maximum string length, will make the array serializable.
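
A minimal sketch of that last point (the width 11 is just an illustrative choice):

import numpy as np

# an object-dtype array stores references to Python objects, so its raw buffer
# is not meaningful; convert to a fixed-width string dtype first
obj_arr = np.array([['foo', 'bar'], ['0.4', '0.7']], dtype=object)
fixed_arr = obj_arr.astype('|S11')  # now arrayToTuple() above can serialize it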


-1

I've only worked with numpy a little, but I think it saves data internally in a special format, so it would make sense that the json module doesn't know how to handle it.

Does converting it back to an array work?

json.dumps(numpy.asarray(my_array))

http://docs.scipy.org/doc/numpy/reference/generated/numpy.asarray.html

1 Comment

The output of asarray() is a numpy.ndarray; it is meant for converting lists/tuples to ndarrays.
