yaml.dump throws an error on numpy.array type attribute of an object in Python

Question

I'd like my object to print itself compactly (it never needs to be loaded back), so that the numpy.array is printed as a regular tuple (in this example). Instead, I get the error message TypeError: data type not understood.

Any idea what causes an error message and (once resolved) how to

class A:
    def __init__(self):
        from numpy import array
        self.a_array = array([1,2,3])

    def __repr__(self):
        from yaml import dump
        return dump(self, default_flow_style=False)

A()

Desired output is something like:

object:A
a_array: 
- 1, 2, 3

Any ideas?

UPDATE: This may work (if implementable): Is there a way to have a yaml representer that replaces any array variable x to its x.tolist() representation?

How is it supposed to display if the array is 2d (or larger)? Are you interested in small, almost trivial arrays that fit on a human-readable line, or big ones (1000s of elements)?
– hpaulj, Commented Nov 19, 2015 at 7:46
I may have larger arrays, but for now I'd like to understand how to deal with 1D. I can infer the approach to 2D and larger data :) Thanks for the clarifying question.
– Oleg Melnikov, Commented Nov 19, 2015 at 14:22
x.tolist() is the easiest way to change the array into something yaml knows how to handle.
– hpaulj, Commented Nov 19, 2015 at 14:36
Thanks! See the update. If there is a way to call tolist() on array variables during yaml.dump, it should work cleanly (affecting only array types and nothing else).
– Oleg Melnikov, Commented Nov 19, 2015 at 18:49
Have you looked at the yaml registration business? You can write a function that handles a particular class of object and register it with the yaml module. You could do that for your whole class, and for things like arrays that aren't handled to your satisfaction. I've seen that in the pyyaml docs, but never implemented it myself. stackoverflow.com/a/27196166/901925
– hpaulj, Commented Nov 19, 2015 at 18:58

hpaulj · Accepted Answer · 2015-11-19 22:45:20Z

Are you interested in generating valid yaml, or just using yaml as a way to display your object? Phrases like 'no load is needed' suggest the latter.

But why focus on yaml? Does it natively handle lists or sequences in the way you want?

If I use tolist to turn an array into a list that yaml can dump, I get:

In [130]: a = np.arange(3)
In [131]: print(yaml.dump({'a':a.tolist()},default_flow_style=False))
a:
- 0
- 1
- 2

In [132]: print(yaml.dump({'a':a.tolist()},default_flow_style=True))
{a: [0, 1, 2]}

I could drop the dictionary part. But either way the list part does not display as:

- 1, 2, 3

I don't see how yaml.dump is any improvement over the default array displays:

In [133]: print(a)
[0 1 2]
In [134]: print(repr(a))
array([0, 1, 2])

For 2d arrays (and arrays that can be turned into 2d), np.savetxt gives a compact display, with fmt options to control the details:

In [139]: np.savetxt('test',a[None,:], fmt='%d')
In [140]: cat 'test'
0 1 2

Here I'm actually writing to a file, and displaying that with system cat, but I could also write to string buffer.
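The string-buffer variant can be sketched as follows. This is a minimal sketch, assuming Python 3 and a NumPy version whose savetxt accepts a text stream such as io.StringIO; the delimiter argument and the names buf and text are additions for illustration, not part of the session above.

```python
import io
import numpy as np

a = np.arange(3)

# Write into an in-memory text buffer instead of a file on disk.
buf = io.StringIO()
np.savetxt(buf, a[None, :], fmt='%d', delimiter=', ')

# getvalue() returns everything savetxt wrote, one line per row.
text = buf.getvalue()
print(text, end='')
```

The resulting string can be embedded directly in a __repr__ without touching the file system.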

But I can do better. savetxt just writes the array, one row at a time, to the file. I could use the same formatting style directly.

I create a fmt string with one % specification for each item in a (here a 1d array). Then fmt%tuple(...) formats it. That's just straightforward Python string formatting.

In [144]: fmt = ', '.join(['%d']*a.shape[0])
In [145]: fmt
Out[145]: '%d, %d, %d'
In [146]: fmt%tuple(a.tolist())
Out[146]: '0, 1, 2'

I could add a -, indentation, a colon, etc. to that formatting.

================================

import numpy as np

class A:
    def __init__(self, anArray):
        self.a_array = anArray

    def __repr__(self):
        # module-qualified class name, e.g. __main__.A
        cls = type(self)
        astr = ['object: %s.%s' % (cls.__module__, cls.__name__)]
        astr.append('a_array:')
        astr.append(self.repr_array())
        return '\n'.join(astr)

    def repr_array(self):
        a = self.a_array
        if a.ndim == 1:
            a = a[None, :]   # treat a 1d array as a single row
        # one '%d' per column, prefixed with a yaml-style '- '
        fmt = '- ' + ', '.join(['%d'] * a.shape[1])
        return '\n'.join(fmt % tuple(row) for row in a)

print(A(np.arange(3)))

print(A(np.ones((3, 2))))

produces

object: __main__.A
a_array:
- 0, 1, 2

for a 1d array, and

object: __main__.A
a_array:
- 1, 1
- 1, 1
- 1, 1

for a 2d array.

=======================================

import numpy as np
import yaml

def numpy_representer_str(dumper, data):
    # first cut ndarray yaml representer: a 1d array as one tagged string
    astr = ', '.join(['%s']*data.shape[0])%tuple(data)
    return dumper.represent_scalar('!ndarray:', astr)

def numpy_representer_seq(dumper, data):
    # alternative: a tagged yaml sequence built from the (nested) list
    return dumper.represent_sequence('!ndarray:', data.tolist())

yaml.add_representer(np.ndarray, numpy_representer_str)
print(yaml.dump({'a':np.arange(4)},default_flow_style=False))

# registering again for the same type replaces the string representer
yaml.add_representer(np.ndarray, numpy_representer_seq)
print(yaml.dump({'a':np.arange(4)},default_flow_style=False))

class A:
    def __init__(self, anArray):
        self.a_array = anArray

    def __repr__(self):
        astr = ['object: %s'%self.__class__]
        astr.append('a_array:')
        astr.append(self.repr_array())
        return '\n'.join(astr)

    def repr_array(self):
        # default_flow_style=None keeps each innermost list on one line
        # (this was the default before PyYAML 5.1)
        return yaml.dump(self.a_array, default_flow_style=None)

print(A(np.arange(3)))
print(A(np.arange(6).reshape(2,3)))

With the different styles of numpy representer I get output like:

a: !ndarray: '0, 1, 2, 3'   # the string version

a: !ndarray:         # the sequence version
- 0
- 1
- 2
- 3

object: <class '__main__.A'>     # sequence version with 1d
a_array:
!ndarray: [0, 1, 2]

object: <class '__main__.A'>    # sequence version with 2d
a_array:
!ndarray:
- [0, 1, 2]
- [3, 4, 5]
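The update in the question asks for a representer that swaps any array for its tolist() form. A minimal sketch of that idea follows, assuming a recent PyYAML and NumPy. The names ArrayDumper and represent_ndarray_as_list are hypothetical, chosen for illustration; registering on a Dumper subclass is one possible design that leaves the global yaml.Dumper untouched.

```python
import numpy as np
import yaml

class ArrayDumper(yaml.Dumper):
    """Hypothetical Dumper subclass; the global yaml.Dumper is not modified."""

def represent_ndarray_as_list(dumper, data):
    # Emit the array as an ordinary, untagged yaml sequence.
    return dumper.represent_list(data.tolist())

ArrayDumper.add_representer(np.ndarray, represent_ndarray_as_list)

class A:
    def __init__(self, an_array):
        self.a_array = an_array

    def __repr__(self):
        # default_flow_style=None keeps innermost sequences on one line.
        return yaml.dump(self, Dumper=ArrayDumper, default_flow_style=None)

# A bare dict containing an array, then a whole object containing one.
text = yaml.dump({'a': np.arange(3)}, Dumper=ArrayDumper, default_flow_style=None)
print(text)
print(A(np.arange(6).reshape(2, 3)))
```

With this approach the object dump still starts with a python/object tag line for A itself; only the array attributes are turned into plain lists, and nested objects holding arrays are handled by the same representer.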

edited Nov 19, 2015 at 22:45

answered Nov 19, 2015 at 17:03

hpaulj

7 Comments

hpaulj Over a year ago

You could still use my array formatting ideas with yaml. pyyaml.org/wiki/…

hpaulj Over a year ago

I added a first cut (two actually) at an array representer. I demo it both as a standalone yaml dump and as part of your object formatting.

Oleg Melnikov Over a year ago

Great solution. When I run it, a FutureWarning comes up from representer. Is this avoidable?

C:\...\yaml\representer.py:135: FutureWarning: comparison to 'None' will result in an elementwise object comparison in the future.   if data in [None, ()]:

hpaulj Over a year ago

From your comment I can only guess at the location and cause of the warning, and I can't reproduce it. My guess is that some data value is None (but not necessarily the newly added array).

hpaulj Over a year ago

There's a line in SafeRepresenter that reads if data in [None, ()]:, which could end up trying data==None, while data is None is a better test. I'm not sure why my tests don't raise the warning. Maybe it's some sort of version issue.
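Why that membership test is fragile for arrays can be sketched in a few lines. This assumes a recent NumPy, where == against None is applied elementwise (the behaviour the FutureWarning announced); data and found are illustrative names only.

```python
import numpy as np

data = np.arange(3)

# `data in [None, ()]` compares with ==, which numpy applies elementwise,
# so the comparison yields an array that has no single truth value.
try:
    found = data in [None, ()]
except ValueError as err:
    found = None
    print('membership test failed:', err)

# An identity test never looks at the array's elements.
print(data is None)
```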

Community · Accepted Answer · 2017-05-23 12:34:03Z

1

You can marshal your numpy array to a list when representing in your A object. Then unmarshal it when retrieving from your object:

from numpy import array
from yaml import dump

class A:
    def __init__(self):
        # store a plain list internally so yaml can dump the object
        self.a_lst = [1,2,3]

    def __repr__(self):
        return dump(self, default_flow_style=False)

    # convert internal list to numpy array before returning.
    @property
    def my_arr(self):
        return array(self.a_lst)

    # convert array to list before storing internally.
    @my_arr.setter
    def my_arr(self, arr):
        self.a_lst = arr.tolist()

print(repr(A()))

The key is to store the array as a plain Python list inside your object, so that a yaml dump is always possible.

A possibly better alternative is to use the built-in dump functionality provided by numpy. See answer here.

edited May 23, 2017 at 12:34

CommunityBot

answered Nov 19, 2015 at 3:57

Martin Konecny

3 Comments

Oleg Melnikov Over a year ago

Thanks Martin. Actually, I need to store it as array, but output it as if it was a list. I thought there may be a yaml.representer (or simpler) solution for it. All internal computations would be done on array, which itself is a product of internal computations. So, storing array would be natural. However, yaml does not print it nicely (once error is handled) :(

Martin Konecny Over a year ago

Assuming your object is storing only a_array as a member, you could do the following: a = A(); dump(a.a_array.tolist(), default_flow_style=False).

Oleg Melnikov Over a year ago

That'd be too trivial. In actuality, there are other class-scope structures, including nested class compositions (which also need to be printed). For now, I just wanted to focus on nice-printing array :)
