numpy array of objects

Question

I'm trying to implement a simulation for a lattice model (lattice boltzmann) in Python. Each site of the lattice has a number of properties, and interact with neighboring sites according to certain rules. I figured that it might be clever to make a class with all the properties and make a grid of instances of that class. (As I'm inexperienced with Python, this might not be a good idea at all, so feel free to comment on my approach.)

Here is a toy example of what I'm doing

class site:
    def __init__(self,a,...):
        self.a = a
        .... other properties ...
    def set_a(self, new_a):
        self.a = new_a

Now I want to deal with a 2D/3D lattice (grid) of such sites so I tried to do the following (here is a 2D 3x3 grid as an example, but in simulation I would need the order of >1000x1000X1000)

lattice = np.empty( (3,3), dtype=object)
lattice[:,:] = site(3)

Now, the problem is that each lattice point refer to the same instance, for example

lattice[0,0].set_a(5)

will also set the value of lattice[0,2].a to 5. This behavior is unwanted. To avoid the problem i can loop over each grid point and assign the objects element by element, like

for i in range(3):
    for j in range(3):
        lattice[i,j] = site(a)

But is there a better way (not involving the loops) to assign objects to a multidimensional array?

Thanks

If you're dealing with a >1000x1000X1000 array, don't use an object array!! It will use egregious amounts of memory compared to using a "normal" numpy array. Object arrays aren't what you want here... — Joe Kington
– Joe Kington, Commented Feb 2, 2011 at 18:40
by simulation I guess you mean fluid simulation? If so, then I'll recommend you to rethink your approach. Perhaps elements of your arrays should be numerical ones, so you could harness all the power of linear algebra ;-). Particle propagation and collision processes must be done globally! No local object lattice is able to handle that in any reasonable computation time. Just toughts, don't know really what you are aiming to ;-). Thanks — eat
– eat, Commented Feb 2, 2011 at 21:57
@eat: I am doing fluid simulation. I wanted to code a generic grid of sites, where all the local properties were collected in a class (collision is local in my book, not propagation tho), but I guess you're right after all. At least bpowah taught me how to vectorize the init function :) — jonalm
– jonalm, Commented Feb 2, 2011 at 23:17
Incidentally, have you seen sailfish? sailfish.us.edu.pl/index.html It's a GPU accelerated fluid simulation package using a Lattice-Boltzman method implemented in numpy and pyopencl/pycuda. It's pretty slick from what I've seen (which are just the demo videos...). At any rate, thought you might find it relevant. — Joe Kington
– Joe Kington, Commented Feb 3, 2011 at 1:06

Paul · Accepted Answer · 2013-04-12 02:44:04Z

32

You can vectorize the class's __init__ function:

import numpy as np

class Site:
    def __init__(self, a):
        self.a = a
    def set_a(self, new_a):
        self.a = new_a

vSite = np.vectorize(Site)

init_arry = np.arange(9).reshape((3,3))

lattice = np.empty((3,3), dtype=object)
lattice[:,:] = vSite(init_arry)

This may look cleaner but has no performance advantage over your looping solution. The list comprehension answers create an intermediate python list which would cause a performance hit.

edited Apr 12, 2013 at 2:44

answered Feb 2, 2011 at 18:18

Paul

43.9k17 gold badges112 silver badges126 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Neil G Over a year ago

What if Site takes more than one parameter?

Paul Over a year ago

@NeilG You'd have to encapsulate all the initialization information in a as a higher dimension array or recarray. The vectorize'd function only assumes that the first parameter is a vector (unless of course you used the excluded parameter to define a different one). If you insist on having Site multi-parameter, you could write a wrapper for Site that expands the recarray custom dtype to Site's (*args, **vargs) and vectorize the wrapper. Keep in mind that this is all just for fun. There is no performance advantage here over the OP's looping solution.

Neil G Over a year ago

@Paul: Yes, I understand your point. I've never seen vectorize applied to a constructor before. It's a neat solution for this problem, but it is not a general solution since it makes adding parameters to Site's constructor cumbersome.

Seth Johnson · Accepted Answer · 2011-02-02 17:40:42Z

9

The missing piece for you is that Python treats everything as a reference. (There are some "immutable" objects, strings and numbers and tuples, that are treated more like values.) When you do

lattice[:,:] = site(3)

you are saying "Python: make a new object site, and tell every element of lattice to point to that object." To see that this is really the case, print the array to see that the memory addresses of the objects are all the same:

array([[<__main__.Site object at 0x1029d5610>,
        <__main__.Site object at 0x1029d5610>,
        <__main__.Site object at 0x1029d5610>],
       [<__main__.Site object at 0x1029d5610>,
        <__main__.Site object at 0x1029d5610>,
        <__main__.Site object at 0x1029d5610>],
       [<__main__.Site object at 0x1029d5610>,
        <__main__.Site object at 0x1029d5610>,
        <__main__.Site object at 0x1029d5610>]], dtype=object)

The loop way is one correct way to do it. With numpy arrays, that may be your best option; with Python lists, you could also use a list comprehension:

lattice = [ [Site(i + j) for i in range(3)] for j in range(3) ]

You can use a list comprehension with the numpy.array construction:

lattice = np.array( [ [Site(i + j) for i in range(3)] for j in range(3) ],
                    dtype=object)

Now when you print lattice, it's

array([[<__main__.Site object at 0x1029d53d0>,
        <__main__.Site object at 0x1029d50d0>,
        <__main__.Site object at 0x1029d5390>],
       [<__main__.Site object at 0x1029d5750>,
        <__main__.Site object at 0x1029d57d0>,
        <__main__.Site object at 0x1029d5990>],
       [<__main__.Site object at 0x1029d59d0>,
        <__main__.Site object at 0x1029d5a10>,
        <__main__.Site object at 0x1029d5a50>]], dtype=object)

so you can see that every object in there is unique.

You should also note that "setter" and "getter" methods (e.g., set_a) are un-Pythonic. It's better to set and get attributes directly, and then use the @property decorator if you REALLY need to prevent write access to an attribute.

Also note that it's standard for Python classes to be written using CamelCase, not lowercase.

edited Feb 2, 2011 at 17:40

answered Feb 2, 2011 at 17:34

Seth Johnson

15.2k8 gold badges63 silver badges89 bronze badges

2 Comments

jonalm Over a year ago

Thank you so much for the input. But I don't know if I understand you correctly; For a class instance A = site(4,5), Is it better (more Pythonic) to update a variable with 'A.a = 6' than 'A.set_a(6)'? Any references on good guides to pythonic coding?

Neil G Over a year ago

Be careful with this is Site implements __len__.

DSM · Accepted Answer · 2011-02-02 18:22:52Z

9

I don't know about better, but as an alternative to an explicit set of loops, you could write

lattice = np.empty( (3,3), dtype=object)
lattice.flat = [site(3) for _ in lattice.flat]

which should work whatever the shape of the lattice.

answered Feb 2, 2011 at 18:22

DSM

355k67 gold badges606 silver badges504 bronze badges

Comments

mkrieger1 · Accepted Answer · 2025-07-17 17:43:49Z

If speed is very important to you, consider using a different programming that is faster. If you really want to program in Python, this is still possible, but there will be a performance hit compared to faster languages as C. We can get pretty close however.

The main problem here is that your fundamental data structure is very inefficient. Often when designing a program, we have to make the choice "list of structs" or "struct of lists". In other words: a list of objects, or an objects containing lists. You are trying to create a list of structs. While this can be very convenient and more object oriented, it is very inefficient here. You want your data to be inside numpy arrays. They are stored very compactly in memory, and operations on them are very efficient. For lattice Boltzmann you need 9 numbers per grid cell I believe. So ideally, your data would be stored in an Nx x Ny x Nz x 9 numpy array.

For loops in Python can be quite slow, but there are a number of ways out. If you can write the for loop as an operation in NumPy, you will benefit from the highly optimized numpy code, which is written in C. Another option is Numba, which allows you to write fast for loops. In Numba your code is first compiled before it is run and this greatly optimizes things. Writing everything in terms of numpy operations can become a bit hard to read sometimes, so this is where numba might be nice.

Collectives™ on Stack Overflow

numpy array of objects

4 Answers 4

3 Comments

2 Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

3 Comments

2 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related