
I have a NumPy array with dtype=object whose elements are other arrays of differing lengths, and I need to convert it to a sparse matrix.

Ex:

a = np.array([np.array([1,0,2]),np.array([1,3])])
array([array([1, 0, 2]), array([1, 3])], dtype=object)

I have tried the solution given in "Convert numpy object array to sparse matrix", with no success.

In [45]: M=sparse.coo_matrix(a)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-45-d75020bb3a38> in <module>()
----> 1 M=sparse.coo_matrix(a)

/home/arturcastiel/.local/lib/python3.6/site-packages/scipy/sparse/coo.py in __init__(self, arg1, shape, dtype, copy)
    183                     self._shape = check_shape(M.shape)
    184 
--> 185                 self.row, self.col = M.nonzero()
    186                 self.data = M[self.row, self.col]
    187                 self.has_canonical_format = True

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

As explained in the comments, this is actually a jagged array. In essence, it represents a graph that I have to convert to a sparse matrix so that I can use the scipy.sparse.csgraph.shortest_path routine.

Thus,

np.array([np.array([1,0,2]),np.array([1,3])])

should become something such as:

(1,1) 1
(1,2) 0
(1,3) 2
(2,1) 1
(2,2) 3
  • What should the shape of the final matrix be? What coordinates are the digits 1, 3 of the second array supposed to have? Commented Apr 26, 2019 at 15:24
  • You have a jagged array; lots of methods don't work with that kind of input. Commented Apr 26, 2019 at 15:46

2 Answers


You can't. The error arises when coo_matrix tries to find the nonzero elements of a, because a sparse matrix stores only the nonzero elements of a matrix. Try

np.nonzero(a)  
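
To see the failure in isolation, here is a minimal sketch that reproduces it without scipy. It assumes the object array is built element by element (so the construction itself does not depend on how a particular NumPy version treats ragged input); the variable name message is only for illustration.

```python
import numpy as np

# Build the jagged object array one element at a time.
a = np.empty(2, dtype=object)
a[0] = np.array([1, 0, 2])
a[1] = np.array([1, 3])

# nonzero() has to decide whether each *element* is "true".
# Each element is itself a multi-element array, so that test is ambiguous.
try:
    np.nonzero(a)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

This is the same check that coo_matrix performs internally in the traceback from the question.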

If your array contained lists instead of arrays, it would work, sort of (recent NumPy releases require an explicit dtype=object to build such a ragged array):

In [615]: a = np.array([[1,0,1],[1,3]])                                              
In [616]: np.nonzero(a)                                                              
Out[616]: (array([0, 1]),)

In [618]: sparse.coo_matrix(a)                                                       
Out[618]: 
<1x2 sparse matrix of type '<class 'numpy.object_'>'
    with 2 stored elements in COOrdinate format>
In [619]: print(_)                                                                   
  (0, 0)    [1, 0, 1]
  (0, 1)    [1, 3]

Note that this is a (1,2) shaped matrix with 2 nonzero elements, both of which are the lists (objects) of the original.

But the coo format does little processing of its input. This matrix cannot, for example, be converted to csr for computations:

In [622]: _618.tocsr()                                                               
---------------------------------------------------------------------------
TypeError: no supported conversion for types: (dtype('O'),)

If the array wasn't jagged, it could be made into a useful sparse matrix:

In [623]: a = np.array([[1,0,1],[1,3,0]])                                            
In [624]: a                                                                          
Out[624]: 
array([[1, 0, 1],
       [1, 3, 0]])

In [626]: sparse.coo_matrix(a)                                                       
Out[626]: 
<2x3 sparse matrix of type '<class 'numpy.int64'>'
    with 4 stored elements in COOrdinate format>
In [628]: print(_)                                                                   
  (0, 0)    1
  (0, 2)    1
  (1, 0)    1
  (1, 1)    3

Note that the 0 values have been omitted. In large, useful sparse matrices, more than 90% of the elements are typically zero.

===

Here's a way of constructing a sparse matrix from your array of arrays. I build the row,col,data attributes of a coo format matrix from the individual arrays in a.

In [630]: a = np.array([np.array([1,0,1]),np.array([1,3])])                          
In [631]: row, col, data = [],[],[]                                                  
In [632]: for i,n in enumerate(a): 
     ...:     row.extend([i]*len(n)) 
     ...:     col.extend(np.arange(len(n))) 
     ...:     data.extend(n) 
     ...:                                                                            
In [633]: row,col,data                                                               
Out[633]: ([0, 0, 0, 1, 1], [0, 1, 2, 0, 1], [1, 0, 1, 1, 3])
In [634]: M = sparse.coo_matrix((data, (row,col)))                                   
In [635]: M                                                                          
Out[635]: 
<2x3 sparse matrix of type '<class 'numpy.int64'>'
    with 5 stored elements in COOrdinate format>
In [636]: print(M)                                                                   
  (0, 0)    1
  (0, 1)    0
  (0, 2)    1
  (1, 0)    1
  (1, 1)    3
In [637]: M.A                                                                        
Out[637]: 
array([[1, 0, 1],
       [1, 3, 0]])
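
The same row/col/data construction can be written without the Python-level extend calls. The following is a sketch, not a tuned implementation; the function name jagged_to_coo is made up for this example, and it assumes every element of the input is a 1-D numeric array and that there is at least one row.

```python
import numpy as np
from scipy import sparse

def jagged_to_coo(rows):
    """Build a COO matrix from a sequence of 1-D arrays of unequal length."""
    lengths = np.array([len(r) for r in rows])
    # Row index i is repeated once for every element of rows[i].
    row = np.repeat(np.arange(len(rows)), lengths)
    # Column indices restart at 0 for each row.
    col = np.concatenate([np.arange(n) for n in lengths])
    # All values, in the same row-by-row order as row and col.
    data = np.concatenate([np.asarray(r) for r in rows])
    shape = (len(rows), int(lengths.max()))
    return sparse.coo_matrix((data, (row, col)), shape=shape)

a = [np.array([1, 0, 1]), np.array([1, 3])]
M = jagged_to_coo(a)
print(M.toarray())
```

As in the session above, zeros that appear explicitly in the input are stored as entries; positions missing from the shorter rows are simply absent and read back as 0.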

An alternative is to pad a to make a 2d numeric array, and make the sparse matrix from that. Padding a jagged list/array has been asked about before, with various solutions. This is one of the easier ones to remember and use:

In [658]: alist = list(zip(*(itertools.zip_longest(*a,fillvalue=0))))                                                                            
In [659]: alist                                                                      
Out[659]: [(1, 0, 1), (1, 3, 0)]
In [661]: sparse.coo_matrix(alist)                                                   
Out[661]: 
<2x3 sparse matrix of type '<class 'numpy.int64'>'
    with 4 stored elements in COOrdinate format>
In [662]: _.A                                                                        
Out[662]: 
array([[1, 0, 1],
       [1, 3, 0]])
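
Padding can also be done with a boolean mask instead of zip_longest. This is a sketch under the assumption that all rows are 1-D numeric arrays; pad_jagged is a hypothetical helper name, not a NumPy or scipy function.

```python
import numpy as np
from scipy import sparse

def pad_jagged(rows, fill=0):
    """Return a 2-D array with each row right-padded with `fill`."""
    lengths = np.array([len(r) for r in rows])
    flat = np.concatenate([np.asarray(r) for r in rows])
    out = np.full((len(rows), int(lengths.max())), fill, dtype=flat.dtype)
    # mask[i, j] is True where row i actually has a j-th element.
    mask = np.arange(out.shape[1]) < lengths[:, None]
    # Boolean assignment fills in row-major order, matching `flat`.
    out[mask] = flat
    return out

a = [np.array([1, 0, 1]), np.array([1, 3])]
padded = pad_jagged(a)
M = sparse.csr_matrix(padded)
print(padded)
print(M.nnz)
```

Because the sparse matrix is built from a dense array here, zeros are dropped, so only the four nonzero values are stored.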

2 Comments

Is there any way to turn this jagged array into a regular matrix, filling the remaining spaces with 0?
I added a couple of ideas.

I'd consider using a dok_matrix if your arrays have a lot of omitted trailing zeros:

In [98]: dok = sparse.dok_matrix((2, 3), dtype=np.int64)

In [99]: for r_num, row in enumerate(a):
    ...:     for col_num, el in enumerate(row):
    ...:         dok[r_num, col_num] = el 
    ...:         

In [100]: dok.toarray()
Out[100]: 
array([[1, 0, 1],
       [1, 3, 0]], dtype=int64)
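
Since the stated goal is scipy.sparse.csgraph.shortest_path, here is a sketch of how a DOK build might feed into it. The question does not say how the jagged rows map onto a graph, so this assumes row i lists the edge weights from node i to nodes 0, 1, 2, ... with trailing zeros omitted, and that 0 means "no edge". shortest_path needs a square matrix, so the three-node input below is a hypothetical example rather than the 2x3 data from the question.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.csgraph import shortest_path

# Hypothetical jagged adjacency rows for a 3-node directed graph:
# 0 -> 1 (weight 1), 0 -> 2 (weight 4), 1 -> 2 (weight 2).
rows = [np.array([0, 1, 4]), np.array([0, 0, 2]), np.array([0])]

n = len(rows)
graph = sparse.dok_matrix((n, n), dtype=np.float64)
for i, row in enumerate(rows):
    for j, w in enumerate(row):
        if w != 0:  # assumption: 0 means "no edge", so it is not stored
            graph[i, j] = float(w)

# Convert to CSR before handing the graph to csgraph routines.
dist = shortest_path(graph.tocsr(), directed=True)
print(dist)
```

Unreachable pairs come back as inf in the distance matrix. If the rows encode the graph differently (for example as neighbor lists rather than weights), the loop that fills the matrix would need to change accordingly.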
