3

I used the Scipy Reverse Cuthill-McKee implementation (scipy.sparse.csgraph.reverse_cuthill_mckee) for creating a band matrix using a (high-dimensional) sparse csr_matrix. The result of this method is a permutation array whichs gives me the indices of how to permutate the rows of my matrix as I understood.

Now is there any efficient solution for doing this permutation on my sparse csr_matrix in any other sparse matrix (csr, lil_matrix, etc)? I tried a for-loop but my matrix has dimension like 200,000 x 150,000 and it takes too much time.

A = csr_matrix((data,(rowind,columnind)), shape=(200000, 150000), dtype=np.uint8)

permutation_array = csgraph.reverse_cuthill_mckee(A, false)

result_matrix = lil_matrix((200000, 150000), dtype=np.uint8)

i=0
for x in np.nditer(permutation_array):
    result_matrix[x, :]=A[i, :]
    i+=1

The result of the reverse_cuthill_mckee call is an array which is like a tupel containing the indices for my permutation. So this array is something like: [199999 54877 54873 ..., 12045 9191 0] (size = 200,000)

This means: row with index 0 has now index 199999, row with index 1 has now index 54877, row with index 2 has now index 54873, etc. see: https://en.wikipedia.org/wiki/Permutation#Definition_and_notations (As I understood the return)

Thank you

6
  • Possible duplicate of Swap rows csr_matrix scipy Commented Aug 17, 2017 at 14:01
  • No, that doesn't help me.. Commented Aug 17, 2017 at 14:56
  • You don't need nditer to iterate on an array; in fact it is usually slower. I have uses this csgraph function. What does it return? Commented Aug 17, 2017 at 15:11
  • What is csr_A, false? Commented Aug 17, 2017 at 15:18
  • I edited my post specifying the return of the function call. Sorry csr_A was the old name, edited it to A. false means in this function call that the matrix is not symmetric (see scipy documentation). Commented Aug 17, 2017 at 15:22

1 Answer 1

6

I wonder if you are applying the permutation array correctly.

Make a random matrix (float) and convert it to a uint8 (beware, csr calculations might not work with this dtype):

In [963]: ran=sparse.random(10,10,.3, format='csr')
In [964]: A = sparse.csr_matrix((np.ones(ran.data.shape).astype(np.uint8),ran.indices, ran.indptr))
In [965]: A.A
Out[965]: 
array([[1, 1, 0, 0, 0, 0, 1, 0, 0, 0],
       [0, 1, 1, 1, 1, 1, 1, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0, 0, 1, 0, 1],
       [0, 1, 0, 0, 1, 1, 0, 0, 0, 0],
       [1, 0, 1, 0, 0, 1, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
       [0, 1, 1, 1, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]], dtype=uint8)

(oops, used the wrong matrix here):

In [994]: permutation_array = csgraph.reverse_cuthill_mckee(A, False)
In [995]: permutation_array
Out[995]: array([9, 7, 0, 4, 6, 3, 5, 1, 8, 2], dtype=int32)

My first inclination is to use such an array to simply index rows of the original matrix:

In [996]: A[permutation_array,:].A
Out[996]: 
array([[0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
       [1, 1, 0, 0, 0, 0, 1, 0, 0, 0],
       [0, 1, 0, 0, 1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0, 0, 1, 0, 1],
       [1, 0, 1, 0, 0, 1, 0, 1, 0, 0],
       [0, 1, 1, 1, 1, 1, 1, 0, 1, 0],
       [0, 1, 1, 1, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8)

I see some clustering; maybe the best we can expect from a random matrix.

You on the other hand appear to be doing:

In [997]: res = sparse.lil_matrix(A.shape,dtype=A.dtype)
In [998]: res[permutation_array,:] = A
In [999]: res.A
Out[999]: 
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
       [0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 1, 0, 1, 0, 0],
       [1, 1, 0, 0, 0, 0, 0, 1, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 1, 1, 0, 0, 0, 0],
       [0, 1, 1, 1, 1, 1, 1, 0, 1, 0],
       [0, 1, 1, 1, 0, 1, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]], dtype=uint8)

I don't see any improvement in clustering of 1s in res.


The docs for the MATLAB equivalent say

r = symrcm(S) returns the symmetric reverse Cuthill-McKee ordering of S. This is a permutation r such that S(r,r) tends to have its nonzero elements closer to the diagonal.

In numpy terms, that means:

In [1019]: I,J=np.ix_(permutation_array,permutation_array)
In [1020]: A[I,J].A
Out[1020]: 
array([[0, 0, 0, 1, 1, 0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 1, 0, 1, 0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0, 0, 1, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 0, 0, 0, 0, 1, 0, 0],
       [0, 1, 1, 0, 0, 0, 1, 0, 0, 1],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 1, 1, 1, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8)

And indeed there are more 0 bands in the 2 off diagonal corners.

And using the bandwidth calculation on the MATLAB page, https://www.mathworks.com/help/matlab/ref/symrcm.html

In [1028]: i,j=A.nonzero()
In [1029]: np.max(i-j)
Out[1029]: 7
In [1030]: i,j=A[I,J].nonzero()
In [1031]: np.max(i-j)
Out[1031]: 5

The MATLAB docs say that with this permutation, the eigenvalues remain the same. Testing:

In [1032]: from scipy.sparse import linalg
In [1048]: linalg.eigs(A.astype('f'))[0]
Out[1048]: 
array([ 3.14518213+0.j        , -0.96188843+0.j        ,
       -0.58978939+0.62853903j, -0.58978939-0.62853903j,
        1.09950364+0.54544497j,  1.09950364-0.54544497j], dtype=complex64)
In [1049]: linalg.eigs(A[I,J].astype('f'))[0]
Out[1049]: 
array([ 3.14518023+0.j        ,  1.09950352+0.54544479j,
        1.09950352-0.54544479j, -0.58978981+0.62853914j,
       -0.58978981-0.62853914j, -0.96188819+0.j        ], dtype=complex64)

Eigenvalues are not the same for the row permutations we tried earlier:

In [1050]: linalg.eigs(A[permutation_array,:].astype('f'))[0]
Out[1050]: 
array([ 2.95226836+0.j        , -1.60117996+0.52467293j,
       -1.60117996-0.52467293j, -0.01723826+1.06249797j,
       -0.01723826-1.06249797j,  0.90314150+0.j        ], dtype=complex64)
In [1051]: linalg.eigs(res.astype('f'))[0]
Out[1051]: 
array([-0.05822830-0.97881651j, -0.99999994+0.j        ,
        1.17350495+0.j        , -0.91237622+0.8656373j ,
       -0.91237622-0.8656373j ,  2.26292515+0.j        ], dtype=complex64)

This [I,J] permutation works with the example matrix in http://ciprian-zavoianu.blogspot.com/2009/01/project-bandwidth-reduction.html

In [1058]: B = np.matrix('1 0 0 0 1 0 0 0;0 1 1 0 0 1 0 1;0 1 1 0 1 0 0 0;0 0 0 
      ...: 1 0 0 1 0;1 0 1 0 1 0 0 0; 0 1 0 0 0 1 0 1;0 0 0 1 0 0 1 0;0 1 0 0 0 
      ...: 1 0 1')
In [1059]: B
Out[1059]: 
matrix([[1, 0, 0, 0, 1, 0, 0, 0],
        [0, 1, 1, 0, 0, 1, 0, 1],
        [0, 1, 1, 0, 1, 0, 0, 0],
        [0, 0, 0, 1, 0, 0, 1, 0],
        [1, 0, 1, 0, 1, 0, 0, 0],
        [0, 1, 0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 0, 0, 1, 0],
        [0, 1, 0, 0, 0, 1, 0, 1]])
In [1060]: Bm=sparse.csr_matrix(B)
In [1061]: Bm
Out[1061]: 
<8x8 sparse matrix of type '<class 'numpy.int32'>'
    with 22 stored elements in Compressed Sparse Row format>
In [1062]: permB = csgraph.reverse_cuthill_mckee(Bm, False)
In [1063]: permB
Out[1063]: array([6, 3, 7, 5, 1, 2, 4, 0], dtype=int32)
In [1064]: Bm[np.ix_(permB,permB)].A
Out[1064]: 
array([[1, 1, 0, 0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 0, 0, 0],
       [0, 0, 1, 1, 1, 0, 0, 0],
       [0, 0, 1, 1, 1, 1, 0, 0],
       [0, 0, 0, 0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 1, 1]], dtype=int32)
Sign up to request clarification or add additional context in comments.

5 Comments

Thanks for the answer but can you tell me what .A does? Like A.A. I tried your code and got similar results. The problem is I have to have the band matrix structure which is calculated with the algorithm. But indexing the rows on the original matrix does not satisfy this. And doing res[permutation_array,:] = A gives me a Exception "unhandled MemoryError" on my original data. There has to be a smart solution, even for large matrices, I thought..
The .A is just short hand for .toarray(). I'm using it to visualize the matrix as a regular dense array.
I don't get your last edit. The reverse cuthill-mckee implementation in Numpy does create a correct band matrix. I just want to apply the permutation array which I get as result using this algorithm on my sparse (very large) matrix. For this I just have to, or rather am allowed to, swap the rows of the matrix. But in the matrix of your last edit I think also some columns are swapped.
Yes, the MATLAB permutation swaps columns as well as rows. Without an example I don't know what kind of banding you see or expect.
The eigenvalues are irrelevant here. I think I have clarified the specific problem more than enough and swapping columns is just not the way you do it in Cuthill-McKee. I can also post an example but I don't need an alternative solution, just help with applying the array on the matrix as I said and as explained in the start post. Thank you for trying to help me anyway!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.