Read sparse matrix in python

Question

I want to read a sparse matrix. I am building n-grams with scikit-learn, and its transform() returns a sparse matrix. I want to read that matrix without calling todense().

Code:

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
document = ['john guy','nice guy']
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(document)
transformer = vectorizer.transform(document)
print(transformer)

Output :

  (0, 0)    1
  (0, 1)    1
  (0, 2)    1
  (1, 0)    1
  (1, 3)    1
  (1, 4)    1

How can I read this output to get its values? I need the value at (0,0), (0,1), and so on, saved into a list.

hpaulj · Accepted Answer · 2014-11-12 17:49:02Z

The documentation for this transform method says it returns a sparse matrix, but doesn't specify the kind. Different kinds let you access the data in different ways, but it is easy to convert one to another. Your print display is the typical str for a sparse matrix.

An equivalent matrix can be generated with:

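import numpy as np
from scipy import sparse
i = [0, 0, 0, 1, 1, 1]   # row index of each stored value
j = [0, 1, 2, 0, 3, 4]   # column index of each stored value
A = sparse.csr_matrix((np.ones_like(j), (i, j)))
print(A)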

producing:

  (0, 0)        1
  (0, 1)        1
  (0, 2)        1
  (1, 0)        1
  (1, 3)        1
  (1, 4)        1

A csr type can be indexed like a dense matrix:

In [32]: A[0,0]
Out[32]: 1    
In [33]: A[0,3]
Out[33]: 0

Internally, the csr matrix stores its data in three arrays (data, indices, and indptr), which is convenient for calculation but a bit obscure. Convert it to coo format to get data that looks just like your input:

In [34]: A.tocoo().row
Out[34]: array([0, 0, 0, 1, 1, 1], dtype=int32)

In [35]: A.tocoo().col
Out[35]: array([0, 1, 2, 0, 3, 4], dtype=int32)
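
The csr arrays mentioned above can also be read directly, without any format conversion. The following is only a minimal sketch: the helper name csr_triples is made up for illustration, and it assumes the matrix is in csr format, where indptr[r]:indptr[r+1] marks the slice of indices and data that belongs to row r.

```python
import numpy as np
from scipy import sparse

# Same example matrix as above.
i = [0, 0, 0, 1, 1, 1]
j = [0, 1, 2, 0, 3, 4]
A = sparse.csr_matrix((np.ones_like(j), (i, j)))

def csr_triples(M):
    """Hypothetical helper: collect (row, col, value) by walking the csr arrays."""
    triples = []
    for r in range(M.shape[0]):
        # indptr gives the start and end of row r inside indices and data
        start, end = M.indptr[r], M.indptr[r + 1]
        for c, v in zip(M.indices[start:end], M.data[start:end]):
            triples.append((r, int(c), v.item()))  # convert numpy scalars to plain Python
    return triples

triples = csr_triples(A)
print(triples)
```

Only the stored entries are visited, so nothing dense is ever built.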

Or you can convert it to a dok type, and access that data like a dictionary:

A.todok().keys()
#  dict_keys([(0, 1), (0, 0), (1, 3), (1, 0), (0, 2), (1, 4)])
A.todok().items()

produces: (Python3 here)

dict_items([((0, 1), 1), 
            ((0, 0), 1), 
            ((1, 3), 1), 
            ((1, 0), 1), 
            ((0, 2), 1), 
            ((1, 4), 1)])

The lil format stores the data as two arrays of lists, with one list per row: one holds the values (all 1s in this example), and the other holds the column indices of those values.
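
As a rough illustration of the lil layout, the sketch below converts the same example matrix and copies its two per-row lists into plain Python lists. The variable names cols_per_row and vals_per_row are made up for this example; rows and data are the attributes of the lil matrix itself.

```python
import numpy as np
from scipy import sparse

# Same example matrix as above.
i = [0, 0, 0, 1, 1, 1]
j = [0, 1, 2, 0, 3, 4]
A = sparse.csr_matrix((np.ones_like(j), (i, j)))

L = A.tolil()
# L.rows[r] lists the column indices stored in row r;
# L.data[r] lists the matching values in the same order.
cols_per_row = [[int(c) for c in row] for row in L.rows]
vals_per_row = [[int(v) for v in row] for row in L.data]
print(cols_per_row)
print(vals_per_row)
```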

Or do you want to 'read' the data in some other way?

Fred Foo · Accepted Answer · 2014-11-12 15:11:23Z

This is a SciPy CSR matrix. To convert this to (row, col, value) triples, the easiest option is to convert to COO format, then get the triples from that:

>>> from scipy.sparse import rand
>>> X = rand(100, 100, format='csr')
>>> X
<100x100 sparse matrix of type '<type 'numpy.float64'>'
    with 100 stored elements in Compressed Sparse Row format>
>>> X = X.tocoo()
>>> list(zip(X.row, X.col, X.data))[:10]
[(1, 78, 0.73843533223380842),
 (1, 91, 0.30943772717074158),
 (2, 35, 0.52635078317400608),
 (4, 75, 0.34667509458006551),
 (5, 30, 0.86482318943934389),
 (7, 74, 0.46260571098933323),
 (8, 75, 0.74193890941716234),
 (9, 72, 0.50095749482583696),
 (9, 80, 0.85906284644174613),
 (11, 66, 0.83072142899400137)]

(Note that the output is sorted.)
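
Applied to the matrix from the question, the same idea might look like the sketch below. To keep it self-contained, transformer is rebuilt here by hand from the printed output rather than produced by CountVectorizer; that rebuild is an assumption, and with scikit-learn you would use the real result of vectorizer.transform(document) instead.

```python
import numpy as np
from scipy import sparse

# Assumed stand-in for the question's `transformer`, rebuilt from its printed output.
rows = [0, 0, 0, 1, 1, 1]
cols = [0, 1, 2, 0, 3, 4]
transformer = sparse.csr_matrix(
    (np.ones(6, dtype=np.int64), (rows, cols)), shape=(2, 5)
)

coo = transformer.tocoo()  # converts between sparse formats, nothing dense is created
# tolist() turns the numpy arrays into plain Python ints
triples = list(zip(coo.row.tolist(), coo.col.tolist(), coo.data.tolist()))
values = coo.data.tolist()

print(triples)
print(values)
```

Here triples holds one (row, col, value) tuple per stored entry, and values is the plain list of stored values the question asks for.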

Irshad Bhat · Accepted Answer · 2014-11-12 14:56:31Z

You can use toarray() and the data attribute as follows (note that toarray() builds a dense array, just as todense() does):

>>> indices=transformer.toarray()
>>> indices
array([[1, 1, 1, 0, 0],
       [1, 0, 0, 1, 1]])
>>> values=transformer.data
>>> values
array([1, 1, 1, 1, 1, 1])

edited Nov 12, 2014 at 14:56

answered Nov 12, 2014 at 14:47


2 Comments

iNikkz Over a year ago

Sorry, I mentioned in the question that I don't want to use todense() to convert the sparse matrix into an ordinary matrix. Is there any other solution?

Irshad Bhat Over a year ago

@Nikkz, Sorry, I didn't read the question carefully. I've provided some methods of reading data. Hope it helps.
