Efficient way of taking Logarithm function in a sparse matrix

Question

I have a big sparse matrix and want to take the base-4 logarithm of every element in it.

I tried numpy.log(), but it doesn't work with sparse matrices.

I can also take the logarithm row by row and then overwrite the old row with the new one.

# Assume A is a sparse matrix in LIL (list of lists) format with float values as data
# This handles only one row

import numpy as np
c = np.log(A.getrow(0)) / np.log(4)  # change of base: log4(x) = ln(x) / ln(4)
A[0, :] = c

This was not as quick as I'd expected. Is there a faster way to do this?

Danica · Accepted Answer · 2012-03-23 14:39:03Z

12

You can modify the data attribute directly:

>>> import numpy as np
>>> from scipy.sparse import coo_matrix
>>> a = np.array([[5,0,0,0,0,0,0],[0,0,0,0,2,0,0]])
>>> coo = coo_matrix(a)
>>> coo.data
array([5, 2])
>>> coo.data = np.log(coo.data)
>>> coo.data
array([ 1.60943791,  0.69314718])
>>> coo.todense()
matrix([[ 1.60943791,  0.        ,  0.        ,  0.        ,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        ,  0.        ,  0.69314718,
          0.        ,  0.        ]])

Note that this doesn't work properly if the sparse format has repeated elements (which is valid in the COO format); it'll take the logs individually, and log(a) + log(b) != log(a + b). You probably want to convert to CSR or CSC first (which is fast) to avoid this problem.
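To make the duplicate-entry pitfall concrete, here is a small sketch with made-up values (the arrays `rows`, `cols`, and `vals` are illustrative, not from the question). It stores two entries at position (0, 0) and compares taking the log before and after the duplicates are summed:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Two stored entries at (0, 0): logically the value there is 2 + 6 = 8.
rows = np.array([0, 0, 1])
cols = np.array([0, 0, 2])
vals = np.array([2.0, 6.0, 4.0])
dup = coo_matrix((vals, (rows, cols)), shape=(2, 3))

# Wrong: log of each stored entry separately. The duplicates are summed later,
# giving log(2) + log(6) = log(12) instead of log(8).
wrong = dup.copy()
wrong.data = np.log(wrong.data)
wrong_value = wrong.toarray()[0, 0]

# Right: converting to CSR sums the duplicates first, then log is applied to .data.
csr = dup.tocsr()
csr.data = np.log(csr.data)
right_value = csr.toarray()[0, 0]
```

Here `wrong_value` equals log(12) while `right_value` equals log(8), which is why converting to CSR or CSC first is the safer order.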

You'll also have to add checks if the sparse matrix is in a different format, of course. And if you don't want to modify the matrix in-place, just construct a new sparse matrix as you did in your answer, but without adding 3 because that's completely unnecessary here.
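One way to sidestep the per-format checks is to convert to CSR up front and return a new matrix. The following is only a sketch: `sparse_log` is a hypothetical helper name, and it assumes every stored value is strictly positive after explicit zeros are dropped.

```python
import numpy as np
from scipy import sparse

def sparse_log(A, base=4.0):
    """Return a new CSR matrix with log_base applied to the stored entries.

    Hypothetical helper: assumes all stored values are > 0.
    """
    out = sparse.csr_matrix(A, dtype=float, copy=True)  # accepts any sparse format
    out.sum_duplicates()    # merge repeated entries before taking the log
    out.eliminate_zeros()   # drop explicitly stored zeros, which would become -inf
    out.data = np.log(out.data) / np.log(base)
    return out

# Usage with a LIL matrix, as in the question; A itself is left unchanged.
A = sparse.lil_matrix((2, 3))
A[0, 0] = 4.0
A[1, 2] = 16.0
B = sparse_log(A)
```

Because only `out.data` is touched, the cost scales with the number of stored entries rather than with the full matrix shape.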

edited Mar 23, 2012 at 14:39

answered Mar 23, 2012 at 4:01

Danica


3 Comments

Baskaya Over a year ago

What is the difference between my solution and yours? You only propose that the 3 isn't necessary, which you had already suggested in a comment on my solution.

Danica Over a year ago

@Thorn I actually initially misread your solution (thought you were adding 3 to the entire matrix and so doing a whole lot of unnecessary logarithms). You're right that they're basically the same.

P i Over a year ago

Grateful for this answer as the example makes it very clear. Even if it is essentially the same answer, it's good to have it here.

Baskaya · Accepted Answer · 2012-03-24 19:32:58Z

0

I think I solved it in a very easy way. It is very strange that no one could answer immediately.

# Let A be a COO_matrix
import numpy as np
from scipy.sparse import coo_matrix
new_data = np.log(A.data + 3) / np.log(4)  # 3 is not so important. It can be 1 too.
A = coo_matrix((new_data, (A.row, A.col)), shape=A.shape)

edited Mar 24, 2012 at 19:32

answered Mar 22, 2012 at 23:52

Baskaya


10 Comments

Li-aung Yip Over a year ago

No-one suggested this solution because it is mathematically incorrect. log(x) can be very different from log(x+1)! (Example: log10(0.000001) = -6, while log10(0.000001 + 1) = 0 and a bit.)

Baskaya Over a year ago

Sorry for the ill-posed question. I didn't mention that all data are positive and bigger than 1. These are the values of TF (term frequency) matrices. I think there will be no problem.

Danica Over a year ago

There's absolutely no reason to add 3 (or anything) here, since none of the entries in A.data will be 0. But if you do want to take the approach of adding a constant, use a smaller one! Adding say 1e-16 will have the same effect of never taking log(0) but with much less error introduced: using the appropriate identity, it's log(x + eps) = log(x) + log(1 + eps/x), where the error introduced is near 0 if eps/x is almost 0.
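The identity log(x + eps) = log(x) + log(1 + eps/x) can be checked numerically. This is a sketch with illustrative values: `x` is a made-up array of counts, and `eps = 1e-6` is used instead of 1e-16 because in float64 adding 1e-16 to a value of 1 or more leaves it unchanged.

```python
import numpy as np

x = np.array([1.0, 2.0, 50.0])  # example counts, all >= 1 (assumed)
eps = 1e-6                      # illustrative small constant

lhs = np.log(x + eps)
rhs = np.log(x) + np.log1p(eps / x)  # log1p(t) = log(1 + t), accurate for small t

# Error introduced by the added constant, relative to the true log(x)
err_small = np.abs(np.log(x + eps) - np.log(x))  # roughly eps / x
err_three = np.abs(np.log(x + 3.0) - np.log(x))  # log(1 + 3/x), large when x is small
```

For x = 1 the constant 3 shifts the result by log(4), while `eps` shifts it by about 1e-6.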

Baskaya Over a year ago

@Dougal thank you, but I want to add 3 because I don't want these 1's to become zero after the logarithm. It is a design concern of term matrices and is related to text processing. I do not want to add a very small value because, for me, the 1s are more important than you expect. Besides, it doesn't change much.

Danica Over a year ago

@Thorn I'm not sure exactly what you're doing with the logarithms here, but if you're doing any kind of NLP algorithm that uses the log, it's going to be doing the wrong thing if you're not actually giving it the log. In this case, it'll probably end up overweighting things with only 1 observed count. If the problem is simply that you want to distinguish between when the original entry had a 1 and when it didn't exist, you might want to think about maintaining a list of entries yourself and throwing them in a COO sparse matrix when you want to do matrix operations.
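A rough sketch of that last suggestion, under assumed names: `counts` is a hypothetical term-frequency store keyed by (document index, term index), and `to_log_coo` is an illustrative helper, not part of SciPy. The dictionary keys record which entries exist, so a count of 1 can map to a true log of 0 without being confused with an absent entry.

```python
from collections import Counter

import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical store: (doc_index, term_index) -> observed count
counts = Counter()
for doc, term in [(0, 2), (0, 2), (1, 0), (1, 3)]:
    counts[(doc, term)] += 1

def to_log_coo(counts, shape, base=4.0):
    """Build a COO matrix holding log_base(count) for every recorded entry."""
    keys = list(counts)
    rows = np.array([k[0] for k in keys])
    cols = np.array([k[1] for k in keys])
    vals = np.array([counts[k] for k in keys], dtype=float)
    data = np.log(vals) / np.log(base)
    return coo_matrix((data, (rows, cols)), shape=shape)

M = to_log_coo(counts, shape=(2, 4))
```

Whether an entry was observed is then answered by `(1, 0) in counts` rather than by inspecting the matrix value.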
