I have a big sparse matrix. I want to take the base-4 logarithm (log4) of every element in that sparse matrix.

I tried to use numpy.log(), but it doesn't work with sparse matrices.

I can also take the logarithm row by row and then overwrite the old row with the new one.

# Assume A is a sparse matrix in LIL (list of lists) format with float values as data
# This handles only one row

import numpy as np
c = np.log(A.getrow(0)) / np.log(4)  # use the np alias imported above
A[0, :] = c

This was not as quick as I'd expected. Is there a faster way to do this?

2 Answers

You can modify the data attribute directly:

>>> import numpy as np
>>> from scipy.sparse import coo_matrix
>>> a = np.array([[5,0,0,0,0,0,0],[0,0,0,0,2,0,0]])
>>> coo = coo_matrix(a)
>>> coo.data
array([5, 2])
>>> coo.data = np.log(coo.data)
>>> coo.data
array([ 1.60943791,  0.69314718])
>>> coo.todense()
matrix([[ 1.60943791,  0.        ,  0.        ,  0.        ,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        ,  0.        ,  0.69314718,
          0.        ,  0.        ]])

Note that this doesn't work properly if the sparse format has repeated elements (which is valid in the COO format); it'll take the logs individually, and log(a) + log(b) != log(a + b). You probably want to convert to CSR or CSC first (which is fast) to avoid this problem.
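The duplicate-entry pitfall described above can be reproduced with a small sketch. The matrix and the variable names (`rows`, `cols`, `vals`, `naive`, `csr`) are made up for illustration; the sketch assumes a COO matrix built with two stored values at the same position.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Two stored entries share position (0, 0); together they represent 2 + 3 = 5.
rows = np.array([0, 0, 1])
cols = np.array([0, 0, 2])
vals = np.array([2.0, 3.0, 4.0])
coo = coo_matrix((vals, (rows, cols)), shape=(2, 3))

# Taking the log of the raw data yields log(2) + log(3) = log(6) at (0, 0).
naive = coo.copy()
naive.data = np.log(naive.data)
naive_00 = naive.toarray()[0, 0]

# Converting to CSR sums the duplicates first, so (0, 0) becomes log(5).
csr = coo.tocsr()
csr.data = np.log(csr.data)
safe_00 = csr.toarray()[0, 0]

print(naive_00, safe_00)
```

The two printed values differ, which is why converting to CSR or CSC before touching `data` is the safer order of operations.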

You'll also have to add checks if the sparse matrix is in a different format, of course. And if you don't want to modify the matrix in-place, just construct a new sparse matrix as you did in your answer, but without adding 3 because that's completely unnecessary here.
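One possible way to combine the format check with a non-in-place version is sketched below for the base-4 logarithm asked about in the question. The helper name `sparse_log4` is hypothetical, and the sketch assumes every stored value is positive; explicitly stored zeros are dropped first because their logarithm would be -inf.

```python
import numpy as np
from scipy.sparse import issparse, lil_matrix

def sparse_log4(m):
    """Return a new CSR matrix with log base 4 applied to the stored entries of m."""
    if not issparse(m):
        raise TypeError("expected a scipy.sparse matrix")
    out = m.tocsr(copy=True)  # new object, so the input is left untouched
    out.sum_duplicates()      # merge any repeated entries before taking logs
    out.eliminate_zeros()     # drop explicitly stored zeros (log would be -inf)
    out.data = np.log(out.data) / np.log(4)
    return out

# Usage with an LIL matrix, the format mentioned in the question.
A = lil_matrix((2, 3))
A[0, 0] = 4.0
A[1, 2] = 16.0
B = sparse_log4(A)
print(B.toarray())
```

Converting once to CSR keeps the work vectorized over the stored values, instead of looping over rows as in the question.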

3 Comments

What are the differences between my solution and your solution? You only point out that the 3 isn't necessary, which you had already said in a comment on my solution.
@Thorn I actually initially misread your solution (thought you were adding 3 to the entire matrix and so doing a whole lot of unnecessary logarithms). You're right that they're basically the same.
Grateful for this answer as the example makes it very clear. Even if it is essentially the same answer, it's good to have it here.

I think I solved it in a very easy way. It is very strange that no one could answer immediately.

# Let A be a coo_matrix
import numpy as np
from scipy.sparse import coo_matrix
new_data = np.log(A.data+3)/np.log(4) #3 is not so important. It can be 1 too.
A = coo_matrix((new_data, (A.row, A.col)), shape=A.shape)

10 Comments

No one suggested this solution because it is mathematically incorrect. log(x) can be very different from log(x+1)! (Example, in base 10: log(0.000001) = -6, but log(0.000001 + 1) = 0 and a bit.)
Sorry for the ill-posed question. I didn't mention that all the data are positive and bigger than 1. These are the values of TF (term frequency) matrices. I think there will be no problem.
There's absolutely no reason to add 3 (or anything) here, since none of the entries in A.data will be 0. But if you do want to take the approach of adding a constant, use a smaller one! Adding say 1e-16 will have the same effect of never taking log(0) but with much less error introduced: using the appropriate identity, it's log(x + eps) = log(x) + log(1 + eps/x), where the error introduced is near 0 if eps/x is almost 0.
@Dougal thank you, but I want to add 3 because I don't want these 1s to become zero after the logarithm. It is a design concern of term matrices and it is related to text processing. I do not want to add a very small value because, for me, 1s are more important than you expect. Besides, it doesn't change much.
@Thorn I'm not sure exactly what you're doing with the logarithms here, but if you're doing any kind of NLP algorithm that uses the log, it's going to be doing the wrong thing if you're not actually giving it the log. In this case, it'll probably end up overweighting things with only 1 observed count. If the problem is simply that you want to distinguish between when the original entry had a 1 and when it didn't exist, you might want to think about maintaining a list of entries yourself and throwing them in a COO sparse matrix when you want to do matrix operations.
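
The identity from the comments, log(x + eps) = log(x) + log(1 + eps/x), can be checked numerically with a short sketch. The sample counts in `x` and the two `eps` values are illustrative assumptions, not data from the question, and `log4` is a hypothetical helper.

```python
import numpy as np

def log4(v):
    """Base-4 logarithm via the change-of-base formula."""
    return np.log(v) / np.log(4)

x = np.array([1.0, 2.0, 10.0, 100.0])  # illustrative counts, all >= 1

errors = {}
for eps in (3.0, 1e-3):
    shifted = log4(x + eps)
    # Identity: log(x + eps) = log(x) + log(1 + eps/x)
    via_identity = log4(x) + np.log1p(eps / x) / np.log(4)
    assert np.allclose(shifted, via_identity)
    # Error introduced by the shift, relative to the true log4(x)
    errors[eps] = shifted - log4(x)
    print(eps, errors[eps])
```

With eps = 3, a count of 1 maps to log4(4) = 1 rather than 0, and the shift distorts small counts far more than large ones; with a small eps the error stays close to 0 for every count.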