What would be the most efficient way to concatenate sparse matrices in Python using SciPy/Numpy?

Here I used the following:

    >>> np.hstack((X, X2))
    array([ <49998x70000 sparse matrix of type '<class 'numpy.float64'>'
            with 1135520 stored elements in Compressed Sparse Row format>,
            <49998x70000 sparse matrix of type '<class 'numpy.int64'>'
            with 1135520 stored elements in Compressed Sparse Row format>],
           dtype=object)

I would like to use both predictors in a regression, but the current format is obviously not what I'm looking for. Would it be possible to get the following:

    <49998x140000 sparse matrix of type '<class 'numpy.float64'>'
     with 2271040 stored elements in Compressed Sparse Row format>

It is too large to be converted to a dense format.

1 Answer

You can use scipy.sparse.hstack to concatenate sparse matrices that have the same number of rows (horizontal concatenation):

    from scipy.sparse import hstack
    hstack((X, X2))

Similarly, you can use scipy.sparse.vstack to concatenate sparse matrices with the same number of columns (vertical concatenation).
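As a minimal sketch of both calls, here are two tiny CSR matrices standing in for X and X2. The names A, B, wide and tall and all the values are made up for illustration; only hstack and vstack come from the answer.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack, vstack

# Two small 2x3 CSR matrices with three stored elements each (illustrative values).
A = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 0.0]]))
B = csr_matrix(np.array([[0.0, 4.0, 0.0],
                         [5.0, 0.0, 6.0]]))

wide = hstack((A, B))  # same number of rows: columns of B are appended to A
tall = vstack((A, B))  # same number of columns: rows of B are appended to A

print(wide.shape, wide.nnz)  # (2, 6) 6
print(tall.shape, tall.nnz)  # (4, 3) 6
```

Both results stay sparse, so nothing is densified along the way.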

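Using numpy.hstack or numpy.vstack does not concatenate the matrices: NumPy treats each sparse matrix as an opaque object, so the result is an object array holding the two sparse matrix objects.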
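The question combines a float64 matrix with an int64 one and wants a single CSR result. The sketch below assumes small made-up inputs (X_float and X_int are hypothetical stand-ins for X and X2). It passes format="csr" explicitly, because the default output format of scipy.sparse.hstack can differ between SciPy versions.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# Hypothetical stand-ins for the question's X (float64) and X2 (int64).
X_float = csr_matrix(np.array([[0.5, 0.0],
                               [0.0, 1.5]]))
X_int = csr_matrix(np.array([[0, 7],
                             [8, 0]], dtype=np.int64))

# Request CSR output explicitly; the mixed dtypes are upcast to float64.
combined = hstack((X_float, X_int), format="csr")

print(combined.format, combined.dtype, combined.shape)  # csr float64 (2, 4)
```

If a specific dtype is needed, hstack also accepts a dtype argument.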


3 Comments

It seems hstack is quite slow; check out this post on a similar question: link
@simeon Interesting that SciPy's dev team hasn't adopted such an efficient solution.
For the horizontal concatenation hstack() and for the vertical concatenation vstack() can be used.
