I have a .npz file that I want to load into RAM. The compressed file is 30 MB. I load the data as follows:

import numpy as np
from scipy import sparse
from sys import getsizeof

a = sparse.load_npz('compressed/CRS.npz').todense()
getsizeof(a)
# 136
type(a)
# numpy.matrixlib.defmatrix.matrix
b = np.array(a)
getsizeof(b)
# 64000112
type(b)
# numpy.ndarray

Why does the numpy.matrix object occupy so much less memory than the numpy.ndarray? Both a and b have the same dimensions and data.

  • Possible duplicate of Python memory usage of numpy arrays Commented Oct 19, 2018 at 18:22
  • If you'd used .toarray() you'd have gotten the full size. .todense adds an asmatrix layer on top of that, creating a view. That is an implementation detail. In general, getsizeof is not a reliable measure. It sort of works with arrays, but is worthless with lists. Commented Oct 19, 2018 at 18:30

1 Answer

Your a matrix is a view of another array, so the underlying data is not counted towards its getsizeof. You can see this by checking that a.base is not None, or by seeing that the OWNDATA flag is False in a.flags.

Your b array is not a view, so the underlying data is counted towards its getsizeof.

numpy.matrix doesn't provide any memory savings.
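
The view-versus-owner difference can be reproduced without the .npz file. The following is a minimal sketch, assuming a zero-filled 1000 x 1000 float64 array as a stand-in for the real data (the names owner, view and copy are made up for illustration). It compares getsizeof with the nbytes attribute, which reports the size of the data buffer whether or not the object owns it.

```python
import sys
import numpy as np

# Hypothetical stand-in for the dense data: 1,000,000 float64 values (8 MB).
owner = np.zeros((1000, 1000))

# np.asmatrix wraps the existing buffer without copying, so the matrix is a view.
view = np.asmatrix(owner)

# np.array copies by default, so the resulting ndarray owns its own buffer.
copy = np.array(view)

print(view.base is not None)     # the view refers back to another array
print(view.flags['OWNDATA'])     # the view does not own its data
print(copy.flags['OWNDATA'])     # the copy does

# getsizeof only includes the data buffer when the object owns it.
print(sys.getsizeof(view), sys.getsizeof(copy))

# nbytes reports the buffer size in both cases.
print(view.nbytes, copy.nbytes)
```

For estimating how much RAM the dense data needs, nbytes is likely the more useful number, since it is the same for the view and the copy.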
