
Now I have a dict object where the key is a unique hashed id and the value is a sparse list of length > 100. I'd like to store this in plain text (e.g. csv/tsv/whatever, as long as it is not pickle.dump). Is there any good way to store this kind of sparse list? For example:

d = {"a": [0,0,0, ..., 1,0], "b": [0.5,0,0, ...,0.5,0], "c":...}

The length of each list is exactly the same. I was wondering whether it's worth storing this kind of sparse list as index-value pairs, but I'm not sure whether there is any package that does this.

  • Welcome to SO. Unfortunately this isn't a discussion forum. Please take the time to read How to Ask and the links it contains. Commented Oct 15, 2017 at 15:32
  • Hi @wwii, is there anything hard to understand about the question? Commented Oct 15, 2017 at 15:36
  • ... is not a python object). In Python everything is an object. Commented Oct 15, 2017 at 15:36
  • Here I mean I do not want to use pickle.dump; instead, I'd like to find a method that stores the sparse list as a readable file. Sorry for the confusion; it should be updated now. Commented Oct 15, 2017 at 15:37
  • Also please let me know if you have any idea how to do that. Thanks! Commented Oct 15, 2017 at 15:39

2 Answers


Rather than saving the 0s, you should transform the sparse list into a dictionary of the non-zero values. For example,

{'a':[0,0,0,1,0,0,0,2,0,0,0,3]}

could become

{'a':{3:1, 7:2, 11:3}}

You could transform the lists easily enough with a dictionary comprehension:

compressed_data = {
    hashed_id: {
        index: value for index, value in enumerate(values) if value != 0
    } for hashed_id, values in original_data.items()
}

Then you can simply save that dictionary to a plain-text file (a JSON sketch follows after the decompression code below). After you load the compressed data back from the file, rebuild the full lists:

# DATA_LENGTH is the common length of the original lists
decompressed_data = {}
for hashed_id, values in loaded_data.items():
    decompressed_values = [0] * DATA_LENGTH
    for index, value in values.items():
        decompressed_values[index] = value
    decompressed_data[hashed_id] = decompressed_values
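
As a concrete example of the save/load step, here is a minimal sketch that uses JSON as the plain-text format (an assumption on my part; the answer doesn't prescribe a format). Note that JSON turns the integer indices into string keys, so they are converted back with int() on load; DATA_LENGTH and the file name are hypothetical:

import json

DATA_LENGTH = 100  # hypothetical common length of the original lists

# Save the compressed dictionary as plain text ('sparse_data.json' is an arbitrary name)
with open('sparse_data.json', 'w') as f:
    json.dump(compressed_data, f)

# Load it back, converting the stringified indices back to integers
with open('sparse_data.json') as f:
    loaded_data = {
        hashed_id: {int(index): value for index, value in values.items()}
        for hashed_id, values in json.load(f).items()
    }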

3 Comments

This is exactly what I'm looking for!
I'd argue it's not – it does fulfill your spec of a plain-text sparse-matrix format, but it still doesn't make sense to care about storage space and plain-textness at the same time. They are conflicting targets, and I think your problem is ill-posed in the first place. There have been binary sparse matrix formats at least as long as there has been FORTRAN. So, that's more than 40 years now. They are still used, for a reason, unlike sparse plain-text files, which make little sense – sparsity only matters for large (in computer terms, large) matrices, and for these having plain text doesn't…
This should be the accepted answer! As an alternative to {index: value for index, value in enumerate(values) if value != 0} you can also write dict(filter(itemgetter(1), enumerate(values))), depending on your preferred style.
import numpy as np
from scipy.sparse import csr_matrix, save_npz, load_npz

a = {'a': [0, 0, 1, 0], 'b': [1, 0, 0, 0], 'c': [1, 1, 0, 0]}

# Build a CSR sparse matrix from the values (you can use lil_matrix as well)
sparse1 = csr_matrix(np.array(list(a.values())))
print(sparse1)
print(sparse1.toarray())

# Save the values as a sparse matrix and the keys as a numpy array
save_npz('values.npz', sparse1)
np.save('keys.npy', np.array(list(a.keys())))

# Load both back and rebuild the dictionary (key -> sparse row)
sparse3 = load_npz('values.npz')
print(sparse3)
print(sparse3.toarray())
keys = np.load('keys.npy')
print(keys)

print(dict(zip(keys, sparse3)))
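
If the file itself must be plain text, as the question asks, one option (an add-on sketch, not part of this answer) is to write the non-zero entries of the sparse matrix as tab-separated (row, column, value) triplets:

# Hypothetical plain-text export: dump the non-zero entries of sparse1
# as (row, col, value) triplets; 'values.tsv' is an arbitrary file name.
coo = sparse1.tocoo()
with open('values.tsv', 'w') as f:
    f.write(f"# shape: {coo.shape[0]} {coo.shape[1]}\n")
    for r, c, v in zip(coo.row, coo.col, coo.data):
        f.write(f"{r}\t{c}\t{v}\n")

The keys could then be written out in the same row order (e.g. one per line) so that each row index maps back to its hashed id.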
