
I have a couple of MongoDB documents where one of the fields is best represented as a matrix (a numpy array). I would like to save these documents to MongoDB. How do I do this?

{
'name' : 'subject1',
'image_name' : 'blah/foo.png',
'feature1' : np.array(...)
}
2 Comments

  • Have you tried serialization (via, say, pickle)? Commented Jun 16, 2011 at 5:53
  • While not a database replacement, you might also consider pytables (built on hdf5) to store your numpy arrays... pytables.org/moin Commented Jun 26, 2012 at 23:43

6 Answers


For a 1D numpy array, you can use lists:

# serialize 1D array x
record['feature1'] = x.tolist()

# deserialize 1D array x
x = np.fromiter( record['feature1'], dtype=float )

For multidimensional data, I believe you'll need to use pickle and pymongo.binary.Binary (moved to bson.binary.Binary in newer pymongo):

# serialize 2D array y
record['feature2'] = pymongo.binary.Binary( pickle.dumps( y, protocol=2 ) )

# deserialize 2D array y
y = pickle.loads( record['feature2'] )

7 Comments

This can be improved by using pickle.dumps(y, protocol=2), which results in a more compact and fast binary representation of the data.
Also, you could try cPickle, which can be up to 1000 times faster than pickle because it's implemented in C: docs.python.org/library/pickle.html#module-cPickle
In newer (at least 2.4) versions of pymongo binary.Binary has been moved to bson
I keep getting an error: TypeError: Required argument 'dtype' (pos 2) not found
The comment on using cPickle by @AlexGaudio is relevant for Python 2 only. Nowadays (Python 3), the accelerated libraries are used by default

The code pymongo.binary.Binary(...) didn't work for me; maybe we need to use bson as @tcaswell suggested.

Anyway, here is one solution for a multi-dimensional numpy array:

from bson.binary import Binary
import pickle

# convert numpy array to Binary, store record in mongodb
record['feature2'] = Binary(pickle.dumps(npArray, protocol=2), subtype=128)

# get record from mongodb, convert Binary to numpy array
npArray = pickle.loads(record['feature2'])

Having said that, credit goes to MongoWrapper; the code above is adapted from theirs.

1 Comment

This is still working in 2022.

We've built an open source library for storing numeric data (Pandas, numpy, etc.) in MongoDB:

https://github.com/manahl/arctic

Best of all it's really easy to use, pretty fast and supports data versioning, multiple data libraries and more.

1 Comment

This seems to be a way to store your pandas dataframe and then run queries on it (particularly with date/time columns).

I know this is an old question but here is an elegant solution which works in new versions of pymongo:

import pickle
from bson.binary import Binary, USER_DEFINED_SUBTYPE
from bson.codec_options import TypeCodec, TypeRegistry, CodecOptions
import numpy as np

class NumpyCodec(TypeCodec):
    python_type = np.ndarray
    bson_type = Binary

    def transform_python(self, value):
        # called on the way in: ndarray -> BSON Binary
        return Binary(pickle.dumps(value, protocol=2), USER_DEFINED_SUBTYPE)

    def transform_bson(self, value):
        # called on the way out: BSON Binary -> ndarray
        if value.subtype == USER_DEFINED_SUBTYPE:
            return pickle.loads(value)
        return value

def get_codec_options():
    numpy_codec = NumpyCodec()
    type_registry = TypeRegistry([numpy_codec])
    codec_options = CodecOptions(type_registry=type_registry)
    return codec_options

def get_collection(name, db):
    codec_options = get_codec_options()
    return db.get_collection(name, codec_options=codec_options)

Then you can get your collection this way:

from pymongo import MongoClient
client = MongoClient()
db = client['my_db']
my_collection = get_collection('my_collection', db)

Afterwards, you just insert and find with Numpy arrays in your database transparently.

Comments


Have you tried Monary?

They have examples on the site

http://djcinnovations.com/index.php/archives/103

Comments


Have you tried MongoWrapper? I think it's simple:

Declare a connection to the mongodb server and the collection to save your np array:

import mongowrapper as mdb
db = mdb.MongoWrapper(dbName='test',
                      collectionName='test_collection', 
                      hostname="localhost", 
                      port="27017") 
my_dict = {"name": "Important experiment", 
            "data":np.random.random((100,100))}

The dictionary's just as you'd expect it to be:

print my_dict
{'data': array([[ 0.773217,  0.517796,  0.209353, ...,  0.042116,  0.845194,
         0.733732],
       [ 0.281073,  0.182046,  0.453265, ...,  0.873993,  0.361292,
         0.551493],
       [ 0.678787,  0.650591,  0.370826, ...,  0.494303,  0.39029 ,
         0.521739],
       ..., 
       [ 0.854548,  0.075026,  0.498936, ...,  0.043457,  0.282203,
         0.359131],
       [ 0.099201,  0.211464,  0.739155, ...,  0.796278,  0.645168,
         0.975352],
       [ 0.94907 ,  0.363454,  0.912208, ...,  0.480943,  0.810243,
         0.217947]]),
 'name': 'Important experiment'}

Save the data to mongo:

db.save(my_dict)

To load the data back:

my_loaded_dict = db.load({"name":"Important experiment"})

1 Comment

This looks great but doesn't work in 2023 :(
