I have a MongoDB collection which, when loaded into Python via PyMongo, comes back as Python dictionaries (one per document). I am looking to transform it into a NumPy array.
For instance, if the JSON file looks like this:
{
    "_id" : ObjectId("57065024c3d1132426c4dd53"),
    "B" : {
        "BA" : 14,
        "BB" : 23,
        "BC" : 32,
        "BD" : 41
    },
    "A" : 50
}
{
    "_id" : ObjectId("57065024c3d1132426c4dd53"),
    "A" : 1,
    "B" : {
        "BA" : 1,
        "BB" : 2,
        "BC" : 3,
        "BD" : 4
    }
}
I'd like to get this 2×5 NumPy array in return (2 rows, 5 columns): np.array([[50,14,23,32,41], [1,1,2,3,4]]). In that case, the first column corresponds to "A", the second to "BA", the third to "BB", etc. Note that the keys are not always stored in the same order.
My code, which does not work at all (and does not yet do what I want), looks like this:
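from pymongo import MongoClient
import numpy as np  # needed for np.vstack below

uri = "mongodb://localhost/test"
client = MongoClient(uri)
db = client.recodb
collection = db.recos
# Load every document of the collection as a list of dictionaries
list1 = list(collection.find())
# Attempt: stack the values of each document as one row (the nested "B" is not handled)
array2 = np.vstack([[product[key] for key in product.keys()] for product in list1])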
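One possible way to get the array described above is to flatten each document first and then pick the values in a fixed column order, so the key order inside each document no longer matters. This is only a minimal sketch: the names `flatten`, `to_array` and `COLUMNS` are made up for illustration, the sample `docs` list stands in for `list(collection.find())`, and it assumes every document contains all five keys.

```python
import numpy as np

# Sample documents standing in for list(collection.find()).
# "_id" is left out so the sketch runs without a MongoDB server.
docs = [
    {"B": {"BA": 14, "BB": 23, "BC": 32, "BD": 41}, "A": 50},
    {"A": 1, "B": {"BA": 1, "BB": 2, "BC": 3, "BD": 4}},
]

# Fixed column order (an assumption taken from the desired output).
COLUMNS = ["A", "BA", "BB", "BC", "BD"]


def flatten(doc):
    """Return a flat dict: nested sub-documents are merged into the top level."""
    flat = {}
    for key, value in doc.items():
        if isinstance(value, dict):
            flat.update(flatten(value))
        else:
            flat[key] = value
    return flat


def to_array(documents, columns=COLUMNS):
    """Build one row per document, taking values in the given column order."""
    rows = []
    for doc in documents:
        flat = flatten(doc)
        # Keys not listed in `columns` (such as "_id") are simply ignored.
        rows.append([flat[col] for col in columns])
    return np.array(rows)


array2 = to_array(docs)
print(array2.shape)  # (2, 5)
```

Listing the columns explicitly also keeps "_id" out of the array, since an ObjectId would otherwise force NumPy to fall back to an object dtype.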
ObjectId("57065024c3d1132426c4dd53") isn't a valid JSON item: it should be serialised as some kind of string, e.g. "ObjectId(\"57065024c3d1132426c4dd53\")".
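To illustrate the serialisation point, here is a small sketch that writes the identifier out as a plain string using only the standard `json` module. `FakeObjectId` is a hypothetical stand-in so the example runs without PyMongo; with the real `bson.ObjectId`, `str()` likewise returns the 24-character hex string.

```python
import json


class FakeObjectId:
    """Hypothetical stand-in for bson.ObjectId (not part of PyMongo)."""

    def __init__(self, hex_id):
        self.hex_id = hex_id

    def __str__(self):
        return self.hex_id


doc = {"_id": FakeObjectId("57065024c3d1132426c4dd53"), "A": 1}

# default=str is called for any value json cannot encode natively,
# so the identifier ends up as an ordinary JSON string.
as_json = json.dumps(doc, default=str)
print(as_json)
```

PyMongo also ships `bson.json_util.dumps`, which encodes the identifier as a `{"$oid": ...}` object rather than a bare string, if a round-trippable format is preferred.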