
I have a MongoDB collection with documents like this one:

doc = {
  "_id": {
    "$oid": "516622c9ce21150200000d87"
  },
  "SubmissionDate": {
    "$date": "2013-04-11T02:41:13.162Z"
  },
  "isComplete": True,

  "Rounds": [
    {
      "Photo": [
        
      ],
      "A": {
        "Complexity": 55,
        "Colour": 85,
        "Deep": 51,
        "Effervescence": 44
      },
      "B": {
        "QualityPIDs": [
          
        ],
        "QualityScales": [
          
        ],
        "Complexity": 43,
        "Qualities": [
          
        ]
      },
      "C": {
        "QualityPIDs": [
          
        ],
        "QualityScales": [
          
        ],
        "Complexity": 60,
        "UHS": 46,
        "Colour": 33,
        "Qualities": [
          
        ]
      },
      "D": {
        "Complexity": 73,
        "Duration": 68,
        "Quality": 65
      }
    }
  ],
  "Item": {
    "_id": {
      "$oid": "51e6d678c06918db21156f92"
    },
    "Country": "Australia",
    "Name": "King",
    "PeopleId": {
      "$oid": "51dddb69a9d9350200000"
    },
    "Style": "Apple",
    "Type": "Flat",
    "UserSubmitted": False
  }
}

I need to convert this collection into a pandas DataFrame.

The solution suggested in "How to import data from mongodb to pandas?" does most of the job, but I am still left with a Rounds column that holds a list of nested dictionaries.

I wrote a loop to access the sub-dictionaries of Rounds:

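import pandas as pd

df = pd.json_normalize(doc)

# Add the 'A' sub-dictionary of each row's first round, one row at a time.
# Index with i (not 0) so every row is read; pd.concat replaces DataFrame.append,
# which was removed in pandas 2.0.
A_data = pd.DataFrame(columns=df.Rounds[0][0]['A'].keys())
for i in range(len(df.Rounds)):
    A_data = pd.concat([A_data, pd.json_normalize(df.Rounds[i][0]['A'])], ignore_index=True)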

Finally, I concatenate A_data with my main data frame.

Is there a faster way to do this? Right now the loop takes too much time. Thank you!

Thank you! Your solution works perfectly; I didn't know about this parameter. Sorry for the delay with my feedback. I appreciate your time and help! Commented Aug 5, 2020 at 17:08

1 Answer

  • Use 'Rounds' as the record_path, and list the key path of every outer-level field you want to keep in the meta parameter. A nested field is given as a list of keys, one per level of the dict.
import pandas as pd

meta = [['_id', '$oid'],
        ['Item', 'Country'],
        ['Item', 'Name'],
        ['Item', 'Style'],
        ['Item', 'Type'],
        ['Item', 'UserSubmitted'],
        ['Item', '_id', '$oid'],
        ['Item', 'PeopleId', '$oid'],
        ['SubmissionDate', '$date'],
        'isComplete']

df = pd.json_normalize(doc, record_path='Rounds', meta=meta)

# display(df)
  Photo  A.Complexity  A.Colour  A.Deep  A.Effervescence B.QualityPIDs B.QualityScales  B.Complexity B.Qualities C.QualityPIDs C.QualityScales  C.Complexity  C.UHS  C.Colour C.Qualities  D.Complexity  D.Duration  D.Quality                  _id.$oid Item.Country Item.Name Item.Style Item.Type Item.UserSubmitted             Item._id.$oid     Item.PeopleId.$oid      SubmissionDate.$date isComplete
0    []            55        85      51               44            []              []            43          []            []              []            60     46        33          []            73          68         65  516622c9ce21150200000d87    Australia      King      Apple      Flat              False  51e6d678c06918db21156f92  51dddb69a9d9350200000  2013-04-11T02:41:13.162Z       True
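
Since the question is about a whole collection rather than one document, here is a minimal sketch of the same call applied to a list of documents. The two documents below are made-up, trimmed-down stand-ins for the real data (the names docs and df_all are only illustrative), and it assumes the documents have already been loaded into a Python list, for example from a query cursor. pd.json_normalize accepts a list of dicts, so no Python-level loop is needed:

```python
import pandas as pd

# Hypothetical, shortened documents that mimic the structure in the question.
docs = [
    {
        "_id": {"$oid": "a1"},
        "isComplete": True,
        "Rounds": [{"A": {"Complexity": 55, "Colour": 85}, "D": {"Quality": 65}}],
        "Item": {"Country": "Australia", "Name": "King"},
    },
    {
        "_id": {"$oid": "b2"},
        "isComplete": False,
        "Rounds": [
            {"A": {"Complexity": 40, "Colour": 70}, "D": {"Quality": 50}},
            {"A": {"Complexity": 42, "Colour": 71}, "D": {"Quality": 52}},
        ],
        "Item": {"Country": "France", "Name": "Queen"},
    },
]

# Outer-level fields to repeat on every round; nested fields are key paths.
meta = [["_id", "$oid"], ["Item", "Country"], ["Item", "Name"], "isComplete"]

# One row per entry in 'Rounds', across all documents, in a single call.
df_all = pd.json_normalize(docs, record_path="Rounds", meta=meta)
print(df_all)
```

Each entry in Rounds becomes its own row, so a document with two rounds contributes two rows that share the same meta values.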