name age address
1 "Steve" 27 {"number": 4, "street": "Main Road", "city": "Oxford"}
2 "Adam" 32 {"number": 78, "street": "High St", "city": "Cambridge"}

However, the subdocuments just appear as JSON inside the subdocument cell:

from pandas import DataFrame

df = DataFrame(list(db.collection_name.find({})))
print(df)

How can I get a second table like the one below using Python? What is the approach after this step?

name age address.number address.street address.city
1 Steve 27 4 "Main Road" "Oxford"
2 Adam 32 78 "High St" "Cambridge"

2 Answers


You can use pd.DataFrame to expand the dicts in the address column into a dataframe of their own, then join the result with the original dataframe using .join(), as follows:

Optional step: If the values in the address column are actually strings rather than dicts, convert them to proper dicts first. Otherwise, skip this step.

import ast
df['address'] = df['address'].map(ast.literal_eval)
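
The conversion above assumes the strings are Python-style dict literals. If they are JSON text instead (for example containing `true`, `false` or `null`), `ast.literal_eval` raises an error, so trying `json.loads` first may be safer. A minimal sketch, where `parse_cell` is a hypothetical helper name and not part of pandas:

```python
import ast
import json


def parse_cell(value):
    """Return a dict for a cell holding a dict, JSON text, or a Python literal."""
    if isinstance(value, dict):
        return value  # already parsed, nothing to do
    try:
        return json.loads(value)  # handles JSON such as '{"active": true}'
    except json.JSONDecodeError:
        # Fall back to Python literal syntax such as "{'number': 4}"
        return ast.literal_eval(value)
```

It could then be applied with `df['address'] = df['address'].map(parse_cell)`. Missing values (None/NaN) are not handled in this sketch.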

Main code:

import pandas as pd

df[['name', 'age']].join(pd.DataFrame(df['address'].tolist(), index=df.index).add_prefix('address.'))

Result:

    name  age  address.number address.street address.city
1  Steve   27               4      Main Road       Oxford
2   Adam   32              78        High St    Cambridge

Alternatively, if you only need a few fields from the dicts, you can add them one by one using the string accessor str[], as follows:

df['address.number'] = df['address'].str['number']
df['address.street'] = df['address'].str['street']
df['address.city'] = df['address'].str['city']

Setup

import pandas as pd

data = {'name': {1: 'Steve', 2: 'Adam'},
        'age': {1: 27, 2: 32},
        'address': {1: {"number": 4, "street": "Main Road", "city": "Oxford"},
                    2: {"number": 78, "street": "High St", "city": "Cambridge"}}}
df = pd.DataFrame(data)
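
Another option, beyond the two shown in this answer, is pandas' `json_normalize`, which flattens nested dicts into separator-joined columns in one call. A minimal sketch, assuming the documents are available as a list of plain dicts (roughly what `list(db.collection_name.find({}))` returns, without `_id` here); the sample records are copied from the question:

```python
import pandas as pd

# Sample records mirroring the question's documents (no MongoDB connection needed).
records = [
    {"name": "Steve", "age": 27,
     "address": {"number": 4, "street": "Main Road", "city": "Oxford"}},
    {"name": "Adam", "age": 32,
     "address": {"number": 78, "street": "High St", "city": "Cambridge"}},
]

# Nested keys become columns joined with the separator, e.g. "address.city".
flat = pd.json_normalize(records, sep=".")
print(flat.columns.tolist())
```

Unlike the `.join()` approach, this works from the raw records rather than from an existing dataframe, so it fits best when the data has not been loaded into pandas yet.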

2 Comments

This part helped a lot:
df['address.number'] = df['address'].str['number']
df['address.street'] = df['address'].str['street']
df['address.city'] = df['address'].str['city']
@usmansharifshaik Right, that's a good option if you want only a limited number of fields. If you have a big JSON/dict and want all entries in it, the upper part would be more convenient to use.

Depending on the use case, it may make more sense to set up an aggregation pipeline and $project the necessary nested fields up to the top level:

df = pd.DataFrame(db.collection_name.aggregate([{
    '$project': {
        '_id': 0,
        'name': '$name',
        'age': '$age',
        # Raise Sub-documents to top-level under new name
        'address_number': '$address.number',
        'address_street': '$address.street',
        'address_city': '$address.city'
    }
}]))

df:

    name  age  address_number address_street address_city
0  Steve   27               4      Main Road       Oxford
1   Adam   32              78        High St    Cambridge
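
When there are more than a handful of sub-fields, the `$project` stage above can be generated rather than typed out. A minimal sketch, assuming the sub-field names are known in advance; `build_project_stage` is a hypothetical helper and not part of PyMongo:

```python
def build_project_stage(top_fields, nested_field, sub_fields, sep="_"):
    """Build a $project stage that lifts nested sub-fields to the top level."""
    projection = {"_id": 0}
    for field in top_fields:
        projection[field] = f"${field}"
    for sub in sub_fields:
        # e.g. 'address_city': '$address.city'
        projection[f"{nested_field}{sep}{sub}"] = f"${nested_field}.{sub}"
    return {"$project": projection}


stage = build_project_stage(["name", "age"], "address", ["number", "street", "city"])
print(stage)
```

The resulting dict could be passed as `db.collection_name.aggregate([stage])`; for the arguments shown it is the same stage as the hand-written one.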

Or, if there are too many fields to list manually, we could also use $replaceRoot and $mergeObjects:

df = pd.DataFrame(db.collection_name.aggregate([
    {'$replaceRoot': {'newRoot': {'$mergeObjects': ["$$ROOT", "$address"]}}},
    {'$project': {'_id': 0, 'address': 0}}
]))

df:

    name  age  number     street       city
0  Steve   27       4  Main Road     Oxford
1   Adam   32      78    High St  Cambridge
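
One caveat with this approach: `$mergeObjects` lets later documents overwrite earlier fields of the same name, so a sub-document key that matches a top-level key (say, a `name` inside `address`) would silently replace it. A minimal sketch of a client-side check on a sample document before running the pipeline; `colliding_keys` is a hypothetical helper and the `clash` document is invented for illustration:

```python
def colliding_keys(doc, nested_field):
    """Return sub-document keys that also exist at the top level of doc."""
    nested = doc.get(nested_field, {})
    top_level = set(doc) - {nested_field}
    return top_level & set(nested)


sample = {"name": "Steve", "age": 27,
          "address": {"number": 4, "street": "Main Road", "city": "Oxford"}}
# Invented document whose sub-document reuses the top-level key 'name'.
clash = {"name": "Steve", "address": {"name": "Home", "city": "Oxford"}}

print(colliding_keys(sample, "address"))  # no overlap for the question's data
print(colliding_keys(clash, "address"))   # 'name' appears at both levels
```

If the check reports overlaps, the prefixed `$project` approach is probably the safer choice.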

collection_name setup:

# Drop Collection if exists
db.collection_name.drop()
# Insert Sample Documents
db.collection_name.insert_many([{
    'name': 'Steve', 'age': 27,
    'address': {"number": 4, "street": "Main Road", "city": "Oxford"}
}, {
    'name': 'Adam', 'age': 32,
    'address': {"number": 78, "street": "High St", "city": "Cambridge"}
}])
