I imported a CSV dataset with Python (pandas) and did some cleanup. Download the dataset here

# importing pandas
import pandas as pd

# reading csv and assigning to 'data'
data = pd.read_csv('co-emissions-per-capita.csv')

# dropping all rows before 2016 (2016 - 2017 remains)
data.drop(data[data.Year < 2016].index, inplace=True)

# dropping rows where every value is null
data.dropna(how="all", inplace=True)

# dropping columns where every value is null
data.dropna(axis="columns", how="all", inplace=True)

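# filling NA values (assigned back, since inplace=True on a selected column
# may not update the original DataFrame in newer pandas versions)
data["Entity"] = data["Entity"].fillna("No Country")
data["Code"] = data["Code"].fillna("No Code")
data["Year"] = data["Year"].fillna("No Year")
data["Per capita CO2 emissions (tonnes per capita)"] = data["Per capita CO2 emissions (tonnes per capita)"].fillna(0)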

# Sort by Year && Country
data.sort_values(["Year", "Entity"], inplace=True)

# renaming columns
data.rename(columns={"Entity": "Country",
                     "Per capita CO2 emissions (tonnes per capita)": "CO2 emissions (metric tons)"}, inplace=True)

After the cleanup, my current dataset has data for 2 years and 197 countries, which is 394 rows. I want to insert the data into MongoDB in the following format.

[
    {
        "_id": ObjectId("5dfasdc2f7c4b0174c5d01bc"),
        "year": 2016,
        "countries": [
            {
                "name": "Afghanistan",
                "code": "AFG",
                "CO2 emissions (metric tons)": 0.366302
            },
            {
                "name": "Albania",
                "code": "ALB",
                "CO2 emissions (metric tons)": 0.366302
            }
        ]
    },
    {
        "_id": ObjectId("5dfasdc2f7c4b0174c5d01bc"),
        "year": 2017,
        "countries": [
            {
                "name": "Afghanistan",
                "code": "AFG",
                "CO2 emissions (metric tons)": 0.366302
            },
            {
                "name": "Albania",
                "code": "ALB",
                "CO2 emissions (metric tons)": 0.366302
            }
        ]
    }
]

I want one object per year, with all the countries and their related information nested inside it. To be precise, I want my database to have 2 (max) objects, with 197 nested objects inside each main object. So each year will be listed only once in the database, whereas each country will appear twice, once for each year.

Is there a better structure to store this data? Please specify the steps to store this data in MongoDB. I'd also really appreciate it if you could suggest a good ODM for Python, similar to Mongoose for Node.js.

1 Answer

  1. Use the pandas groupby function to split your DataFrame into one group per year.
  2. Use the to_dict function with the orient parameter set to 'records' to convert each group into a list of dicts (one dict per country).
  3. Use the pymongo API to connect to the database and insert the resulting documents.
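
The three steps above could look roughly like the sketch below. It is only an illustration under some assumptions: the column names are the ones produced by the cleanup code in the question, the Year column holds only numbers, and the connection URI, database name, and collection name are placeholders I made up. The sample DataFrame uses made-up values so that the grouping part runs without the real CSV or a running MongoDB server.

```python
import pandas as pd


def build_year_documents(df):
    """Return one dict per year, with that year's countries nested in a list."""
    documents = []
    # groupby("Year") yields (year, sub-DataFrame) pairs, one per distinct year
    for year, group in df.groupby("Year"):
        countries = (
            group.drop(columns="Year")
            .rename(columns={"Country": "name", "Code": "code"})
            .to_dict(orient="records")  # one dict per row, i.e. per country
        )
        # int() turns the NumPy integer into a plain Python int for pymongo
        documents.append({"year": int(year), "countries": countries})
    return documents


def insert_year_documents(documents,
                          uri="mongodb://localhost:27017",   # placeholder URI
                          db_name="emissions",               # hypothetical name
                          collection_name="co2_per_capita"):  # hypothetical name
    """Insert the documents with pymongo; MongoDB generates each _id."""
    from pymongo import MongoClient  # imported here so the rest runs without pymongo

    client = MongoClient(uri)
    result = client[db_name][collection_name].insert_many(documents)
    return result.inserted_ids


# Small stand-in for the cleaned DataFrame from the question (made-up values)
sample = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Afghanistan", "Albania"],
    "Code": ["AFG", "ALB", "AFG", "ALB"],
    "Year": [2016, 2016, 2017, 2017],
    "CO2 emissions (metric tons)": [0.25, 1.5, 0.5, 1.75],
})
docs = build_year_documents(sample)
```

With the real data you would call build_year_documents(data) and pass the result to insert_year_documents. Since there are only two years, insert_many writes two documents, each holding its list of countries.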