I have a complex MongoDB database with documents nested up to 7 levels deep. I need to use PyMongo to extract the data and then convert the extracted data to a .csv file.
-
What have you tried so far? (Adam Smith, Jul 2, 2018 at 4:08)
-
Can you use mongoexport? (Atish, Jul 2, 2018 at 4:11)
-
So far I am able to extract the entire database and store it as a Python object, and then convert that object to a .csv file. However, the .csv file has thousands of columns. I need to know how I can extract the data in a clean manner. (pack24, Jul 2, 2018 at 4:11)
-
@Atish I can use mongoexport, but the .csv file has thousands of columns. I need to extract the data in an organized manner. I'm OK with extracting the data into multiple .csv files and then combining them all into one; I'm just not sure how to proceed with that. I only know how to extract the data as a whole. (pack24, Jul 2, 2018 at 4:13)
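For the combining step mentioned in the last comment, a minimal sketch with pandas, assuming the part files share the same columns; the export_part*.csv and combined.csv filenames are hypothetical:

import glob
import pandas as pd

# read every part file (hypothetical export_part*.csv names) and stack the rows
parts = [pd.read_csv(path) for path in sorted(glob.glob('export_part*.csv'))]

# concatenate into one frame and write a single combined .csv
pd.concat(parts, ignore_index=True).to_csv('combined.csv', index=False)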
1 Answer
You can try using json_normalize, which flattens nested JSON and reads the data into a DataFrame that can later be written to CSV.
For example:
from pandas.io.json import json_normalize

# mongo_value is your aggregation pipeline; db is an existing pymongo database handle
mongo_aggregate = db.events.aggregate(mongo_value)

# flatten the nested documents into a DataFrame with dotted column names
mongo_df = json_normalize(list(mongo_aggregate))
# print(mongo_df)

# keep just the leaf name instead of properties.something.something.column_name
# (note: this can create duplicate column names if different paths share a leaf)
mongo_columns = list(mongo_df.columns.values)
for w in range(len(mongo_columns)):
    mongo_columns[w] = mongo_columns[w].split('.')[-1].lower()
mongo_df.columns = mongo_columns
For reference, see https://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.io.json.json_normalize.html
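Putting it together, a minimal end-to-end sketch; the connection string, the mydb database name, and the events_flat.csv output filename are assumptions, and a plain find() is used in place of an aggregation:

from pymongo import MongoClient
from pandas.io.json import json_normalize

# hypothetical connection details; adjust to your deployment
client = MongoClient('mongodb://localhost:27017/')
db = client['mydb']  # hypothetical database name

# drop _id so ObjectId values stay out of the CSV
docs = list(db.events.find({}, {'_id': 0}))

# flatten documents nested up to 7 levels; nested keys become dotted column names
flat_df = json_normalize(docs)

# write the flattened frame to a single .csv file
flat_df.to_csv('events_flat.csv', index=False)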
1 Comment
Thank you for your help! (pack24)