I am trying to convert mongoDB documents into a flat pandas dataframe structure.
An example of my mongoDB collection structure:
data = collection.find_one({'ID':300})
print(data)
{'_id': "ObjectId('5cd932299f6b7d4c9b95af6c')",
'ID': 300,
'updated': 23424,
'data': [
{ 'meta': 8,
'data': [
{'value1': 1, 'value2': 2},
{'value1': 3, 'value2': 4}
]
},
{ 'meta': 9,
'data': [
{'value1': 5, 'value2': 6}
]
}
]
}
When i put this into a pandas dataframe, I get
df = pd.DataFrame(data)
print(df)
| _id | ID | updated | data
|
|--------------------------|-----|---------|------------------------ ---------------------------|
| 5cd936779f6b7d4c9b95af6d | 300 | 23424 | {'meta': 8, 'data': [{'value1': 1, 'value2': 2... |
| 5cd936779f6b7d4c9b95af6d | 300 | 23424 | {'meta': 9, 'data': [{'value1': 5, 'value2': 6}]} |
When I iterate through the dataframe with pd.concat I get
df.rename(columns={'data':'data1'}, inplace=True)
df2 = pd.concat([df, pd.DataFrame(list(df['data1']))], axis=1).drop('data1', 1)
df3 = pd.concat([df2, pd.DataFrame(list(df2['data']))], axis=1).drop('data', 1)
print(df3)
| _id | ID | updated | meta | 0 | 1 |
|--------------------------|-----|---------|------|----------------------------|----------------------------|
| 5cd936779f6b7d4c9b95af6d | 300 | 23424 | 8 | {'value1': 1, 'value2': 2} | {'value1': 3, 'value2': 4} |
| 5cd936779f6b7d4c9b95af6d | 300 | 23424 | 9 | {'value1': 5, 'value2': 6} | None |
The lowest level objects of the lowest level array has always the same names.
Therefore I want:
| ID | updated | meta | value1 | value2 |
|-----|---------|------|--------|--------|
| 300 | 23424 | 8 | 1 | 2 |
| 300 | 23424 | 8 | 3 | 4 |
| 300 | 23424 | 9 | 5 | 6 |
Am I on the wrong track?
What would be the most convenient way to solve this?