I want to insert a pandas DataFrame into MongoDB. However, when I do so, the timestamp column (which is the index of the DataFrame) does not get inserted into MongoDB.

Below is code that reproduces the problem:

from datetime import datetime

import pandas as pd
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client.ticks
collection = db.STOCK
collection_ohlc = db.STOCK_ohlc

# Read per second ticks data from Mongo into a dataframe
results = collection.find(
    {'timestamp': {'$gte': '2019-01-24T09:15:00', '$lte': '2019-01-24T09:19:59'}})
df = pd.DataFrame(list(results))

# Convert per-second tick data into 5-minute OHLC candles
df['timestamp'] = pd.to_datetime(df['timestamp'], errors='coerce')
df.set_index('timestamp', inplace=True)
ohlc_data = df['ltp'].resample('5min').ohlc()

# Print OHLC candle dataframe
print(ohlc_data)

# Write the OHLC candle back to Mongo into a new collection STOCK_ohlc
collection_ohlc.insert_many(ohlc_data.to_dict('records'))

Here is the output of the print(ohlc_data) statement above:

                       open   high    low   close
timestamp
2019-01-24 09:15:00  286.55  286.7  285.5  285.65

The code runs without errors and the OHLC values are inserted into MongoDB. However, the timestamp is missing.

Below is the Mongo shell output listing the inserted record:

> db.STOCK_ohlc.find()
{ "_id" : ObjectId("5c6abc6f4994a1bc8c3c08fd"), "open" : 286.55, "high" : 286.7, "low" : 285.5, "close" : 285.65 }
>

As shown, the timestamp is missing from the inserted record, and an OHLC candle without its timestamp is useless.

I tried the various orient values described in the pandas.DataFrame.to_dict documentation, but none of the others produce output that can be inserted into MongoDB. The only orient that inserts data is records, and it omits the timestamp.

Any pointers would be of great help.

UPDATE: Here is the output of print(ohlc_data.to_dict('records'))

[{'open': 286.55, 'high': 286.7, 'low': 285.5, 'close': 285.65}]
  • Can you print ohlc_data.to_dict('records')? It seems the problem is that your row key is of type Timestamp and Mongo needs a string key, so it is somehow omitted. Try this solution: https://stackoverflow.com/a/36909509/3710490 Commented Feb 18, 2019 at 14:55
  • @Valijon, Updated the post. Commented Feb 18, 2019 at 14:58

1 Answer

When you convert a pd.DataFrame with to_dict('records'), each row becomes a dict of its column values only; the index is not included in the output.
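To make the omission concrete, here is a minimal sketch that rebuilds the single-row OHLC frame from the question by hand (the frame is constructed directly rather than read from MongoDB, which is an assumption made so the snippet runs standalone). It compares the records orient with the index orient, which does keep the timestamp but as a Timestamp dictionary key rather than as a field in a list of documents:

```python
import pandas as pd

# Hand-built copy of the one-row OHLC frame shown in the question.
ohlc_data = pd.DataFrame(
    {"open": [286.55], "high": [286.7], "low": [285.5], "close": [285.65]},
    index=pd.DatetimeIndex(["2019-01-24 09:15:00"], name="timestamp"),
)

# 'records' yields one dict per row, built from the columns only.
by_records = ohlc_data.to_dict("records")

# 'index' yields a single dict keyed by the index values (Timestamp objects).
by_index = ohlc_data.to_dict("index")

print(by_records)
print(by_index)
```

Since insert_many expects an iterable of documents, only the records form has the right shape, which is likely why the other orients did not insert anything.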

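A solution is to turn the index into a regular column before calling to_dict():

# Move the 'timestamp' index into a column so it appears in each record
ohlc_data.reset_index(level=0, inplace=True)
collection_ohlc.insert_many(ohlc_data.to_dict('records'))

Here is the output of ohlc_data.to_dict('records'):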

[{'timestamp': Timestamp('2019-01-24 09:15:00'), 'open': 286.55, 'high': 286.7, 'low': 285.5, 'close': 285.65}]
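The whole pipeline can be checked without a running MongoDB instance. The sketch below is only an illustration: the four tick values are hypothetical stand-ins for the STOCK collection, chosen to reproduce the candle from the question, and the final insert_many call is left as a comment because it needs a live server.

```python
import pandas as pd

# Hypothetical per-second ticks standing in for the STOCK collection.
ticks = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-01-24T09:15:00", "2019-01-24T09:15:01",
        "2019-01-24T09:17:30", "2019-01-24T09:19:59",
    ]),
    "ltp": [286.55, 286.7, 285.5, 285.65],
})

# Build 5-minute OHLC candles; 'timestamp' becomes the index here.
ohlc_data = ticks.set_index("timestamp")["ltp"].resample("5min").ohlc()

# reset_index() returns a new frame with 'timestamp' as an ordinary column,
# so each record now carries it alongside open/high/low/close.
records = ohlc_data.reset_index().to_dict("records")
print(records)

# With a live connection, as in the question:
# collection_ohlc.insert_many(records)
```

Using reset_index() without inplace=True leaves ohlc_data itself unchanged, which is convenient if the frame is still needed with its DatetimeIndex afterwards.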