I'm trying to plot performance metrics of various assets in a backtest.
I have imported 'test_predictions.json' into a pandas DataFrame. It is a list of dictionaries and contains results from various assets (listed one after the other). Here is a sample of the data:
trading_pair return timestamp prediction
[u'Poloniex_ETH_BTC' 0.003013302628677 1450753200L -0.157053292753482]
[u'Poloniex_ETH_BTC' 0.006013302628677 1450753206L -0.187053292753482]
...
[u'Poloniex_FCT_BTC' 0.006013302628677 1450753100L 0.257053292753482]
Each backtest starts and ends at different times.
Here is the data for the assets of interest:
'''
#These are the assets I would like to analyse
Poloniex_DOGE_BTC 2015-10-21 02:00:00 1445392800
Poloniex_DOGE_BTC 2016-01-12 05:00:00 1452574800
Poloniex_XRP_BTC 2015-10-28 06:00:00 1446012000
Poloniex_XRP_BTC 2016-01-12 05:00:00 1452574800
Poloniex_XMR_BTC 2015-10-21 14:00:00 1445436000
Poloniex_XMR_BTC 2016-01-12 06:00:00 1452578400
Poloniex_VRC_BTC 2015-10-25 07:00:00 1445756400
Poloniex_VRC_BTC 2016-01-12 00:00:00 1452556800
'''
So I'm trying to make a new array that contains the data for these assets. Each asset must be sliced so that they all start from the latest start time and end at the earliest end time (otherwise there will be incomplete data).
#each array should start and end:
#start 2015-10-28 06:00:00
#end 2016-01-12 00:00:00
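As a sanity check on those two bounds, here is a minimal sketch in plain Python. The `windows` dict is just the start/end Unix timestamps copied by hand from the table above; the names `windows`, `master_start` and `master_end` are mine, not from any library.

```python
# Start/end Unix timestamps per asset, copied from the table above.
windows = {
    'Poloniex_DOGE_BTC': (1445392800, 1452574800),
    'Poloniex_XRP_BTC': (1446012000, 1452574800),
    'Poloniex_XMR_BTC': (1445436000, 1452578400),
    'Poloniex_VRC_BTC': (1445756400, 1452556800),
}

# The common window begins at the latest start and ends at the earliest end.
master_start = max(start for start, end in windows.values())
master_end = min(end for start, end in windows.values())

print('%d %d' % (master_start, master_end))
```

This gives 1446012000 (the XRP start) and 1452556800 (the VRC end), which are the start and end listed above.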
So the question is:
How can I search for an asset, e.g. Poloniex_DOGE_BTC, and then get the index for the start and end times specified above?
I will be plotting the data via numpy, so maybe it's better to turn it into a numpy array with df.values and then conduct the search? Then I could use np.hstack((df_index_asset1, df_index_asset2)) so it's in the right form to plot. So the problem is: using either pandas or numpy, how do I retrieve the data for the specified assets that falls within the master start and end times?
EDIT:
From Kartik's answer I tried to obtain just the data for the asset named 'Poloniex_DOGE_BTC' using the following code:
import pandas as pd
import numpy as np
preds = 'test_predictions.json'
df = pd.read_json(preds)
asset = 'Poloniex_DOGE_BTC'
grouped = df.groupby(asset)
print grouped
EDIT2: I have changed the link to the data so it is test_predictions.json
EDIT3: this worked a treat:
preds = 'test_predictions.json'
df = pd.read_json(preds)
asset = 'Poloniex_DOGE_BTC'
grouped = df.groupby('market_trading_pair')
print grouped.get_group(asset)
#each array should start and end:
#start 2015-10-28 06:00:00 1446012000
#end 2016-01-12 00:00:00 1452556800
Now how can we truncate the data so that it starts and ends at the above timestamps?
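To make the goal concrete, here is a rough sketch of the kind of truncation I mean, using a boolean mask. It assumes the column names are 'market_trading_pair' and 'timestamp' and that 'timestamp' holds integer Unix seconds (if pandas parsed it into datetimes, the bounds would need to be datetimes too). The rows in `df` are made-up toy values standing in for the real file, and `truncate_asset` is a hypothetical helper name.

```python
import pandas as pd

# Toy rows standing in for test_predictions.json (values are made up).
df = pd.DataFrame({
    'market_trading_pair': ['Poloniex_DOGE_BTC'] * 4 + ['Poloniex_XRP_BTC'] * 2,
    'timestamp': [1445392800, 1446012000, 1450000000, 1452574800,
                  1446012000, 1452556800],
    'prediction': [0.1, -0.2, 0.3, -0.4, 0.5, -0.6],
})

master_start = 1446012000  # 2015-10-28 06:00:00
master_end = 1452556800    # 2016-01-12 00:00:00


def truncate_asset(frame, asset, start, end):
    """Return the rows for one asset with start <= timestamp <= end."""
    rows = frame[frame['market_trading_pair'] == asset]
    in_window = (rows['timestamp'] >= start) & (rows['timestamp'] <= end)
    return rows[in_window].sort_values('timestamp')


doge = truncate_asset(df, 'Poloniex_DOGE_BTC', master_start, master_end)
print(doge)
```

With the toy rows, only the two DOGE rows inside the window are kept. Calling the same helper once per asset would give slices that all cover the same time range.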
backtest.txt instead back_test.json.
"It is a list of dictionaries" is wrong. It is not valid json, so pd.read_json doesnt work.
test_predictions.json and I think it is a JSON, further more using my current code I was able to look at data in the df.