Python Loop with auto creation of data frames

Question

I am trying to create a loop which will return for each ticker, 1. a different data frame (by the name of ticker) 2. with a conversion of the time column to "normal" day 3. and it (the new time) will be used as index for that data frame.

If I run it for each ticker it's working without problem. I appreciate your help!

import requests
import pandas as pd
desired_width = 320
pd.set_option('display.width', desired_width)

data = pd.DataFrame()
tickers = ['BTC', 'ETH', 'XRP']  # pools of tickers to get
for t in tickers:  # a loop to get data ticker by ticker
        url = 'https://min-api.cryptocompare.com/data/histoday' + \
              '?fsym=' + \
                t +\
              '&tsym=USD' + \
              '&limit=600000000000' + \
              '&aggregate=1' + \
              '&e=CCCAGG'
        response = requests.get(url)
        data[t] = response.json()['Data']
        #the following 2 lines I failed to execute
        #data[t]['time'] = pd.to_datetime(data[t]['time'], unit='s')
        #data[t].index = data[t]['time']
        print("downloading data for: " + t)
        print("data for:" + t, data.head(5))

My results is one data frame for all three tickers:

data for:XRP BTC
ETH XRP 0 {'time': 1342742400, 'close': 8.52, 'high': 8.... {'time': 1342742400, 'close': 0, 'high': 0, 'l... {'time': 1342742400, 'close': 0, 'high': 0, 'l... 1 {'time': 1342828800, 'close': 8.85, 'high': 9.... {'time': 1342828800, 'close': 0, 'high': 0, 'l... {'time': 1342828800, 'close': 0, 'high': 0, 'l... 2 {'time': 1342915200, 'close': 8.41, 'high': 8.... {'time': 1342915200, 'close': 0, 'high': 0, 'l... {'time': 1342915200, 'close': 0, 'high': 0, 'l... 3 {'time': 1343001600, 'close': 8.45, 'high': 9.... {'time': 1343001600, 'close': 0, 'high': 0, 'l... {'time': 1343001600, 'close': 0, 'high': 0, 'l... 4 {'time': 1343088000, 'close': 8.6, 'high': 8.8... {'time': 1343088000, 'close': 0, 'high': 0, 'l... {'time': 1343088000, 'close': 0, 'high': 0, 'l...

I am using python 3.6 with pycharm + anconda on Windows 10

jezrael · Accepted Answer · 2018-01-10 16:30:22Z

2

I think you can use json_normalize for dinctionary of DataFrames with concat for DataFrame with MultiIndex - first level are tickers:

from pandas.io.json import json_normalize

data = {}
tickers = ['BTC', 'ETH', 'XRP']  # pools of tickers to get
for t in tickers:  # a loop to get data ticker by ticker
        url = 'https://min-api.cryptocompare.com/data/histoday' + \
              '?fsym=' + \
                t +\
              '&tsym=USD' + \
              '&limit=600000000000' + \
              '&aggregate=1' + \
              '&e=CCCAGG'
        response = requests.get(url)
        data[t] = json_normalize(response.json()['Data'])

df = pd.concat(data)
print (df.head())

       close  high   low  open        time  volumefrom    volumeto
BTC 0   8.52  8.87  7.60  8.87  1342742400   154661.12  1267523.74
    1   8.85  9.70  7.96  8.52  1342828800   139906.90  1242153.88
    2   8.41  8.97  8.27  8.85  1342915200    30070.67   259113.81
    3   8.45  9.20  7.75  8.41  1343001600   146396.18  1238579.49
    4   8.60  8.85  8.34  8.45  1343088000    40946.86   353506.54

Then for select each level is posible use:

print (df.xs('BTC').head())

#print (df.loc['BTC'].head())

   close  high   low  open        time  volumefrom    volumeto
0   8.52  8.87  7.60  8.87  1342742400   154661.12  1267523.74
1   8.85  9.70  7.96  8.52  1342828800   139906.90  1242153.88
2   8.41  8.97  8.27  8.85  1342915200    30070.67   259113.81
3   8.45  9.20  7.75  8.41  1343001600   146396.18  1238579.49
4   8.60  8.85  8.34  8.45  1343088000    40946.86   353506.54

Another approach is not concat, only create dictionary:

data = {}
tickers = ['BTC', 'ETH', 'XRP']  # pools of tickers to get
for t in tickers:  # a loop to get data ticker by ticker
        url = 'https://min-api.cryptocompare.com/data/histoday' + \
              '?fsym=' + \
                t +\
              '&tsym=USD' + \
              '&limit=600000000000' + \
              '&aggregate=1' + \
              '&e=CCCAGG'
        response = requests.get(url)
        data[t] = json_normalize(response.json()['Data'])
        data[t] = data[t].set_index(pd.to_datetime(data[t]['time'], unit='s'))

print (data['BTC'].head())

            close  high   low  open        time  volumefrom    volumeto
time                                                                   
2012-07-20   8.52  8.87  7.60  8.87  1342742400   154661.12  1267523.74
2012-07-21   8.85  9.70  7.96  8.52  1342828800   139906.90  1242153.88
2012-07-22   8.41  8.97  8.27  8.85  1342915200    30070.67   259113.81
2012-07-23   8.45  9.20  7.75  8.41  1343001600   146396.18  1238579.49
2012-07-24   8.60  8.85  8.34  8.45  1343088000    40946.86   353506.54

EDIT: If want global variable not recommended solution:

data = {}
tickers = ['BTC', 'ETH', 'XRP']  # pools of tickers to get
for t in tickers:  # a loop to get data ticker by ticker
        url = 'https://min-api.cryptocompare.com/data/histoday' + \
              '?fsym=' + \
                t +\
              '&tsym=USD' + \
              '&limit=600000000000' + \
              '&aggregate=1' + \
              '&e=CCCAGG'
        response = requests.get(url)
        globals()['df_' + str(t)] = json_normalize(response.json()['Data'])
        globals()['df_' + str(t)] = globals()['df_' + str(t)].set_index(pd.to_datetime(globals()['df_' + str(t)]['time'], unit='s'))

print (df_BTC.head())

            close  high   low  open        time  volumefrom    volumeto
time                                                                   
2012-07-20   8.52  8.87  7.60  8.87  1342742400   154661.12  1267523.74
2012-07-21   8.85  9.70  7.96  8.52  1342828800   139906.90  1242153.88
2012-07-22   8.41  8.97  8.27  8.85  1342915200    30070.67   259113.81
2012-07-23   8.45  9.20  7.75  8.41  1343001600   146396.18  1238579.49
2012-07-24   8.60  8.85  8.34  8.45  1343088000    40946.86   353506.54

edited Jan 10, 2018 at 16:30

answered Jan 10, 2018 at 12:39

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

11 Comments

Giladbi Over a year ago

thanks! it does solve problems 2 + 3 but not the first which is creating 3 or 99 different data frames each with "df_" + t so I will have df_BTC, df_ETH etc.

jezrael Over a year ago

@Giladbi - Is possible use dictionary of DataFrames ?

jezrael Over a year ago

But if really need it, then use globals()['df_' + str(t)] = json_normalize(response.json()['Data']) instead data[t] = json_normalize(response.json()['Data']) and then get print (df_BTC), but it is a bit unpythonic.

Parfait Over a year ago

@Giladbi ... it is not advised to store many similar structure objects like dfs in your global environment as it requires too much maintenance and resource pointers. Use one container like list or dictionary and you can reference each by unique keys as jezrael shows. The same advice applies for R and other languages!

Giladbi Over a year ago

@jezrael thanks, it works but i still didn't solve the time format and the time as index. it should be done before the creation of the different data frames. I tried this: response.json()['time'] = pd.to_datetime(data[t]['time'], unit='s') for time format change but i am messing it up.

|

Giladbi · Accepted Answer · 2018-01-11 08:15:20Z

I manged to defined a function to get the data and than use a loop to get it for all the tickers. this solves the problem.

import requests
import datetime
import pandas as pd
import matplotlib.pyplot as plt
desired_width = 320
pd.set_option('display.width', desired_width)

#function to download the Historical HOUR data
def hourly_price_historical(symbol, comparison_symbol, limit, aggregate, exchange=''):
    url = 'https://min-api.cryptocompare.com/data/histohour?fsym={}&tsym={}&limit={}&aggregate={}'\
            .format(symbol.upper(), comparison_symbol.upper(), limit, aggregate)
    if exchange:
        url += '&e={}'.format(exchange)
    page = requests.get(url)
    data = page.json()['Data']
    df = pd.DataFrame(data)
    df['timestamp'] = [datetime.datetime.fromtimestamp(d) for d in df.time]
    df = df.drop('time', 1)
    df.set_index('timestamp')
    return df


data = {}
tickers = ['BTC', 'ETH', 'XRP']                 # pools of tickers to get
for t in tickers:                                                           # a loop to get data ticker by ticker
    data[t] = hour_data = hourly_price_historical(t,'USD', 9999999,1)       # calling the function defined above
    print("Getting the data for: ", t)
    globals()['df_' + str(t)] = data

Collectives™ on Stack Overflow

Python Loop with auto creation of data frames

2 Answers 2

11 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

11 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related