I was trying to convert tick data to OHLC data, and my code works as below:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import mpl_finance
from datetime import *
import os
dateparse = lambda x: pd.datetime.strptime(x, '%Y/%m/%d %H:%M:%S')
file_dir = "D:/USDJPY 2017-2018/"
#directory
for root, dirs, files in os.walk(file_dir):
file_list = files
file_list.sort()
df_all = pd.read_csv(file_dir + file_list[0], parse_dates=['RateDateTime'], index_col='RateDateTime',date_parser=dateparse)
for file in file_list:
if file != file_list[0]:
df_all = df_all.append(pd.read_csv(file_dir + file, parse_dates=['RateDateTime'], index_col='RateDateTime',date_parser=dateparse))
grouped = df_all.groupby('CurrencyPair')
ask = grouped['RateAsk'].resample('1440Min').ohlc()
bid = grouped['RateBid'].resample('1440Min').ohlc()
a=pd.concat([ask, bid], axis=1, keys=['RateAsk', 'RateBid'])
a.to_csv('C:/Users/lenovo/Desktop/USDJPY 2017-2018 1DAY sorted.csv')
print('Conversion complete')
However, there are empty cells in my converted data looking as in the snippet: Sorted data snippet
As you can see, there were some empty cells on which days that there were no data available. I would like to remove rows such as Row 9 and Row 16, but I don't want Python to remove Row 3 as it is one of the header rows. I tried
a['Open'].replace('', np.nan, inplace=True)
a.dropna(subset=['Open'], inplace=True)
but Python returns me:
File "pandas_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'open'
How should I do it? And how can I quote column C and G to calculate spreads when they both have two headers stacked upon them? Please help! Many Thanks!