1

I was trying to convert tick data to OHLC data, and my code works as below:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import mpl_finance 
from datetime import *

import os

dateparse = lambda x: pd.datetime.strptime(x, '%Y/%m/%d %H:%M:%S')

file_dir = "D:/USDJPY 2017-2018/"  
#directory
for root, dirs, files in os.walk(file_dir):
    file_list = files
file_list.sort()

df_all =  pd.read_csv(file_dir + file_list[0], parse_dates=['RateDateTime'], index_col='RateDateTime',date_parser=dateparse) 
for file in file_list:
    if file != file_list[0]:
        df_all =  df_all.append(pd.read_csv(file_dir + file, parse_dates=['RateDateTime'], index_col='RateDateTime',date_parser=dateparse))

grouped = df_all.groupby('CurrencyPair')
ask =  grouped['RateAsk'].resample('1440Min').ohlc()
bid = grouped['RateBid'].resample('1440Min').ohlc()

a=pd.concat([ask, bid], axis=1, keys=['RateAsk', 'RateBid'])
a.to_csv('C:/Users/lenovo/Desktop/USDJPY 2017-2018 1DAY sorted.csv')
print('Conversion complete')

However, there are empty cells in my converted data looking as in the snippet: Sorted data snippet

As you can see, there were some empty cells on which days that there were no data available. I would like to remove rows such as Row 9 and Row 16, but I don't want Python to remove Row 3 as it is one of the header rows. I tried

a['Open'].replace('', np.nan, inplace=True)
a.dropna(subset=['Open'], inplace=True)

but Python returns me:

File "pandas_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'open'

How should I do it? And how can I quote column C and G to calculate spreads when they both have two headers stacked upon them? Please help! Many Thanks!

1 Answer 1

1

There is MultiIndex, so is necessary flatten columns names by:

a = pd.concat([ask, bid], axis=1, keys=['RateAsk', 'RateBid']) 
a.columns = a.columns.map('_'.join)

Then use boolean indexing with filtering all non empty and not NaN rows by column RateAsk_open:

a = a[(a['RateAsk_open'] != '') | (a['RateAsk_open'].notnull()]

But if want to drop the rows where all elements are missing:

a = a.dropna(how='all')
Sign up to request clarification or add additional context in comments.

4 Comments

@BernardLin - You get it by a=pd.concat([ask, bid], axis=1, keys=['RateAsk', 'RateBid'])
Thanks for the advice! There remains one question, I have not defined a df in my code, so where should I insert this line? Should it be df_all.columns=df_all.columns.map(''.join) or a.columns=a.columns.map(''.join) here?
@BernardLin - sorry, need a.columns = a.columns.map('_'.join), edited answer.
that was marvelous! I wish I had enough reputation to upvote your comments as well! Thanks so much for your help!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.