Python Pandas remove empty cells in dataframe

Question

I am converting tick data to OHLC data, and the conversion itself works with the code below:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import mpl_finance 
from datetime import *

import os

# pd.datetime was removed in pandas 2.0; datetime here is the class imported above
dateparse = lambda x: datetime.strptime(x, '%Y/%m/%d %H:%M:%S')

file_dir = "D:/USDJPY 2017-2018/"  
#directory
for root, dirs, files in os.walk(file_dir):
    file_list = files
file_list.sort()

# read every file and concatenate once (DataFrame.append was removed in pandas 2.0)
frames = [pd.read_csv(file_dir + file, parse_dates=['RateDateTime'], index_col='RateDateTime', date_parser=dateparse) for file in file_list]
df_all = pd.concat(frames)

grouped = df_all.groupby('CurrencyPair')
ask =  grouped['RateAsk'].resample('1440Min').ohlc()
bid = grouped['RateBid'].resample('1440Min').ohlc()

a=pd.concat([ask, bid], axis=1, keys=['RateAsk', 'RateBid'])
a.to_csv('C:/Users/lenovo/Desktop/USDJPY 2017-2018 1DAY sorted.csv')
print('Conversion complete')

However, there are empty cells in my converted data, as shown in this snippet (image: Sorted data snippet).

As you can see, the cells are empty on days for which no data was available. I would like to remove rows such as Row 9 and Row 16, but I don't want Python to remove Row 3, as it is one of the header rows. I tried

a['Open'].replace('', np.nan, inplace=True)
a.dropna(subset=['Open'], inplace=True)

but Python raises:

File "pandas_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'open'

How should I do it? And how can I reference columns C and G to calculate spreads when each of them has two header rows stacked above it? Please help! Many thanks!

jezrael · Accepted Answer · 2018-08-30 07:47:39Z

The columns are a MultiIndex, so you first need to flatten the column names:

a = pd.concat([ask, bid], axis=1, keys=['RateAsk', 'RateBid']) 
a.columns = a.columns.map('_'.join)
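
To see what the flattening does, here is a minimal sketch on made-up data. The small `ask` and `bid` frames are hypothetical stand-ins for the resampled OHLC frames from the question, not the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's `ask` and `bid` OHLC frames
idx = pd.date_range("2018-01-01", periods=3, freq="D")
ask = pd.DataFrame(
    {"open": [1.0, np.nan, 3.0], "high": [1.5, np.nan, 3.5],
     "low": [0.5, np.nan, 2.5], "close": [1.2, np.nan, 3.2]},
    index=idx,
)
bid = ask - 0.1

# keys= adds an outer column level, so each column is a tuple
# such as ('RateAsk', 'open')
a = pd.concat([ask, bid], axis=1, keys=["RateAsk", "RateBid"])

# Join the two levels of every column tuple with an underscore
a.columns = a.columns.map("_".join)
print(list(a.columns))
```

After this, each column has a single string name, so `a['RateAsk_open']` works where `a['open']` raised a KeyError.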

Then use boolean indexing to keep only the rows where column RateAsk_open is neither an empty string nor NaN:

a = a[(a['RateAsk_open'] != '') & (a['RateAsk_open'].notnull())]
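
As a quick check of how the two conditions interact, here is a sketch on a made-up column that contains a number, an empty string and a NaN. The values are assumptions for illustration only. Both conditions have to hold, so they are combined with `&`; with `|` every row would pass, because NaN is not equal to `''`:

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing a value, an empty string and NaN
a = pd.DataFrame({"RateAsk_open": pd.Series([1.0, "", np.nan, 2.0], dtype=object)})

not_empty = a["RateAsk_open"] != ""      # False only for the '' row
not_nan = a["RateAsk_open"].notnull()    # False only for the NaN row

# Keep rows that pass both checks
kept = a[not_empty & not_nan]
print(kept)
```

If the frame comes straight from `resample(...).ohlc()`, the missing days are most likely NaN rather than empty strings, in which case the `notnull()` condition alone would be enough.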

If you instead want to drop only the rows where all elements are missing:

a = a.dropna(how='all')
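
A small sketch of the `how='all'` variant on made-up prices, followed by a spread calculation on the flattened names. Which two columns the question means by C and G is not clear from the text, so the choice of the ask and bid close columns, and the column name `spread_close`, are assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical flattened frame; the middle day has no data at all
idx = pd.date_range("2018-01-01", periods=3, freq="D")
a = pd.DataFrame(
    {
        "RateAsk_open": [110.10, np.nan, 110.30],
        "RateAsk_close": [110.20, np.nan, 110.40],
        "RateBid_open": [110.08, np.nan, 110.28],
        "RateBid_close": [110.18, np.nan, 110.38],
    },
    index=idx,
)

# Drop only the rows in which every column is NaN
a = a.dropna(how="all")

# With single-level column names, a spread is a plain column subtraction
a["spread_close"] = a["RateAsk_close"] - a["RateBid_close"]
print(a)
```

Doing the `dropna` before adding the spread column keeps the check simple; a row is removed only when all of its price columns are missing.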

edited Aug 30, 2018 at 7:47

answered Aug 30, 2018 at 7:27


4 Comments

jezrael Over a year ago

@BernardLin - You get it by a=pd.concat([ask, bid], axis=1, keys=['RateAsk', 'RateBid'])

BernardLin Over a year ago

Thanks for the advice! There remains one question, I have not defined a df in my code, so where should I insert this line? Should it be df_all.columns=df_all.columns.map(''.join) or a.columns=a.columns.map(''.join) here?

jezrael Over a year ago

@BernardLin - sorry, need a.columns = a.columns.map('_'.join), edited answer.

BernardLin Over a year ago

that was marvelous! I wish I had enough reputation to upvote your comments as well! Thanks so much for your help!
