I have a dataset of 6 million rows with the columns symbol, timeStamp, open price and close price. I run the following loop, which is very simple (if the open price is NaN, take the close price from the previous row) but takes a very long time:

for i in range(0,len(price2)):
    print(i)
    if np.isnan(price3.iloc[i,2]):
        price3.iloc[i,2]=price3.iloc[i-1,3]

How can I speed this loop up? As far as I know, I could switch to apply(), but how can I include the if-condition in it?

1 Answer

Instead of the for loop, you can use pandas.Series.fillna with the shifted Series for the close price.

price3['open price'] = price3['open price'].fillna(price3['close price'].shift(1))

This is vectorized and so should be far faster than your for loop.

Note: I am assuming that price2 and price3 have the same length, so you could just as well iterate over price3 in your loop.
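As a minimal sketch of the fillna-with-shift approach, here is a small self-contained example. The sample values are made up for illustration; only the column names come from the question. It assigns the result back to the column, which avoids relying on in-place modification of a column selection.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
price3 = pd.DataFrame({
    "symbol": ["A", "A", "A", "A"],
    "timeStamp": pd.date_range("2020-01-01", periods=4, freq="D"),
    "open price": [10.0, np.nan, 12.0, np.nan],
    "close price": [10.5, 11.5, 12.5, 13.5],
})

# Close price of the previous row, aligned to the current row's index.
prev_close = price3["close price"].shift(1)

# Fill only the missing open prices and assign the result back.
price3["open price"] = price3["open price"].fillna(prev_close)

print(price3["open price"].tolist())  # [10.0, 10.5, 12.0, 12.5]
```

One difference from the loop: if the very first open price is NaN, shift(1) has no previous row and leaves it as NaN, whereas the loop's iloc[i-1] at i=0 wraps around and reads the last row.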
