Deleting rows conditionally in python with dates

Question

I would like to standardize my dataframe so that it starts and ends on precise dates, but I can't find a solution. I am dealing with time series, so it is crucial that everything starts and ends on the same day.

I have tried several pieces of code, including code from Stack Overflow, but nothing works.

Right now I just want the rows between 01/01/2010 and 31/12/2017. This is the code I have so far:

df=pd.read_csv("AREX.csv", sep = ";")
df[~df['Date'].isin(pd.date_range(start='20100101', end='20171231'))]        
print(df)
df.drop(["Open","High","Low","Volume","Open interest"],axis = 1, inplace=True)
print(df)

But it does not affect the number of rows; it only drops the columns I ask it to.

Does anyone have any idea how to fix this?

Thank you in advance for any piece of advice you can give me!

You have to assign the result back: df = df[~df['Date'].isin(pd.date_range(start='20100101', end='20171231'))]
– rafaelc, Commented Oct 14, 2018 at 18:51
No, I tried that earlier and it does not work either... :/ But thank you for the input :)
– Grégoire Caye, Commented Oct 14, 2018 at 19:56
Look, there are several things that could be happening. First, is Date an actual datetime dtype column? Second, you definitely have to assign back, otherwise your code does nothing. Third, instead of using pd.date_range, use (df.Date <= '2017-12-31') & (df.Date >= '2010-01-01')
– rafaelc, Commented Oct 14, 2018 at 20:00
Thank you so much for your help! Part of it works now: it starts at 2010 but does not stop at 2017, so I'll need to figure out a solution. Here is the change I have made: df["Date"] = pd.to_datetime(df['Date']) df = df[~df['Date'].isin(df.Date <= '2017-12-31') & (df.Date >= '2010-01-01')] Thank you @RafaelC
– Grégoire Caye, Commented Oct 14, 2018 at 20:33
Thank you for all the advice and the solution! I have put it in my loop and it works perfectly now! :)
– Grégoire Caye, Commented Oct 14, 2018 at 20:49
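
The three points from the comments (real datetime dtype, a boolean mask instead of pd.date_range, and assigning the result back) can be sketched on a small in-memory frame. The sample rows below are hypothetical stand-ins for AREX.csv, whose actual contents and date format are not shown in the question, and the name `filtered` is only for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for AREX.csv (assumed ISO-formatted dates)
df = pd.DataFrame({
    "Date": ["2009-12-31", "2010-01-01", "2014-06-15", "2017-12-31", "2018-01-02"],
    "Close": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# 1. Make sure Date is a real datetime column, not a column of strings
df["Date"] = pd.to_datetime(df["Date"])

# 2. Build a boolean mask: True for rows inside the wanted range (both ends included)
start = pd.Timestamp("2010-01-01")
end = pd.Timestamp("2017-12-31")
mask = (df["Date"] >= start) & (df["Date"] <= end)

# 3. Indexing returns a new frame, so the result has to be assigned to a name
filtered = df[mask]
print(filtered)
```

If the CSV stores dates day-first (for example 31/12/2017), pd.to_datetime accepts dayfirst=True; whether that applies here depends on the file.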

Grégoire Caye · Accepted Answer · 2018-10-14 20:58:18Z

OK, so thanks to @RafaelC, here is the answer to my problem.

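import glob
import os
import shutil

import pandas as pd


def concatenate(indir="../Equity_Merton", outfile="../Merged.csv"):
    # Note: outfile is not used below; the output path is hard-coded at the end.
    os.chdir(indir)
    fileList = glob.glob("*.csv")
    main_df = pd.DataFrame()

    for filename in fileList:
        print(filename)
        df = pd.read_csv(filename, sep=";")
        # Dates must be real datetimes for the comparison below to work
        df["Date"] = pd.to_datetime(df["Date"])
        # Keep only rows from 2010-01-01 through 2017-12-31 and assign the result back;
        # .copy() avoids modifying a slice of the original frame in the in-place calls below
        df = df[(df.Date <= '2017-12-31') & (df.Date >= '2010-01-01')].copy()
        df.set_index("Date", inplace=True)
        # Name the Close column after the file (file name without ".csv")
        df.rename(columns={"Close": filename[0:len(filename) - 4]}, inplace=True)
        df.drop(["Open", "High", "Low", "Volume", "Open interest"], axis=1, inplace=True)

        if main_df.empty:
            main_df = df
        else:
            # Outer join keeps every date that appears in any file
            main_df = main_df.join(df, how='outer')

#        main_df = main_df.dropna(axis = 0, how="any")
        main_df.sort_index(axis=0, level=None, ascending=False, inplace=True, kind='quicksort', na_position='last')

    print(main_df.head())
    main_df.to_csv('Merton_Merged.csv')
    shutil.move("Merton_Merged.csv", "../Merton_Merged.csv")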

Thank you for your help!!
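
For anyone wondering what the outer join in the loop does to the dates, here is a minimal sketch with two made-up frames. The tickers "AAA" and "BBB" and their values are hypothetical and stand in for two of the CSV files after filtering, renaming, and dropping columns.

```python
import pandas as pd

# Hypothetical per-ticker frames, each indexed by Date with a single price column
a = pd.DataFrame(
    {"AAA": [1.0, 2.0]},
    index=pd.to_datetime(["2010-01-04", "2010-01-05"]),
)
a.index.name = "Date"

b = pd.DataFrame(
    {"BBB": [10.0, 20.0]},
    index=pd.to_datetime(["2010-01-05", "2010-01-06"]),
)
b.index.name = "Date"

# Outer join: the result contains every date found in either frame,
# with NaN where a ticker has no row for that date
merged = a.join(b, how="outer")

# Newest date first, as in the answer's sort_index call
merged = merged.sort_index(ascending=False)
print(merged)
```

Because the join is outer, dates that are missing for one ticker show up as NaN rather than being dropped. Calling dropna(axis=0, how="any") on the merged frame, as in the commented-out line, would instead keep only the dates present for every ticker.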
